Saturday, January 29, 2022

New version of pcsc-tools: 1.6.0

I just released a new version of pcsc-tools, a suite of tools for PC/SC.
I also forgot to announce version 1.5.8.

The major changes are for the pcsc_scan tool.

Changes:

1.6.0 - 29 January 2022, Ludovic ROUSSEAU
  • 48 new ATRs
  • pcsc_scan:
    • drastically reduce the number of SCardGetStatusChange() calls
    • faster spinning animation
    • handle Ctrl-C on macOS

1.5.8 - 7 November 2021, Ludovic ROUSSEAU
  • 360 new ATRs
  • ATR_analysis:
    • fix TB2 parsing error
    • misc spelling fixes
  • pcsc_scan:
    • add maxtime option -t
    • add the option -c to list cards only once
    • no spinner in quiet mode (-q)
    • turn off colour if redirected output
    • Exit if no reader is found and -c or -r is used

Thursday, January 27, 2022

Accessing a lot of smart cards? (part 2)

Last year in "Accessing a lot of smart cards?" I wrote about accessing many smart cards in parallel.

One of my test platforms was the sysmoOCTSIM, an 8-slot reader I presented in "sysmoOCTSIM: 8 slots reader". One advantage of this reader is that the 8 slots can be used at the same time. But my CCID driver did not support simultaneous access to different slots of the same reader.

Extract from the previous article (March 2021):

sysmoOCTSIM

My CCID driver for Unix does support multi-slot readers. But only one slot can be used at a time. It is a limitation of the driver.

Supporting accesses to 2 or more slots in parallel would imply a change from synchronous USB communication to asynchronous USB communication. That is a possible change but not an easy one.


Results

Number of slots   Sequential execution   Parallel execution
1                 5.126 s                5.126 s
2                 10.273 s               10.030 s
3                 15.321 s               14.944 s


Note that even with parallel execution the time grows linearly. As explained before, only one slot can be used at a time, so pcsc-lite (the PC/SC resource manager) has to serialize the accesses to the different slots from the different executions.

Parallel execution is a bit more efficient than sequential execution because part of the work can run in parallel. But not by much.

Problem fixed

My CCID driver now (since version 1.5.0) supports simultaneous access to the slots of a reader.

But not all multi-slot readers can support simultaneous access. The reader must declare that all the slots can be used at the same time: the USB descriptor field bMaxCCIDBusySlots must have a value greater than 1. Ideally this value should correspond to the number of slots. My CCID driver enables simultaneous access only if bMaxCCIDBusySlots corresponds to the number of slots, i.e. bMaxSlotIndex + 1.
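The rule above can be sketched in C. This is a minimal illustration, not the code of my CCID driver: the helper name ccid_all_slots_simultaneous() is hypothetical, and the sketch assumes the raw 54-byte CCID class descriptor is already available, with bMaxSlotIndex at offset 4 and bMaxCCIDBusySlots at offset 53 as listed in the CCID specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets in the CCID class descriptor (CCID specification). */
#define CCID_DESC_LENGTH         54  /* bLength = 0x36 */
#define CCID_OFF_MAX_SLOT_INDEX   4  /* bMaxSlotIndex */
#define CCID_OFF_MAX_BUSY_SLOTS  53  /* bMaxCCIDBusySlots */

/* Hypothetical helper: return true if the reader declares that all its
 * slots can be busy at the same time, i.e.
 * bMaxCCIDBusySlots > 1 and bMaxCCIDBusySlots == bMaxSlotIndex + 1.
 * desc points to the raw CCID class descriptor, len is its size in bytes. */
static bool ccid_all_slots_simultaneous(const uint8_t *desc, size_t len)
{
    assert(desc != NULL);
    if (len < CCID_DESC_LENGTH)
        return false;  /* truncated descriptor: stay on serialized access */

    unsigned int nb_slots = desc[CCID_OFF_MAX_SLOT_INDEX] + 1u;
    unsigned int busy_slots = desc[CCID_OFF_MAX_BUSY_SLOTS];

    return busy_slots > 1 && busy_slots == nb_slots;
}
```

For example, a reader with 8 slots would need bMaxSlotIndex = 7 and bMaxCCIDBusySlots = 8. A multi-slot reader declaring bMaxCCIDBusySlots = 1 keeps the old behaviour: one slot at a time.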


Readers that should support this feature:

Not many readers will benefit from this improvement. They are:

Performance

So what is the performance now?

With the sysmoOCTSIM 8-slots reader I now get:

# User Sys Clock CPU
0 0,07 0,02 24,65 0 %
1 0,19 0,04 24,72 0 %
2 0,21 0,05 24,68 1 %
3 0,26 0,08 24,59 1 %
4 0,34 0,09 24,67 1 %
5 0,41 0,10 24,65 2 %
6 0,50 0,11 24,64 2 %
7 0,56 0,13 24,72 2 %

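I used the GNU time command to measure the user, system and clock (elapsed) times, in seconds (the tables use a comma as the decimal separator). The # column is the index of the last slot used, so row n corresponds to n+1 cards.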

As expected, the user (and system) time grows with the number of cards (slots) used.

Also as expected, the clock time stays roughly constant at about 24.6 seconds in all cases, instead of growing linearly as it did in "Accessing a lot of smart cards?".

This clearly shows the effect of simultaneous access.

Results with 88 slots

I got (remote) access to a sysmoSIMBANK 96 with 96 slots. See "A reader for 96 smart cards? sysmoSIMBANK" for more details about the reader.

Performance

# User Sys Clock CPU
0 0,23 0,08 21,00 1 %
1 0,52 0,11 24,27 2 %
2 0,77 0,23 24,30 4 %
3 1,17 0,25 24,34 5 %
4 1,51 0,28 24,41 7 %
5 1,43 0,33 24,44 7 %
6 1,81 0,37 24,50 8 %
7 2,12 0,48 24,88 10 %
8 2,53 0,51 25,00 12 %
9 2,90 0,57 25,07 13 %
10 3,15 0,75 25,06 15 %
11 3,59 0,79 25,15 17 %
12 3,94 0,85 25,24 19 %
13 4,37 0,85 25,30 20 %
14 4,84 0,92 25,33 22 %
15 5,19 0,98 25,28 24 %
16 5,56 1,10 25,41 26 %
17 5,89 1,20 25,51 27 %
18 6,41 1,25 25,55 30 %
19 6,75 1,32 25,58 31 %
20 7,14 1,41 25,58 33 %
21 7,49 1,50 25,46 35 %
22 7,88 1,58 25,75 36 %
23 8,16 1,64 25,65 38 %
24 8,75 1,70 25,80 40 %
25 9,00 1,79 25,91 41 %
26 9,35 1,88 25,88 43 %
27 9,76 1,95 25,84 45 %
28 10,26 1,99 25,51 48 %
29 10,58 2,09 25,95 48 %
30 10,99 2,16 26,26 50 %
31 11,21 2,29 25,97 51 %
32 11,56 2,42 26,23 53 %
33 12,00 2,40 26,06 55 %
34 12,48 2,43 26,38 56 %
35 13,07 2,41 26,67 58 %
36 13,23 2,69 26,23 60 %
37 13,63 2,70 26,30 62 %
38 13,90 2,88 26,04 64 %
39 14,55 2,69 26,57 64 %
40 14,85 2,83 26,43 66 %
41 15,17 2,94 25,71 70 %
42 15,47 3,05 26,48 69 %
43 15,88 3,12 26,35 72 %
44 16,32 3,29 26,27 74 %
45 16,66 3,23 26,67 74 %
46 17,29 3,25 26,69 76 %
47 17,28 3,56 26,48 78 %
48 17,88 3,50 26,69 80 %
49 18,31 3,54 26,78 81 %
50 18,74 3,59 27,16 82 %
51 18,62 3,66 26,10 85 %
52 19,01 3,60 26,74 84 %
53 19,29 3,88 26,20 88 %
54 19,29 3,85 26,95 85 %
55 19,81 3,74 26,33 89 %
56 20,10 3,94 26,66 90 %
57 20,28 4,17 26,73 91 %
58 20,80 4,05 27,09 91 %
59 21,02 4,02 26,39 94 %
60 21,04 4,23 29,14 86 %
61 21,56 4,28 29,18 88 %
62 21,55 4,23 29,22 88 %
63 21,78 4,34 29,19 89 %
64 22,01 4,65 29,36 90 %
65 22,67 4,54 29,41 92 %
66 22,74 4,78 29,60 92 %
67 23,65 4,58 29,51 95 %
68 24,07 4,53 30,83 92 %
69 24,26 4,64 30,88 93 %
70 23,88 5,07 30,95 93 %
71 24,35 4,98 31,06 94 %
72 24,93 4,89 31,18 95 %
73 25,30 4,96 31,23 96 %
74 25,51 5,26 31,14 98 %
75 25,91 5,15 31,42 98 %
76 26,10 5,47 31,54 100 %
77 26,53 5,44 31,37 101 %
78 27,06 5,51 31,76 102 %
79 27,01 5,31 31,56 102 %
80 27,56 5,31 31,68 103 %
81 27,86 5,57 31,79 105 %
82 28,17 5,59 31,76 106 %
83 28,60 5,67 31,74 107 %
84 29,03 5,64 32,18 107 %
85 29,15 5,88 31,82 110 %
86 29,67 5,96 32,35 110 %
87 30,20 5,98 32,47 111 %

Here again the user and system times grow linearly.


And again the total (clock) time stays roughly constant: it is only multiplied by about 1.5 (from 21.00 s to 32.47 s) while the number of cards goes from 1 to 88.

The CPU load also grows linearly. The system has a 4-core CPU, so a CPU usage above 100% is not surprising.
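Why more than 100%? GNU time computes its CPU percentage (the %P format) as (user + system) / elapsed, and the user and system times include all the Python processes started by make. A small C helper reproducing that arithmetic (the name cpu_percent() is just for illustration) makes the point:

```c
#include <assert.h>

/* Same arithmetic as GNU time's %P: (user + system) / elapsed.
 * All times are in seconds; the result is a percentage.
 * On a machine with N cores the upper bound is N * 100%. */
static double cpu_percent(double user_s, double sys_s, double elapsed_s)
{
    assert(elapsed_s > 0.0);
    return 100.0 * (user_s + sys_s) / elapsed_s;
}
```

For the last row of the table: (30.20 + 5.98) / 32.47 ≈ 1.114, i.e. the 111 % shown. With 4 cores the theoretical maximum is 400 %.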

My sample test is not optimized for speed or CPU load at all. I use make -j to start one Python program, usim_read.py, per slot, so make has to start 88 Python processes in the case of 88 slots.
The goal was to use standard and simple tools.
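The same "one process per slot, all at once" idea can be sketched in plain C with fork() and wait(). This is not my test setup (which uses make -j and usim_read.py); run_per_slot() and its job callback are hypothetical names, and the sketch assumes the calling process has no other child processes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start one child process per slot so that all of them run at the same
 * time (like "make -j" with one target per slot), then wait for all of
 * them. job() runs in the child; a non-zero return value means failure.
 * Returns the number of slots that failed or could not be started. */
static int run_per_slot(int nb_slots, int (*job)(int slot))
{
    int started = 0, failures = 0;

    assert(nb_slots >= 0 && job != NULL);
    for (int slot = 0; slot < nb_slots; slot++)
    {
        pid_t pid = fork();
        if (pid < 0)
        {
            perror("fork");
            failures += nb_slots - slot;  /* these slots never ran */
            break;
        }
        if (pid == 0)  /* child: a real setup would exec the per-slot program */
            _exit(job(slot) == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
        started++;
    }

    /* parent: reap every started child, in whatever order they finish */
    for (int i = 0; i < started; i++)
    {
        int status;
        if (wait(&status) < 0)
            break;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```

Like make -j, this keeps the code simple at the price of one process per slot, which is exactly why the CPU load grows with the number of slots.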

I stopped at 88 slots instead of the expected 96 because one of the 12 sysmoOCTSIM readers (part of the sysmoSIMBANK 96 reader) was not working correctly at the time.


Conclusion

A big thank you to Sysmocom for helping with my work on this code.

I am very happy to see pcsc-lite and my CCID driver able to handle 88 APDU exchanges at the same time.

New version of libccid: 1.5.0

I just released version 1.5.0 of libccid, the Free Software CCID class smart card reader driver.

Changes:

1.5.0 - 27 January 2022, Ludovic Rousseau
  • Add support of
    • ACS ACR1281U
    • Circle CCR7125 ICC
    • Circle CIR125 ICC
    • Circle CIR125-DOT ICC
    • Circle CIR215 CL with iProduct 0x2100
    • Circle CIR315 DI
    • Circle CIR315 with idProduct: 0x0324
    • Circle CIR315 with idProduct: 0x7004
    • Circle CIR415 CL
    • Circle CIR515 ICC
    • Circle CIR615 CL
    • Circle CIR615 CL & 1S
    • ELYCTIS CL reader
    • Nitrokey Nitrokey 3
    • Thales Shield M4 Reader
  • Add support of simultaneous slot access on multi slots readers
  • Use FeliCa instead of Felica on SONY request
  • Fix SafeNet eToken 5110 SC issue
  • Allow vendor control commands for Omnikey 5427 CK
  • Always compute readTimeout to use a value greater than default 3 seconds
  • Check the bSeq value when receiving a CCID frame
  • Avoid logging errors when a reader is removed
  • Some other minor improvements

Friday, January 21, 2022

Multi-thread and Atomic

Multi-thread programming seems easy, but it is difficult to write correct multi-threaded code.

For example, pcsc-lite and my CCID driver use threads and are not (yet) perfect. One problem in particular is accessing the same variable from different threads.

The C11 standard defines atomic types (the _Atomic type qualifier and the <stdatomic.h> header) to make multi-thread programming easier.

Source code

This source code exhibits the problem.

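#include <pthread.h>
#include <stdio.h>

enum CONSTANTS {
    NUM_THREADS = 1000,
    NUM_ITERS = 1000
};

/* incremented atomically: no update can be lost */
_Atomic int global_a = 0;
/* plain int: concurrent increments are a data race */
int global = 0;

/* thread function: increment both counters NUM_ITERS times */
static void* main_thread(void *arg)
{
    (void)arg;

    int i;
    for (i = 0; i < NUM_ITERS; ++i)
    {
        global_a++;     /* atomic read-modify-write */
        global++;       /* non-atomic: read, add 1, write back */
    }
    return NULL;
}

int main(void)
{
    int i;
    pthread_t threads[NUM_THREADS];

    /* start all the threads, then wait for all of them to finish */
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, main_thread, NULL);
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);

    /* both counters should be NUM_THREADS * NUM_ITERS */
    printf("global_a %d %s\n", global_a,
        global_a == NUM_THREADS * NUM_ITERS ? "OK" : "FAIL");

    printf("global   %d %s\n", global,
        global == NUM_THREADS * NUM_ITERS ? "OK" : "FAIL");

    return 0;
}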

Result

If I compile and run the sample code I get:

global_a 1000000 OK
global   660409 FAIL

or, with another execution:

global_a 1000000 OK
global   691552 FAIL

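You can see that the variable global, which is NOT declared with _Atomic, does not have the expected value. Some updates of the variable were lost: global++ is a read-modify-write sequence, so two threads can read the same value and both write back that value + 1.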

If you cannot or do not want to use _Atomic, another option is to protect the accesses to the variable with pthread_mutex_lock() and pthread_mutex_unlock(). But the code is then harder to read.
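For comparison, here is a minimal sketch of the mutex option, with the same constants as the program above: the counter is a plain int and every increment is done with a pthread mutex held. The names counter, counter_mutex, locked_thread() and run_locked_counter() are mine, for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

enum CONSTANTS {
    NUM_THREADS = 1000,
    NUM_ITERS = 1000
};

static int counter = 0;   /* plain int, only accessed with counter_mutex held */
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *locked_thread(void *arg)
{
    (void)arg;

    for (int i = 0; i < NUM_ITERS; ++i)
    {
        /* only one thread at a time executes the read-modify-write */
        pthread_mutex_lock(&counter_mutex);
        counter++;
        pthread_mutex_unlock(&counter_mutex);
    }
    return NULL;
}

/* Run the threads, wait for them and return the final counter value. */
static int run_locked_counter(void)
{
    pthread_t threads[NUM_THREADS];
    int created = 0;

    for (int i = 0; i < NUM_THREADS; ++i)
    {
        if (pthread_create(&threads[i], NULL, locked_thread, NULL) != 0)
        {
            fprintf(stderr, "pthread_create failed\n");
            break;
        }
        created++;
    }
    for (int i = 0; i < created; ++i)
        pthread_join(threads[i], NULL);

    return counter;
}
```

Compiled with -pthread, run_locked_counter() should return NUM_THREADS * NUM_ITERS, like the _Atomic variable, at the cost of a lock/unlock pair around each increment.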

Impact on pcsc-lite and libccid

The problem was reported by andrei-datcu in the pull request "No data races in EHStatusHandlerThread".

I then fixed different problems in these changes (non-exhaustive list):

I also simplified the code by removing a mutex in "Remove mutex and use _Atomic instead".
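To illustrate the kind of simplification a change like "Remove mutex and use _Atomic instead" allows, here is a hypothetical example (poll_thread(), stop_requested and polls are invented names, this is not the pcsc-lite code): a thread polls until another thread asks it to stop. With a plain bool, each read and write of the flag would need a mutex; with atomic_bool, atomic_load() and atomic_store() are enough.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: no mutex needed since both are atomic. */
static atomic_bool stop_requested = false;
static atomic_long polls = 0;

static void *poll_thread(void *arg)
{
    (void)arg;

    /* atomic_load() sees the value written by atomic_store() in another thread */
    while (!atomic_load(&stop_requested))
        atomic_fetch_add(&polls, 1);   /* stands for one polling iteration */
    return NULL;
}

/* Start the polling thread, ask it to stop and wait for it.
 * Returns 0 on success. */
static int start_then_stop(void)
{
    pthread_t thread;

    if (pthread_create(&thread, NULL, poll_thread, NULL) != 0)
        return -1;
    atomic_store(&stop_requested, true);   /* replaces lock / write / unlock */
    return pthread_join(thread, NULL);
}
```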

Conclusion

The next versions of pcsc-lite and libccid will be safer and more correct.

Monday, January 10, 2022

Happy new year 2022

Dear readers,

I wish you a happy new year for 2022.
Maybe COVID-19 will be less problematic this year. We will see.

In 2021 I published 27 articles on this blog.


Audience

The number of users decreased by 5% compared to 2020.


The top 10 countries are the same as in 2020, but the order changed a bit. France is now second :-)

Windows is still the most used system (up from 41% to 44%).
Android is growing fast from 4% to 11%.
Macintosh dropped from 31% to 22%. Come on Mac users!


Most read articles

Again this year, articles about sample code in different languages are popular.

The article about pcsc_scan on Windows moved from 8th to 6th place in the top 10. Maybe I should write more about Windows?

Conclusion

Thank you, readers.

This blog has no advertising. If you want to support me you can send me some bitcoins or become a github sponsor.