Thursday, January 27, 2022

Accessing a lot of smart cards? (part 2)

Last year in "Accessing a lot of smart cards?" I wrote about accessing many smart cards in parallel.

One of my test platforms was the sysmoOCTSIM, an 8-slot reader I presented in "sysmoOCTSIM: 8 slots reader". One advantage of this reader is that the 8 slots can be used at the same time. But my CCID driver did not support simultaneous access to different slots of the same reader.

Extract from the previous article (March 2021):

sysmoOCTSIM

My CCID driver for Unix do support multi-slot readers. But only one slot can be used at the same time. It is a limitation of the driver.

Supporting accesses to 2 or more slots in parallel would imply a change from synchronous USB communication to asynchronous USB communication. That is a possible change but not an easy one.


Results

number of slots   sequential exe   parallel exe
1                 5.126s           5.126s
2                 10.273s          10.030s
3                 15.321s          14.944s


You may note that in the case of parallel execution we have a linear growth. As I explained before only one slot can be used at the same time. So pcsc-lite (the PC/SC resource manager) has to serialize the accesses to the different slots from the different executions.

The parallel execution is a bit more efficient than the sequential execution because part of the execution can be executed in parallel. But not so much. 
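To put a number on "a bit more efficient", the timings from the table above can be compared directly. This is only a small arithmetic sketch: the values are copied from the table, and the function name is mine.

```python
# Timings in seconds, copied from the table above:
# number of slots -> (sequential execution, parallel execution)
timings = {1: (5.126, 5.126), 2: (10.273, 10.030), 3: (15.321, 14.944)}


def parallel_gain_percent(sequential, parallel):
    """Return how much shorter the parallel run is, in percent of the sequential run."""
    return 100.0 * (sequential - parallel) / sequential


gains = {slots: parallel_gain_percent(seq, par) for slots, (seq, par) in timings.items()}

for slots, gain in gains.items():
    print(f"{slots} slot(s): parallel run is {gain:.1f}% shorter")
```

With 2 and 3 slots the gain stays below 3%, which is consistent with the accesses being serialized.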

 

Problem fixed

My CCID driver now (since version 1.5.0) supports simultaneous access to the slots of a reader.

But not all multi-slot readers can support simultaneous access. The reader must declare that all the slots can be used at the same time. The USB descriptor field bMaxCCIDBusySlots must have a value greater than 1. Ideally this value should correspond to the number of slots. My CCID driver enables simultaneous access only if bMaxCCIDBusySlots corresponds to the number of slots, i.e. bMaxSlotIndex + 1.
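The rule above can be written as a small check. This is a sketch of the condition as described in the text, not the driver's actual code. The function name is hypothetical, and the two arguments are assumed to have already been read from the reader's CCID class descriptor.

```python
def simultaneous_access_enabled(b_max_slot_index: int, b_max_ccid_busy_slots: int) -> bool:
    """Apply the rule described above to two CCID descriptor fields.

    Hypothetical helper: it only restates the condition from the text.
    """
    number_of_slots = b_max_slot_index + 1
    # The reader must announce more than one busy slot...
    if b_max_ccid_busy_slots <= 1:
        return False
    # ...and that value must match the number of slots exactly.
    return b_max_ccid_busy_slots == number_of_slots


# An 8-slot reader (bMaxSlotIndex = 7) that declares 8 busy slots qualifies.
print(simultaneous_access_enabled(7, 8))
# The same reader declaring a single busy slot does not.
print(simultaneous_access_enabled(7, 1))
```

A reader that declares, say, 4 busy slots for 8 slots would also be rejected by this check, since the driver requires the two numbers to match.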


Readers that should support this feature:

Not so many readers will benefit from this improvement. They are:

Performance

So what is the performance now?

With the sysmoOCTSIM 8-slots reader I now get:

# User Sys Clock CPU
0 0,07 0,02 24,65 0 %
1 0,19 0,04 24,72 0 %
2 0,21 0,05 24,68 1 %
3 0,26 0,08 24,59 1 %
4 0,34 0,09 24,67 1 %
5 0,41 0,10 24,65 2 %
6 0,50 0,11 24,64 2 %
7 0,56 0,13 24,72 2 %

I used the GNU time command to measure the User, System and clock times.

As expected the user (and system) time grows with the number of cards (slots) used.


Also as expected, the clock time stays nearly constant at about 24.6 seconds in all cases, instead of growing linearly as it did in "Accessing a lot of smart cards?".

We can clearly see the effect of the simultaneous accesses here.

 

Results with 88 slots

I got (remote) access to a sysmoSIMBANK 96 with 96 slots. See "A reader for 96 smart cards? sysmoSIMBANK" for more details about the reader.

 

Performance

# User Sys Clock CPU
0 0,23 0,08 21,00 1 %
1 0,52 0,11 24,27 2 %
2 0,77 0,23 24,30 4 %
3 1,17 0,25 24,34 5 %
4 1,51 0,28 24,41 7 %
5 1,43 0,33 24,44 7 %
6 1,81 0,37 24,50 8 %
7 2,12 0,48 24,88 10 %
8 2,53 0,51 25,00 12 %
9 2,90 0,57 25,07 13 %
10 3,15 0,75 25,06 15 %
11 3,59 0,79 25,15 17 %
12 3,94 0,85 25,24 19 %
13 4,37 0,85 25,30 20 %
14 4,84 0,92 25,33 22 %
15 5,19 0,98 25,28 24 %
16 5,56 1,10 25,41 26 %
17 5,89 1,20 25,51 27 %
18 6,41 1,25 25,55 30 %
19 6,75 1,32 25,58 31 %
20 7,14 1,41 25,58 33 %
21 7,49 1,50 25,46 35 %
22 7,88 1,58 25,75 36 %
23 8,16 1,64 25,65 38 %
24 8,75 1,70 25,80 40 %
25 9,00 1,79 25,91 41 %
26 9,35 1,88 25,88 43 %
27 9,76 1,95 25,84 45 %
28 10,26 1,99 25,51 48 %
29 10,58 2,09 25,95 48 %
30 10,99 2,16 26,26 50 %
31 11,21 2,29 25,97 51 %
32 11,56 2,42 26,23 53 %
33 12,00 2,40 26,06 55 %
34 12,48 2,43 26,38 56 %
35 13,07 2,41 26,67 58 %
36 13,23 2,69 26,23 60 %
37 13,63 2,70 26,30 62 %
38 13,90 2,88 26,04 64 %
39 14,55 2,69 26,57 64 %
40 14,85 2,83 26,43 66 %
41 15,17 2,94 25,71 70 %
42 15,47 3,05 26,48 69 %
43 15,88 3,12 26,35 72 %
44 16,32 3,29 26,27 74 %
45 16,66 3,23 26,67 74 %
46 17,29 3,25 26,69 76 %
47 17,28 3,56 26,48 78 %
48 17,88 3,50 26,69 80 %
49 18,31 3,54 26,78 81 %
50 18,74 3,59 27,16 82 %
51 18,62 3,66 26,10 85 %
52 19,01 3,60 26,74 84 %
53 19,29 3,88 26,20 88 %
54 19,29 3,85 26,95 85 %
55 19,81 3,74 26,33 89 %
56 20,10 3,94 26,66 90 %
57 20,28 4,17 26,73 91 %
58 20,80 4,05 27,09 91 %
59 21,02 4,02 26,39 94 %
60 21,04 4,23 29,14 86 %
61 21,56 4,28 29,18 88 %
62 21,55 4,23 29,22 88 %
63 21,78 4,34 29,19 89 %
64 22,01 4,65 29,36 90 %
65 22,67 4,54 29,41 92 %
66 22,74 4,78 29,60 92 %
67 23,65 4,58 29,51 95 %
68 24,07 4,53 30,83 92 %
69 24,26 4,64 30,88 93 %
70 23,88 5,07 30,95 93 %
71 24,35 4,98 31,06 94 %
72 24,93 4,89 31,18 95 %
73 25,30 4,96 31,23 96 %
74 25,51 5,26 31,14 98 %
75 25,91 5,15 31,42 98 %
76 26,10 5,47 31,54 100 %
77 26,53 5,44 31,37 101 %
78 27,06 5,51 31,76 102 %
79 27,01 5,31 31,56 102 %
80 27,56 5,31 31,68 103 %
81 27,86 5,57 31,79 105 %
82 28,17 5,59 31,76 106 %
83 28,60 5,67 31,74 107 %
84 29,03 5,64 32,18 107 %
85 29,15 5,88 31,82 110 %
86 29,67 5,96 32,35 110 %
87 30,20 5,98 32,47 111 %

 

Here again the user and system times grow linearly.


And again the total time is rather constant. It goes from 21.00 to 32.47 seconds, i.e. it is multiplied by about 1.5, while the number of cards goes from 1 to 88.

 

The CPU load also grows linearly. The system has a 4-core CPU so it is not surprising to get more than 100% of CPU usage.
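The two observations above can be checked from the table. GNU time computes the CPU percentage as (user + system) / elapsed, so the CPU column can be recomputed from the other three. The sketch below uses three rows copied from the table (with decimal points instead of commas). The function name is mine, and truncating to an integer is an assumption made to compare with the CPU column.

```python
# Rows copied from the table above: "#" value -> (user, system, clock) in seconds
rows = {
    0: (0.23, 0.08, 21.00),
    76: (26.10, 5.47, 31.54),
    87: (30.20, 5.98, 32.47),
}


def cpu_percent(user_s, sys_s, clock_s):
    """CPU usage in percent: time spent on the CPU divided by elapsed time."""
    return 100.0 * (user_s + sys_s) / clock_s


for index, (user_s, sys_s, clock_s) in rows.items():
    print(f"# {index}: CPU = {int(cpu_percent(user_s, sys_s, clock_s))} %")

# Growth of the clock time between the first and the last row.
clock_ratio = rows[87][2] / rows[0][2]
print(f"clock time ratio: {clock_ratio:.2f}")
```

A value above 100% simply means that, on average, more than one core was busy during the run.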

My sample test is not optimized for speed or CPU load at all. I use make -j to start one Python program usim_read.py per slot. So make has to start 88 Python processes in the case of 88 slots.
The goal was to use standard and simple tools.
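For readers who prefer a single launcher over make -j, the same "one process per slot" idea can be sketched in Python. This is not what I used for the measurements above. The function name is hypothetical, and the command run here is a harmless placeholder, since the arguments expected by usim_read.py are not described in this article.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_per_slot(commands):
    """Start all commands at the same time, one process each.

    Returns the list of return codes and the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    # One worker thread per command, each thread only waits for its process.
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        codes = list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
    return codes, time.perf_counter() - start


# Placeholder commands: replace with the real per-slot program and its arguments.
commands = [[sys.executable, "-c", "pass"] for _ in range(4)]
codes, elapsed = run_per_slot(commands)
print(codes, f"{elapsed:.2f}s")
```

Each slot still gets its own Python interpreter, so the CPU cost per slot should be similar to the make -j approach.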

I stopped at 88 slots instead of the expected 96 because one of the 12 sysmoOCTSIM readers (part of the sysmoSIMBANK 96 reader) was not working correctly at the time.


Conclusion

A big thank you to Sysmocom for supporting my work on this code.

I am very happy to see pcsc-lite and my CCID driver able to handle 88 APDU exchanges at the same time.