Last year in "Accessing a lot of smart cards?" I wrote about accessing many smart cards in parallel.
One of my test platforms was the sysmoOCTSIM, an 8-slot reader I presented in
"sysmoOCTSIM: 8 slots reader". One advantage of this reader is that the 8 slots can be used at the same
time. But my CCID driver did not support simultaneous access of different
slots of the same reader.
Extract from the previous article (March 2021):
sysmoOCTSIM
My CCID driver for Unix does support multi-slot readers. But only one slot can
be used at the same time. It is a limitation of the driver.
Supporting accesses to 2 or more slots in parallel would imply a change from
synchronous USB communication to asynchronous USB communication. That is a
possible change but not an easy one.
Results
number of slots | sequential execution | parallel execution
1 | 5.126s | 5.126s
2 | 10.273s | 10.030s
3 | 15.321s | 14.944s
You may note that in the case of parallel execution we have a linear growth.
As I explained before only one slot can be used at the same time. So
pcsc-lite (the PC/SC resource manager) has to serialize the accesses to the
different slots from the different executions.
The parallel execution is a bit more efficient than the sequential execution
because part of the execution can be executed in parallel. But not so
much.
Problem fixed
My CCID driver now (since version 1.5.0) supports simultaneous
access to the slots of a reader.
But not all multi-slot readers can support simultaneous access. The reader
must declare that all the slots can be used at the same time. The USB
descriptor field bMaxCCIDBusySlots must have a value greater than 1. Ideally
this value should correspond to the number of slots. My CCID driver enables
simultaneous access only if bMaxCCIDBusySlots equals the number of slots,
i.e. bMaxSlotIndex + 1.
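The rule above can be sketched in a few lines of Python. This is only an illustration, not the code of the driver. It assumes the standard 54-byte CCID class descriptor layout, with bMaxSlotIndex at offset 4 and bMaxCCIDBusySlots at offset 53. The names simultaneous_access_enabled and fake_descriptor are made up for this example.

```python
# Illustration only: this is not the code used by the CCID driver.
CCID_DESCRIPTOR_LENGTH = 54   # bLength of the CCID class descriptor (0x36)
CCID_DESCRIPTOR_TYPE = 0x21   # bDescriptorType of the CCID class descriptor
OFFSET_MAX_SLOT_INDEX = 4     # bMaxSlotIndex: index of the last slot
OFFSET_MAX_BUSY_SLOTS = 53    # bMaxCCIDBusySlots: slots usable at the same time


def simultaneous_access_enabled(descriptor: bytes) -> bool:
    """Apply the rule from the text: bMaxCCIDBusySlots == bMaxSlotIndex + 1."""
    if len(descriptor) < CCID_DESCRIPTOR_LENGTH:
        raise ValueError("CCID class descriptor is too short")
    if descriptor[1] != CCID_DESCRIPTOR_TYPE:
        raise ValueError("not a CCID class descriptor")
    number_of_slots = descriptor[OFFSET_MAX_SLOT_INDEX] + 1
    busy_slots = descriptor[OFFSET_MAX_BUSY_SLOTS]
    # More than one busy slot is required, and it must cover every slot.
    return busy_slots > 1 and busy_slots == number_of_slots


def fake_descriptor(max_slot_index: int, max_busy_slots: int) -> bytes:
    """Hypothetical helper: build a mostly empty descriptor for the demo."""
    raw = bytearray(CCID_DESCRIPTOR_LENGTH)
    raw[0] = CCID_DESCRIPTOR_LENGTH
    raw[1] = CCID_DESCRIPTOR_TYPE
    raw[OFFSET_MAX_SLOT_INDEX] = max_slot_index
    raw[OFFSET_MAX_BUSY_SLOTS] = max_busy_slots
    return bytes(raw)
```

With these definitions, simultaneous_access_enabled(fake_descriptor(7, 8)) is true for an 8-slot reader that declares 8 busy slots, and false for the same reader declaring only 1 busy slot.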
Readers that should support this feature:
Not so many readers will benefit from this improvement. They are:
Performance
So what is the performance now?
With the sysmoOCTSIM 8-slot reader I now get:
# | User (s) | Sys (s) | Clock (s) | CPU
0 | 0.07 | 0.02 | 24.65 | 0 %
1 | 0.19 | 0.04 | 24.72 | 0 %
2 | 0.21 | 0.05 | 24.68 | 1 %
3 | 0.26 | 0.08 | 24.59 | 1 %
4 | 0.34 | 0.09 | 24.67 | 1 %
5 | 0.41 | 0.10 | 24.65 | 2 %
6 | 0.50 | 0.11 | 24.64 | 2 %
7 | 0.56 | 0.13 | 24.72 | 2 %
I used the GNU time command to measure the User, System and clock times.
As expected the user (and system) time grows with the number of cards (slots)
used.
Also as expected the clock time stays roughly constant at about 24.6 seconds
in all cases instead of growing linearly as it did in "Accessing a lot of smart cards?".
We can clearly see the effect of the simultaneous accesses here.
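The difference between the two behaviours can be reproduced with a toy model in Python. This is not how pcsc-lite or the CCID driver are implemented: the locks and the time.sleep() call are assumptions that only stand in for "one slot at a time" versus "all slots at once", and all names are invented for the example.

```python
import threading
import time


def run_exchanges(locks, exchange_time=0.05):
    """Run one simulated card exchange per slot and return the clock time.

    locks[i] is the lock that slot i must hold during its exchange.
    """
    def exchange(lock):
        with lock:
            time.sleep(exchange_time)  # stands in for an APDU exchange

    threads = [threading.Thread(target=exchange, args=(lock,)) for lock in locks]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


def serialized(slots):
    """Model of the old behaviour: one lock shared by the whole reader."""
    reader_lock = threading.Lock()
    return run_exchanges([reader_lock] * slots)


def simultaneous(slots):
    """Model of the new behaviour: each slot has its own lock."""
    return run_exchanges([threading.Lock() for _ in range(slots)])
```

In this model serialized(8) takes about 8 times the duration of one exchange, while simultaneous(8) takes about the duration of a single exchange, which is the same pattern as in the two sets of measurements.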
Results with 88 slots
I got (remote) access to a sysmoSIMBANK 96 with 96 slots. See "A reader for 96 smart cards? sysmoSIMBANK" for more details about the reader.
Performance
# | User (s) | Sys (s) | Clock (s) | CPU
0 | 0.23 | 0.08 | 21.00 | 1 %
1 | 0.52 | 0.11 | 24.27 | 2 %
2 | 0.77 | 0.23 | 24.30 | 4 %
3 | 1.17 | 0.25 | 24.34 | 5 %
4 | 1.51 | 0.28 | 24.41 | 7 %
5 | 1.43 | 0.33 | 24.44 | 7 %
6 | 1.81 | 0.37 | 24.50 | 8 %
7 | 2.12 | 0.48 | 24.88 | 10 %
8 | 2.53 | 0.51 | 25.00 | 12 %
9 | 2.90 | 0.57 | 25.07 | 13 %
10 | 3.15 | 0.75 | 25.06 | 15 %
11 | 3.59 | 0.79 | 25.15 | 17 %
12 | 3.94 | 0.85 | 25.24 | 19 %
13 | 4.37 | 0.85 | 25.30 | 20 %
14 | 4.84 | 0.92 | 25.33 | 22 %
15 | 5.19 | 0.98 | 25.28 | 24 %
16 | 5.56 | 1.10 | 25.41 | 26 %
17 | 5.89 | 1.20 | 25.51 | 27 %
18 | 6.41 | 1.25 | 25.55 | 30 %
19 | 6.75 | 1.32 | 25.58 | 31 %
20 | 7.14 | 1.41 | 25.58 | 33 %
21 | 7.49 | 1.50 | 25.46 | 35 %
22 | 7.88 | 1.58 | 25.75 | 36 %
23 | 8.16 | 1.64 | 25.65 | 38 %
24 | 8.75 | 1.70 | 25.80 | 40 %
25 | 9.00 | 1.79 | 25.91 | 41 %
26 | 9.35 | 1.88 | 25.88 | 43 %
27 | 9.76 | 1.95 | 25.84 | 45 %
28 | 10.26 | 1.99 | 25.51 | 48 %
29 | 10.58 | 2.09 | 25.95 | 48 %
30 | 10.99 | 2.16 | 26.26 | 50 %
31 | 11.21 | 2.29 | 25.97 | 51 %
32 | 11.56 | 2.42 | 26.23 | 53 %
33 | 12.00 | 2.40 | 26.06 | 55 %
34 | 12.48 | 2.43 | 26.38 | 56 %
35 | 13.07 | 2.41 | 26.67 | 58 %
36 | 13.23 | 2.69 | 26.23 | 60 %
37 | 13.63 | 2.70 | 26.30 | 62 %
38 | 13.90 | 2.88 | 26.04 | 64 %
39 | 14.55 | 2.69 | 26.57 | 64 %
40 | 14.85 | 2.83 | 26.43 | 66 %
41 | 15.17 | 2.94 | 25.71 | 70 %
42 | 15.47 | 3.05 | 26.48 | 69 %
43 | 15.88 | 3.12 | 26.35 | 72 %
44 | 16.32 | 3.29 | 26.27 | 74 %
45 | 16.66 | 3.23 | 26.67 | 74 %
46 | 17.29 | 3.25 | 26.69 | 76 %
47 | 17.28 | 3.56 | 26.48 | 78 %
48 | 17.88 | 3.50 | 26.69 | 80 %
49 | 18.31 | 3.54 | 26.78 | 81 %
50 | 18.74 | 3.59 | 27.16 | 82 %
51 | 18.62 | 3.66 | 26.10 | 85 %
52 | 19.01 | 3.60 | 26.74 | 84 %
53 | 19.29 | 3.88 | 26.20 | 88 %
54 | 19.29 | 3.85 | 26.95 | 85 %
55 | 19.81 | 3.74 | 26.33 | 89 %
56 | 20.10 | 3.94 | 26.66 | 90 %
57 | 20.28 | 4.17 | 26.73 | 91 %
58 | 20.80 | 4.05 | 27.09 | 91 %
59 | 21.02 | 4.02 | 26.39 | 94 %
60 | 21.04 | 4.23 | 29.14 | 86 %
61 | 21.56 | 4.28 | 29.18 | 88 %
62 | 21.55 | 4.23 | 29.22 | 88 %
63 | 21.78 | 4.34 | 29.19 | 89 %
64 | 22.01 | 4.65 | 29.36 | 90 %
65 | 22.67 | 4.54 | 29.41 | 92 %
66 | 22.74 | 4.78 | 29.60 | 92 %
67 | 23.65 | 4.58 | 29.51 | 95 %
68 | 24.07 | 4.53 | 30.83 | 92 %
69 | 24.26 | 4.64 | 30.88 | 93 %
70 | 23.88 | 5.07 | 30.95 | 93 %
71 | 24.35 | 4.98 | 31.06 | 94 %
72 | 24.93 | 4.89 | 31.18 | 95 %
73 | 25.30 | 4.96 | 31.23 | 96 %
74 | 25.51 | 5.26 | 31.14 | 98 %
75 | 25.91 | 5.15 | 31.42 | 98 %
76 | 26.10 | 5.47 | 31.54 | 100 %
77 | 26.53 | 5.44 | 31.37 | 101 %
78 | 27.06 | 5.51 | 31.76 | 102 %
79 | 27.01 | 5.31 | 31.56 | 102 %
80 | 27.56 | 5.31 | 31.68 | 103 %
81 | 27.86 | 5.57 | 31.79 | 105 %
82 | 28.17 | 5.59 | 31.76 | 106 %
83 | 28.60 | 5.67 | 31.74 | 107 %
84 | 29.03 | 5.64 | 32.18 | 107 %
85 | 29.15 | 5.88 | 31.82 | 110 %
86 | 29.67 | 5.96 | 32.35 | 110 %
87 | 30.20 | 5.98 | 32.47 | 111 %
Here again the user and system times grow linearly.
And again the total time is rather constant: it is multiplied by only about
1.5 (from 21.00 s to 32.47 s) while the number of cards goes from 1 to 88.
The CPU load is also growing linearly. The system has a 4-core CPU so it is
not surprising to get more than 100% of CPU usage.
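GNU time documents its CPU percentage as (user + system) / elapsed, so the value goes above 100% as soon as more than one core is kept busy. Here is a minimal Python sketch of that computation. Truncating to an integer is my assumption about how the value is printed, and the function name is made up.

```python
def cpu_percent(user_s: float, sys_s: float, clock_s: float) -> int:
    """CPU share in percent: (user + system) / elapsed clock time."""
    # Truncation to an integer percentage is an assumption of this sketch.
    return int(100 * (user_s + sys_s) / clock_s)


# Values taken from the table above.
first_row = cpu_percent(0.23, 0.08, 21.00)   # 1 card
last_row = cpu_percent(30.20, 5.98, 32.47)   # 88 cards
```

Here first_row is 1 and last_row is 111, the same values as in the CPU column. The table values are rounded to 0.01 s, so for some rows the result may differ by one point from what time printed.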
My sample test is not optimized for speed or CPU load at all. I use make -j
to start one Python program usim_read.py per slot. So make has to start 88
Python processes in the case of 88 slots.
The goal was to use standard and simple tools.
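For readers who prefer to stay in Python, the same "one process per slot" idea can be sketched with the subprocess module instead of make -j. This is a different technique from the one I used for the measurements. The arguments of usim_read.py are not shown in this article, so the sketch runs a placeholder command that only sleeps. The names placeholder_command and run_in_parallel are hypothetical.

```python
import subprocess
import sys
import time


def placeholder_command(slot: int, seconds: float = 1.0) -> list:
    """Hypothetical stand-in for 'one usim_read.py run on this slot'."""
    code = f"import time; time.sleep({seconds})  # slot {slot}"
    return [sys.executable, "-c", code]


def run_in_parallel(commands):
    """Start every command at once, wait for all of them.

    Returns the list of exit codes and the total clock time in seconds.
    """
    start = time.perf_counter()
    processes = [subprocess.Popen(command) for command in commands]
    exit_codes = [process.wait() for process in processes]
    return exit_codes, time.perf_counter() - start
```

Calling run_in_parallel([placeholder_command(slot) for slot in range(8)]) starts 8 processes together. With real card accesses the clock time would stay close to that of a single run only if the reader and the driver support simultaneous access.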
I stopped at 88 slots instead of the expected 96 because one of the 12 sysmoOCTSIM readers (part of the sysmoSIMBANK 96 reader) was not working correctly at the time.
Conclusion
A big thank you to Sysmocom for helping with my work on this code.
I am very happy to see pcsc-lite and my CCID driver able to handle 88 APDU exchanges at the same time.