Important!

Blog moved to https://blog.apdu.fr/

I moved my blog from https://ludovicrousseau.blogspot.com/ to https://blog.apdu.fr/ . Why? I wanted to move away from Blogger (owne...

Friday, May 8, 2020

Your PC/SC application 200% faster

Dana Keeler reported an issue on pcsc-lite at "SYS_USleep in SCardEndTransaction in winscard_clnt.c causing slowness in Firefox".

The problem was initially reported on Firefox "firefox is very slow and crashes because of p11-kit-proxy.so" and also on p11-kit "firefox is very slow and crashes because of p11-kit-proxy.so".

The issue

The pcsc-lite performance issue comes from this piece of code in SCardEndTransaction:

 /*
  * This helps prevent starvation
  */
 randnum = SYS_RandomInt(1000, 10000);
 (void)SYS_USleep(randnum);

For each call to SCardEndTransaction() you will get a delay between 1 ms and 11 ms. So on average you get 6 ms of delay each time.

Performance measures

I wrote a Python program to get the duration of 100 calls to SCardEndTransaction():

#! /usr/bin/env python3

from smartcard.scard import *
from smartcard.pcsc.PCSCExceptions import *
from time import time

hresult, hcontext = SCardEstablishContext(SCARD_SCOPE_USER)
if hresult != SCARD_S_SUCCESS:
    raise EstablishContextException(hresult)

hresult, readers = SCardListReaders(hcontext, [])
if hresult != SCARD_S_SUCCESS:
    raise ListReadersException(hresult)
print('PC/SC Readers:', readers)
reader = readers[0]
print("Using reader:", reader)

hresult, hcard, dwActiveProtocol = SCardConnect(hcontext, reader, SCARD_SHARE_SHARED, SCARD_PROTOCOL_ANY)
if hresult != SCARD_S_SUCCESS:
    raise BaseSCardException(hresult)

nb = 100
total = 0
for i in range(nb):
    hresult = SCardBeginTransaction(hcard)
    if hresult != SCARD_S_SUCCESS:
        raise BaseSCardException(hresult)

    before = time()
    hresult = SCardEndTransaction(hcard, SCARD_LEAVE_CARD)
    if hresult != SCARD_S_SUCCESS:
        raise BaseSCardException(hresult)
    after = time()
    delta = after - before
    print(delta)
    total += delta
print("total: {} ms".format(total * 1000))
print("average: {} ms".format(total * 1000 / nb))

hresult = SCardDisconnect(hcard, SCARD_LEAVE_CARD)
if hresult != SCARD_S_SUCCESS:
    raise BaseSCardException(hresult)

hresult = SCardReleaseContext(hcontext)
if hresult != SCARD_S_SUCCESS:
    raise ReleaseContextException(hresult)

Results


As expected the random delay is of 6 ms (or 0.006 second) on average.

The history

I don't know why this delay is needed. I went into the previous versions of the source code file PCSC/src/winscard_clnt.c but the oldest version (from March 2002, 18 years ago) already has this delay.

The code was written David Corcoran, the initial author of pcsc-lite. It was my very early days in the pcsc-lite project at that time.

The solution

I don't see any good reasons to have a delay here. I guess it was to solve a problem with the communication between pcscd and libpcsclite at that time. The communication mechanism has been redesigned in version 1.6.0 (May 2010) and the delay should now be useless and even problematic as we saw.

My solution is then to remove the problematic code. The was done in this commit.

The results

I used again my Python program to measure the performances of SCardEndTransaction(). I now have:

The mean delay is 9.5x10-6 s so 9 µs or 0.009 ms or 0.000009 second. The speedup factor is huge: x647.
The function is now 600 times faster than before.

You can note a high decrease on the 4 first values. My guess is that it is an effect of CPU memory caches in action. I have not investigated this point. This is left as an exercise for my readers.

Macro benchmark

It is nice to have a huge improvement in one PC/SC function but does that help real applications?

For a real smart card application I used OpenSC with a standard command: list all the objects of a smart card.
I used a very simple shell script:

#!/bin/bash

for i in {1..100}
do
 pkcs11-tool --list-objects
done

Slow smart card

First I used a very old smart card I have in my collection: a Gemplus GPK 8000 card.
Since the card is very old and slow the shell script will do 10 rounds instead of 100.

whattime (s)
Before23.34
After22.84

Speedup: x1.02 or 2%

The gain is very limited. This is because the card is so slow that the benefit from the new SCardEndTransaction() is negligible.

Fast smart card

I then used a much faster smart card: a Yubikey 5 from Yubico. It is not a real smart card but a token with a CCID interface and a chip that understand APDU commands.
Since the device is fast I used 100 rounds of pkcs11-tool.

whattime (s)
Before9.85
After3.02

Speedup: x3.26 or 226%

This time the gain is highly visible. The pkcs11-tool command is now 3 times faster.

Results

You will get a high acceleration with fast smart card.
I was able to get a full execution of pkcs11-tool 3 (three) times faster than before the change.

I was really impressed by this result. For almost 10 years pcsc-lite was slow and could be improved by just removing 2 lines of code.

Potential regression?

I don't think this change will create a regression and will break existing code.

I made a beta version of pcsc-lite including the change available at http://ludovic.rousseau.free.fr/softwares/pcsc-lite/pcsc-lite-1.8.26-05d48e5.tar.bz2.
Please test this version with your PC/SC application. In case of a problem with PC/SC transactions then open an issue at Salsa https://salsa.debian.org/rousseau/PCSC/-/issues or github https://github.com/LudovicRousseau/PCSC/issues.

I plan to wait a few weeks to get potential feedback before I make a new release of pcsc-lite.

Conclusion

pcsc-lite is now much faster.

This happened because someone complained that Firefox was slow and someone at Mozilla investigated the issue to find a problem in pcsc-lite.

I am very happy to use Free Software programs where it is possible and easy to find problems in software you use.