Tags: c++, openssl, digital-signature, dsa

What is the proper way to efficiently create digital signatures? Can I use DSA_sign_setup()?


I am working on an application whose performance is critical.

In this application I have a lot of messages (several thousand) that need to be signed (and verified, of course) separately with the same private/public key pair. I am using the OpenSSL library.

A naive approach with the DSA functions (see below) takes tens of seconds to sign, which is too slow. I tried to use the DSA_sign_setup() function to speed things up, but I can't figure out the correct way to use it.

I also tried ECDSA, but I got lost trying to find the correct configuration.

What is the proper way to do this if I really care about efficiency?

#include <openssl/dsa.h>
#include <openssl/engine.h>
#include <stdio.h>
#include <openssl/evp.h>

int N=3000;

int main()
{
    DSA *set=DSA_new();
    int a;
    a=DSA_generate_parameters_ex(set,1024,NULL,1,NULL,NULL,NULL);
    printf("%d\n",a);
    a=DSA_generate_key(set);
    printf("%d\n",a);
    unsigned char msg[]="I am watching you!I am watching you!";
    unsigned char sign[256];
    unsigned int size;
    for(int i=0;i<N;i++)
        a=DSA_sign(1,msg,32,sign,&size,set);
    printf("%d %d\n",a,size);
}

Solution

  • Using DSA_sign_setup() in the way proposed above is actually completely insecure, and luckily the OpenSSL developers made the DSA structure opaque so that developers cannot force their way around it.

    DSA_sign_setup() generates a new random nonce (effectively an ephemeral key for each signature). It must never be reused under the same long-term secret key. Never.

    You could theoretically still be relatively safe reusing the same nonce for the same message, but as soon as the same combination of private key and nonce is reused for two different messages, you reveal all the information an attacker needs to recover your secret key (see the Sony PlayStation 3 break presented by fail0verflow, which was caused by exactly this mistake: reusing the nonce with ECDSA).

    Unfortunately DSA is slow, especially now that longer keys are required. To speed up your application you could try ECDSA (e.g. over the NIST P-256 curve; nonce reuse is still forbidden) or Ed25519 (which uses a deterministic nonce).
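
    To make the key-recovery claim concrete, here is a minimal sketch in C using textbook DSA over a deliberately tiny toy group (p = 23, q = 11, g = 4). All names and parameters are illustrative assumptions, not OpenSSL APIs, and the numbers are far too small for any real use. Two signatures that share a nonce k satisfy s1 - s2 = k^-1 (h1 - h2) mod q, so an observer can compute k = (h1 - h2)/(s1 - s2) and then x = (s1 k - h1)/r, all modulo q.

    ```c
    #include <stdint.h>

    /* Toy DSA group (assumption for illustration only):
     * g = 4 has prime order q = 11 modulo p = 23. */
    enum { TOY_P = 23, TOY_Q = 11, TOY_G = 4 };

    /* Square-and-multiply modular exponentiation. */
    uint32_t mod_pow(uint32_t base, uint32_t exp, uint32_t mod)
    {
        uint32_t result = 1;
        base %= mod;
        while (exp > 0) {
            if (exp & 1) result = (result * base) % mod;
            base = (base * base) % mod;
            exp >>= 1;
        }
        return result;
    }

    /* Inverse modulo the prime q via Fermat's little theorem: a^(q-2) mod q. */
    uint32_t inv_mod_q(uint32_t a)
    {
        return mod_pow(a, TOY_Q - 2, TOY_Q);
    }

    /* Textbook DSA signature of digest h with private key x and a
     * caller-supplied nonce k (1 <= k < q). */
    void toy_dsa_sign(uint32_t x, uint32_t k, uint32_t h, uint32_t *r, uint32_t *s)
    {
        *r = mod_pow(TOY_G, k, TOY_P) % TOY_Q;
        *s = (inv_mod_q(k) * ((h + x * *r) % TOY_Q)) % TOY_Q;
    }

    /* Recover the private key from two signatures (r, s1) and (r, s2) over
     * different digests h1 and h2 that were produced with the same nonce. */
    uint32_t recover_private_key(uint32_t r, uint32_t s1, uint32_t h1,
                                 uint32_t s2, uint32_t h2)
    {
        uint32_t dh = (h1 % TOY_Q + TOY_Q - h2 % TOY_Q) % TOY_Q;
        uint32_t ds = (s1 % TOY_Q + TOY_Q - s2 % TOY_Q) % TOY_Q;
        uint32_t k  = (dh * inv_mod_q(ds)) % TOY_Q;          /* shared nonce */
        uint32_t t  = ((s1 * k) % TOY_Q + TOY_Q - h1 % TOY_Q) % TOY_Q;
        return (t * inv_mod_q(r)) % TOY_Q;                   /* private key  */
    }
    ```

    For example, signing the digests 5 and 9 with private key x = 7 and the same nonce k = 3 gives r = 7, s1 = 7 and s2 = 1, and recover_private_key(7, 7, 5, 1, 9) returns 7. The same algebra applies to real-sized DSA and ECDSA parameters.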


    Proof of concept using the EVP_DigestSign API

    Update: here is a proof of concept of how to programmatically generate signatures with OpenSSL. The preferred way is to use the EVP_DigestSign API as it abstracts away which kind of asymmetric key is being used.

    The following example expands the PoC in this OpenSSL wiki page. I tested that it works with a DSA or a NIST P-256 private key, on OpenSSL 1.0.2, 1.1.0 and 1.1.1-pre6.

    #include <stdio.h>
    #include <stdlib.h>   /* exit(), EXIT_SUCCESS, EXIT_FAILURE */
    #include <string.h>
    #include <errno.h>
    #include <openssl/pem.h>
    #include <openssl/err.h>
    #include <openssl/evp.h>
    
    #define KEYFILE "private_key.pem"
    #define N 3000
    #define BUFFSIZE 80
    
    EVP_PKEY *read_secret_key_from_file(const char * fname)
    {
        EVP_PKEY *key = NULL;
        FILE *fp = fopen(fname, "r");
        if(!fp) {
            perror(fname); return NULL;
        }
        key = PEM_read_PrivateKey(fp, NULL, NULL, NULL);
        fclose(fp);
        return key;
    }
    
    int do_sign(EVP_PKEY *key, const unsigned char *msg, const size_t mlen,
                unsigned char **sig, size_t *slen)
    {
        EVP_MD_CTX *mdctx = NULL;
        int ret = 0;
    
        /* Create the Message Digest Context */
        if(!(mdctx = EVP_MD_CTX_create())) goto err;
    
        /* Initialise the DigestSign operation - SHA-256 has been selected
         * as the message digest function in this example */
        if(1 != EVP_DigestSignInit(mdctx, NULL, EVP_sha256(), NULL, key))
            goto err;
    
        /* Call update with the message */
        if(1 != EVP_DigestSignUpdate(mdctx, msg, mlen)) goto err;
    
        /* Finalise the DigestSign operation */
        /* First call EVP_DigestSignFinal with a NULL sig parameter to
         * obtain the length of the signature. Length is returned in slen */
        if(1 != EVP_DigestSignFinal(mdctx, NULL, slen)) goto err;
        /* Allocate memory for the signature based on size in slen */
        if(!(*sig = OPENSSL_malloc(*slen))) goto err;
        /* Obtain the signature */
        if(1 != EVP_DigestSignFinal(mdctx, *sig, slen)) goto err;
    
        /* Success */
        ret = 1;
    
    err:
        if(ret != 1)
        {
            /* Do some error handling */
        }
    
        /* Clean up */
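        /* On failure, release the buffer and clear the caller's pointer
         * so that it cannot be used or freed again by mistake */
        if(!ret && *sig) { OPENSSL_free(*sig); *sig = NULL; }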
        if(mdctx) EVP_MD_CTX_destroy(mdctx);
    
        return ret;
    }
    
    int main()
    {
        int ret = EXIT_FAILURE;
        const char *str = "I am watching you!I am watching you!";
        unsigned char *sig = NULL;
        size_t slen = 0;
        unsigned char msg[BUFFSIZE];
        size_t mlen = 0;
    
        EVP_PKEY *key = read_secret_key_from_file(KEYFILE);
        if(!key) goto err;
    
        for(int i=0;i<N;i++) {
            if ( snprintf((char *)msg, BUFFSIZE, "%s %d", str, i+1) < 0 )
                goto err;
            mlen = strlen((const char*)msg);
            if (!do_sign(key, msg, mlen, &sig, &slen)) goto err;
            OPENSSL_free(sig); sig = NULL;
            printf("\"%s\" -> siglen=%zu\n", msg, slen);
        }
    
        printf("DONE\n");
        ret = EXIT_SUCCESS;
    
    err:
        if (ret != EXIT_SUCCESS) {
            ERR_print_errors_fp(stderr);
            fprintf(stderr, "Something broke!\n");
        }
    
        if (key)
            EVP_PKEY_free(key);
    
        exit(ret);
    }
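
    Since the messages also need to be verified, a matching verification routine could look like the sketch below. It is a hedged counterpart to do_sign() above and not part of the tested PoC: do_verify is a hypothetical helper name, and it assumes the same SHA-256 digest that was used for signing.

    ```c
    #include <openssl/evp.h>

    /* Hypothetical helper: returns 1 if sig is a valid signature of msg under
     * key, 0 if the signature does not verify, and -1 on any other error. */
    int do_verify(EVP_PKEY *key, const unsigned char *msg, size_t mlen,
                  const unsigned char *sig, size_t slen)
    {
        int ret = -1;
        EVP_MD_CTX *mdctx = EVP_MD_CTX_create();

        if (!mdctx) return -1;

        /* Initialise the DigestVerify operation with the digest used to sign */
        if (1 != EVP_DigestVerifyInit(mdctx, NULL, EVP_sha256(), NULL, key))
            goto done;

        /* Feed the message */
        if (1 != EVP_DigestVerifyUpdate(mdctx, msg, mlen)) goto done;

        /* Check the signature. The cast is there because some older OpenSSL
         * releases may declare this parameter without const. */
        ret = EVP_DigestVerifyFinal(mdctx, (unsigned char *)sig, slen);
        if (ret != 1 && ret != 0) ret = -1;

    done:
        EVP_MD_CTX_destroy(mdctx);
        return ret;
    }
    ```

    The key passed in can be either the private key loaded above or an EVP_PKEY holding only the corresponding public key.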
    

    Generating a key:

    # Generate a new NIST P-256 private key
    openssl ecparam -genkey -name prime256v1 -noout -out private_key.pem
    

    Performance/Randomness

    I ran both your original example and my code on my (Intel Skylake) machine and on a Raspberry Pi 3. In both cases your original example does not take tens of seconds. You apparently see a huge performance improvement from the insecure DSA_sign_setup() approach in OpenSSL 1.0.2, and DSA_sign_setup() internally consumes new randomness in addition to performing some fairly expensive modular arithmetic. I therefore suspect that you actually have a problem with the PRNG, which is slowing down the generation of new random nonces and has a bigger impact than the modular arithmetic. If that's the case, you would likely benefit from using Ed25519, because there the nonce is deterministic rather than random (it is derived with secure hash functions from the private key and the message). Unfortunately that means that you will need to wait until OpenSSL 1.1.1 is released (hopefully during this summer).
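
    One rough way to test this suspicion is to time the OpenSSL random number generator on its own. The sketch below is only an assumption about how one might check it: time_rand_bytes is a hypothetical helper, and the 32-byte block size is an arbitrary stand-in for a nonce, since the exact internal code path used for DSA nonces may differ. It uses RAND_bytes() from OpenSSL and the C11 timespec_get() function for wall-clock time.

    ```c
    #include <time.h>
    #include <openssl/rand.h>

    /* Hypothetical helper: returns the wall-clock seconds needed to draw
     * `count` 32-byte blocks from the OpenSSL PRNG, or a negative value
     * if RAND_bytes() reports a failure. */
    double time_rand_bytes(int count)
    {
        unsigned char buf[32];
        struct timespec start, end;

        timespec_get(&start, TIME_UTC);
        for (int i = 0; i < count; i++) {
            if (RAND_bytes(buf, (int)sizeof(buf)) != 1) return -1.0;
        }
        timespec_get(&end, TIME_UTC);

        return (double)(end.tv_sec - start.tv_sec)
             + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    }
    ```

    If time_rand_bytes(3000) already takes seconds on the affected machine, the bottleneck is probably the randomness source rather than the signature arithmetic.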

    On Ed25519

    To use Ed25519 (which will be supported natively starting with OpenSSL 1.1.1) the above example needs to be modified, as in OpenSSL 1.1.1 there is no support for Ed25519ph and instead of using the Init/Update/Final streaming API you would need to call the one-shot EVP_DigestSign() interface (see documentation).
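
    A minimal sketch of that modification is shown below, assuming OpenSSL 1.1.1 or later and an EVP_PKEY that holds an Ed25519 private key. do_sign_ed25519 is a hypothetical name mirroring do_sign() above. Because Ed25519 does its own hashing, the digest argument of EVP_DigestSignInit() is NULL and the whole message is passed in a single EVP_DigestSign() call.

    ```c
    #include <openssl/evp.h>
    #include <openssl/crypto.h>

    /* Hypothetical helper (requires OpenSSL >= 1.1.1): one-shot Ed25519
     * signature. On success returns 1 and stores a buffer in *sig that the
     * caller must release with OPENSSL_free(); on failure returns 0. */
    int do_sign_ed25519(EVP_PKEY *key, const unsigned char *msg, size_t mlen,
                        unsigned char **sig, size_t *slen)
    {
        int ret = 0;
        EVP_MD_CTX *mdctx = EVP_MD_CTX_new();

        *sig = NULL;
        if (!mdctx) goto done;

        /* No digest is selected: Ed25519 hashes the message internally */
        if (1 != EVP_DigestSignInit(mdctx, NULL, NULL, NULL, key)) goto done;

        /* First call with a NULL buffer to obtain the signature length */
        if (1 != EVP_DigestSign(mdctx, NULL, slen, msg, mlen)) goto done;
        if (!(*sig = OPENSSL_malloc(*slen))) goto done;

        /* Second call produces the signature itself */
        if (1 != EVP_DigestSign(mdctx, *sig, slen, msg, mlen)) goto done;

        ret = 1;

    done:
        if (!ret) { OPENSSL_free(*sig); *sig = NULL; }
        EVP_MD_CTX_free(mdctx);
        return ret;
    }
    ```

    Signing the same message twice with the same key should yield identical 64-byte signatures, which is a quick way to observe the deterministic nonce.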

    Full disclaimer: the next paragraph is a shameless plug for my libsuola research project, as I could definitely benefit from testing for real-world applications from other users.

    Alternatively, if you cannot wait, I am the developer of an OpenSSL ENGINE called libsuola that adds support for Ed25519 in OpenSSL 1.0.2, 1.1.0 (and also 1.1.1 using alternative implementations). It's still experimental, but it uses third-party implementations (libsodium, HACL*, donna) for the crypto part and so far my testing (for research purposes) has not yet revealed outstanding bugs.

    Benchmark comparison of OP's original example and mine

    To address some of the comments, I compiled and executed OP's original example, a slightly modified version fixing some bugs and memory leaks, and my example of how to use the EVP_DigestSign API, all compiled against OpenSSL 1.1.0h (compiled as a shared library to a custom prefix from the release tarball with default configuration parameters).

    The full details can be found at this gist, which includes the exact versions I benchmarked, the Makefile containing all the details on how the examples were compiled and how the benchmark was run, and details about my machine (briefly, it's a quad-core i5-6500 @ 3.20GHz, with frequency scaling/Turbo Boost disabled both in software and in the UEFI).

    As can be seen from make_output.txt:

    Running ./op_example
    time ./op_example >/dev/null
    0.32user 0.00system 0:00.32elapsed 100%CPU (0avgtext+0avgdata 3452maxresident)k
    0inputs+0outputs (0major+153minor)pagefaults 0swaps
    
    Running ./dsa_example
    time ./dsa_example >/dev/null
    0.42user 0.00system 0:00.42elapsed 100%CPU (0avgtext+0avgdata 3404maxresident)k
    0inputs+0outputs (0major+153minor)pagefaults 0swaps
    
    Running ./evp_example
    time ./evp_example >/dev/null
    0.12user 0.00system 0:00.12elapsed 99%CPU (0avgtext+0avgdata 3764maxresident)k
    0inputs+0outputs (0major+157minor)pagefaults 0swaps
    

    This shows that using ECDSA over NIST P-256 through the EVP_DigestSign API is about 2.67x faster than OP's original example (0.32 s vs 0.12 s) and 3.5x faster than the corrected version (0.42 s vs 0.12 s).

    As a late additional note, the code in this answer also computes the SHA256 digest of the input plaintext, while OP's original code and the "fixed" version skip it. Therefore the speedup demonstrated by the ratios reported above is even more significant!
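
    The ratios quoted above follow directly from the elapsed times in make_output.txt. The small sketch below, with hypothetical helper names, spells out the arithmetic and also converts an elapsed time into an average cost per signature. Note that the elapsed times cover the whole process (start-up, key loading or generation, and in OP's example DSA parameter generation), so the per-signature figure is only an upper bound.

    ```c
    /* Hypothetical helpers for reading the benchmark numbers. */

    /* How many times faster the candidate is than the baseline. */
    double speedup(double baseline_seconds, double candidate_seconds)
    {
        return baseline_seconds / candidate_seconds;
    }

    /* Average wall-clock microseconds per signature for a whole run. */
    double micros_per_signature(double elapsed_seconds, int signatures)
    {
        return elapsed_seconds * 1e6 / signatures;
    }
    ```

    With the numbers above, speedup(0.42, 0.12) is 3.5, and micros_per_signature(0.12, 3000) gives roughly 40 microseconds per signature for the EVP example with N = 3000.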


    TL;DR: The proper way to efficiently use digital signatures in OpenSSL is through the EVP_DigestSign API: trying to use DSA_sign_setup() in the way proposed above is ineffective in OpenSSL 1.1.0 and 1.1.1, and is wrong (as in completely breaking the security of DSA and revealing the private key) in ≤1.0.2. I completely agree that the DSA API documentation is misleading and should be fixed; unfortunately the function DSA_sign_setup() cannot be completely removed as minor releases must retain binary compatibility, hence the symbol needs to stay there even for the upcoming 1.1.1 release (but is a good candidate for removal in the next major release).