[SOLVED] Number of independent AES 256 CBC decryption operations per second with AES-NI or GPU acceleration

AES-NI seems to be optimized for encrypting/decrypting big chunks of data. However, I'm trying to decrypt a password, and I have many very small inputs to try (IV + first CBC block, 32 bytes in total).

I'm using OpenSSL at the moment, calling EVP_DecryptInit_ex and EVP_DecryptUpdate on every iteration (and EVP_CIPHER_CTX_init once per thread).

I can do this around 2 million times per second on a single core.

I assume this is the sort of performance I can expect using AES-NI instructions and I shouldn't worry about optimising this further. Is this correct?

Does anyone have any idea how much faster this might be on a high-end GPU or a not-too-expensive FPGA?

Solution

FPGA: You can convert an input block to an output block on any reasonable FPGA at a throughput of one block every 2 cycles at several hundred MHz, with a latency of 16 cycles. So, possibly 256 Mblocks/s pipelined, or maybe 32 Mblocks/s not pipelined. You could fit maybe 5 of these cores on a reasonably cheap FPGA, or 30+ on an expensive one. YMMV.
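
To make the arithmetic behind those figures explicit, here is a rough sketch in C. It assumes a 512 MHz clock, which is simply the value that reproduces 256 and 32 Mblocks/s from the 2-cycle and 16-cycle numbers; the answer itself only says "several hundred MHz". The names ASSUMED_CLOCK_HZ, blocks_per_second and print_estimates are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 512 MHz is back-solved from the 256 / 32 Mblocks/s figures;
   a real design may clock higher or lower. */
#define ASSUMED_CLOCK_HZ 512e6

/* Blocks per second for `cores` AES cores, each producing one block
   every `cycles_per_block` clock cycles. */
double blocks_per_second(double clock_hz, double cycles_per_block, int cores)
{
    assert(clock_hz > 0 && cycles_per_block > 0 && cores > 0);
    return clock_hz / cycles_per_block * cores;
}

/* Print both estimates for a given number of cores on the chip. */
void print_estimates(int cores)
{
    /* Pipelined: a new block can enter every 2 cycles. */
    printf("%d core(s), pipelined:     %.0f Mblocks/s\n", cores,
           blocks_per_second(ASSUMED_CLOCK_HZ, 2.0, cores) / 1e6);
    /* Not pipelined: each block waits out the full 16-cycle latency. */
    printf("%d core(s), not pipelined: %.0f Mblocks/s\n", cores,
           blocks_per_second(ASSUMED_CLOCK_HZ, 16.0, cores) / 1e6);
}
```

Calling print_estimates(5) gives the "reasonably cheap FPGA" case and print_estimates(30) the expensive one. These are block decryptions only; any per-key setup cost is not included, so treat the results as an upper bound when comparing against the 2 million operations per second measured on the CPU.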