I've been experimenting (from the CLI) with the encryption examples at https://www.php.net/manual/en/function.openssl-encrypt.php and would like to know how to handle a larger file, such as a SQLite database file of a few GB. I was getting a memory allocation error before following this example on that same PHP page.
I was slow to catch on, but I finally got it to work on a 3 GB SQLite database. I think the problem was that base64 output is the default when options=0, and I was combining the newest example for the simple case with the old large-file example, which passes OPENSSL_RAW_DATA as its options.
My question is: is the approach given in that eight-year-old post still the proper approach? Some of the examples give a before and after version 7.1.
Is it okay to use the first 16 bytes of an encrypted block as the next initialization vector?
Is the number of blocks per encryption/decryption important? Should one use the largest size that will fit in the maximum memory allocation or something else?
<?php
// $key should have been previously generated in a cryptographically safe way, like openssl_random_pseudo_bytes.
// Note: aes-256-cbc expects a 32-byte key; PHP pads a shorter key such as this 16-byte placeholder with zero bytes.
$key = "abcd1234dcba4321";
$cipher = "aes-256-cbc";
$feBlocks = 10000;
$fpPlain = fopen("../Database/filename.db", 'rb');
$fpEncrypt = fopen("encrypted.enc", 'wb');
if (in_array($cipher, openssl_get_cipher_methods()))
{
    $ivlen = openssl_cipher_iv_length($cipher);
    $iv = openssl_random_pseudo_bytes($ivlen);
    // Put the initialization vector at the beginning of the file
    fwrite($fpEncrypt, $iv);
    while (!feof($fpPlain)) {
        $plaintext = fread($fpPlain, $ivlen * $feBlocks);
        // Every chunk is padded here, so each ciphertext chunk is one block longer than its plaintext chunk
        $ciphertext = openssl_encrypt($plaintext, $cipher, $key, OPENSSL_RAW_DATA, $iv);
        // Use the first 16 bytes of the ciphertext as the next initialization vector
        $iv = substr($ciphertext, 0, $ivlen);
        fwrite($fpEncrypt, $ciphertext);
    }
    fclose($fpPlain);
    fclose($fpEncrypt);
}
$fpEncrypt = fopen("encrypted.enc", 'rb');
$fpPlain = fopen("decrypted.db", 'wb');
$ivlen = openssl_cipher_iv_length($cipher);
// Get the initialization vector from the beginning of the file
$iv = fread($fpEncrypt, $ivlen);
while (!feof($fpEncrypt)) {
    // we have to read one block more for decrypting than for encrypting
    $ciphertext = fread($fpEncrypt, $ivlen * ($feBlocks + 1));
    $plaintext = openssl_decrypt($ciphertext, $cipher, $key, OPENSSL_RAW_DATA, $iv);
    // Use the first 16 bytes of the ciphertext as the next initialization vector
    $iv = substr($ciphertext, 0, $ivlen);
    fwrite($fpPlain, $plaintext);
}
fclose($fpEncrypt);
fclose($fpPlain);
Some of the examples give a before and after version 7.1.
The examples from the openssl_encrypt() documentation implement authenticated encryption (AEAD), once as AES/CBC with an HMAC (before v7.1, Example #2) and once with GCM (after v7.1, Example #1), but as one-step encryption. In addition to confidentiality, AEAD also guarantees authenticity.
However, a truly robust custom implementation of chunkwise AEAD is not trivial (for the requirements, see for instance the listing in the first section of Encrypted streams and file encryption in the Libsodium documentation), so it is better to use a reliable library. One option for PHP is Sodium, a Libsodium wrapper. Libsodium/Sodium has a streaming API, see Encrypted streams and file encryption and the sodium_crypto_secretstream_... functions.
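To illustrate what the streaming API looks like in practice, here is a minimal sketch, assuming the sodium extension is available (it is bundled since PHP 7.2). The function names secretstreamEncrypt and secretstreamDecrypt, the constant SS_CHUNK and the 8 KiB chunk size are my own illustrative choices, not something prescribed by the library.

```php
<?php
// Sketch only: chunk-wise authenticated encryption with Sodium's secretstream API.
// SS_CHUNK and both function names are illustrative assumptions.
const SS_CHUNK = 8192; // plaintext bytes per chunk

function secretstreamEncrypt($in, $out, string $key): void
{
    [$state, $header] = sodium_crypto_secretstream_xchacha20poly1305_init_push($key);
    fwrite($out, $header); // the header is not secret and is needed for decryption
    $chunk = stream_get_contents($in, SS_CHUNK);
    do {
        // Look ahead by one chunk so the last chunk can be tagged as final
        $next = stream_get_contents($in, SS_CHUNK);
        $isLast = ($next === '' || $next === false);
        $tag = $isLast
            ? SODIUM_CRYPTO_SECRETSTREAM_XCHACHA20POLY1305_TAG_FINAL
            : SODIUM_CRYPTO_SECRETSTREAM_XCHACHA20POLY1305_TAG_MESSAGE;
        fwrite($out, sodium_crypto_secretstream_xchacha20poly1305_push($state, (string) $chunk, '', $tag));
        $chunk = $next;
    } while (!$isLast);
}

function secretstreamDecrypt($in, $out, string $key): void
{
    $header = stream_get_contents($in, SODIUM_CRYPTO_SECRETSTREAM_XCHACHA20POLY1305_HEADERBYTES);
    $state = sodium_crypto_secretstream_xchacha20poly1305_init_pull($header, $key);
    // Each encrypted chunk is ABYTES longer than its plaintext chunk
    $size = SS_CHUNK + SODIUM_CRYPTO_SECRETSTREAM_XCHACHA20POLY1305_ABYTES;
    $final = false;
    while (($chunk = stream_get_contents($in, $size)) !== '' && $chunk !== false) {
        if ($final) {
            throw new RuntimeException('data found after the final chunk');
        }
        $res = sodium_crypto_secretstream_xchacha20poly1305_pull($state, $chunk);
        if ($res === false) {
            throw new RuntimeException('authentication failed');
        }
        [$plain, $tag] = $res;
        fwrite($out, $plain);
        $final = ($tag === SODIUM_CRYPTO_SECRETSTREAM_XCHACHA20POLY1305_TAG_FINAL);
    }
    if (!$final) {
        throw new RuntimeException('stream was truncated');
    }
}
```

A key for this sketch could be created with sodium_crypto_secretstream_xchacha20poly1305_keygen(). Unlike the CBC approach, the decryption chunk size here must match the encryption chunk size plus the per-chunk overhead, because each chunk is authenticated individually.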
Is this approach given in that eight-year-old post still the proper approach?
Your code (or the underlying code from the example) implements chunk-wise encryption with AES/CBC. The code could be optimized, but in principle it is OK (as long as you do not need AEAD).
Regarding the optimizations: The encryption in the current code pads all plaintext chunks and uses the first block of a ciphertext chunk as the IV for the next plaintext chunk. This means that the chunk size from the encryption must be known during decryption so that the same ciphertext chunks can be reconstructed. In addition, the padding of the intermediate chunks is inefficient.
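It would make more sense to decouple the chunk sizes for encryption and decryption. This can be achieved if the chunk size is a multiple of the block size (16 bytes for AES), padding is disabled for all plaintext chunks except the last one (OPENSSL_ZERO_PADDING), and the last block of a ciphertext chunk is used as the IV for the next plaintext chunk.
With these requirements, the encryption provides the same ciphertext that a one-step encryption would provide, as can best be seen from the CBC diagram. The necessary code changes are minor.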
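The optimization can be sketched roughly as follows. This is only an illustration under the assumptions stated above (chunk sizes that are multiples of 16 bytes, a 32-byte key); the names cbcEncryptStream, cbcDecryptStream, CBC_CIPHER and CBC_BLOCK are hypothetical and not part of any library.

```php
<?php
// Sketch only: chunk-wise AES-256-CBC that yields the same ciphertext as a one-step encryption.
// CBC_CIPHER, CBC_BLOCK and both function names are illustrative assumptions.
const CBC_CIPHER = 'aes-256-cbc';
const CBC_BLOCK = 16; // AES block size in bytes

function cbcEncryptStream($in, $out, string $key, int $chunkSize): void
{
    if ($chunkSize <= 0 || $chunkSize % CBC_BLOCK !== 0) {
        throw new InvalidArgumentException('chunk size must be a positive multiple of 16');
    }
    $iv = random_bytes(CBC_BLOCK);
    fwrite($out, $iv); // store the IV in front of the ciphertext
    $chunk = stream_get_contents($in, $chunkSize);
    do {
        // Look ahead by one chunk so that only the last chunk gets padded
        $next = stream_get_contents($in, $chunkSize);
        $isLast = ($next === '' || $next === false);
        $opts = OPENSSL_RAW_DATA | ($isLast ? 0 : OPENSSL_ZERO_PADDING);
        $ct = openssl_encrypt((string) $chunk, CBC_CIPHER, $key, $opts, $iv);
        if ($ct === false) {
            throw new RuntimeException('encryption failed');
        }
        fwrite($out, $ct);
        $iv = substr($ct, -CBC_BLOCK); // last ciphertext block becomes the next IV
        $chunk = $next;
    } while (!$isLast);
}

function cbcDecryptStream($in, $out, string $key, int $chunkSize): void
{
    if ($chunkSize <= 0 || $chunkSize % CBC_BLOCK !== 0) {
        throw new InvalidArgumentException('chunk size must be a positive multiple of 16');
    }
    $iv = stream_get_contents($in, CBC_BLOCK);
    $chunk = stream_get_contents($in, $chunkSize);
    do {
        $next = stream_get_contents($in, $chunkSize);
        $isLast = ($next === '' || $next === false);
        // Padding is only removed from the last chunk
        $opts = OPENSSL_RAW_DATA | ($isLast ? 0 : OPENSSL_ZERO_PADDING);
        $pt = openssl_decrypt((string) $chunk, CBC_CIPHER, $key, $opts, $iv);
        if ($pt === false) {
            throw new RuntimeException('decryption failed');
        }
        fwrite($out, $pt);
        $iv = substr($chunk, -CBC_BLOCK); // last ciphertext block becomes the next IV
        $chunk = $next;
    } while (!$isLast);
}
```

With this sketch the two chunk sizes are independent, so one could, for example, encrypt with 160000-byte chunks and decrypt with 1 MiB chunks. The file handles would come from fopen() in binary mode, as in your code.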
One could come up with the idea of adding a MAC to the code to turn it into AEAD. This is possible, but as already mentioned above, it is better to consider an established library for security reasons.
Is it okay to use the first 16 bytes of an encrypted block as the next initialization vector?
In principle, yes, but with regard to the optimization mentioned above, it would make more sense to use the last block of a ciphertext chunk as the IV for the next plaintext chunk.
Is the number of blocks per encryption/decryption important? Should one use the largest size that will fit in the maximum memory allocation or something else?
With your implementation, there is the constraint that the chunk size for decryption depends on that for encryption, which is eliminated with the optimization.
Apart from this, the selected chunk size will have an impact on performance. I would prefer a large chunk size (but test this out).
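One way to test this out is a rough in-memory timing like the sketch below. The function name timeCbcChunks, the 8 MiB test buffer and the candidate chunk sizes are arbitrary assumptions on my part, and the measurement ignores disk I/O, so treat the numbers only as a first indication.

```php
<?php
// Rough timing sketch; the function name, data size and candidate chunk sizes are assumptions.
function timeCbcChunks(string $data, string $key, int $chunkSize): float
{
    $iv = random_bytes(16);
    // No padding: the test data length and all chunk sizes are multiples of 16
    $opts = OPENSSL_RAW_DATA | OPENSSL_ZERO_PADDING;
    $start = microtime(true);
    for ($off = 0, $len = strlen($data); $off < $len; $off += $chunkSize) {
        $ct = openssl_encrypt(substr($data, $off, $chunkSize), 'aes-256-cbc', $key, $opts, $iv);
        $iv = substr($ct, -16); // chain the chunks via the last ciphertext block
    }
    return microtime(true) - $start;
}

$data = random_bytes(8 * 1024 * 1024);
$key = random_bytes(32);
foreach ([16 * 1024, 256 * 1024, 4 * 1024 * 1024] as $size) {
    printf("%8d bytes per chunk: %.4f s\n", $size, timeCbcChunks($data, $key, $size));
}
```

For a real decision it would be more meaningful to time the complete file-to-file run on the actual database, since reading and writing may dominate.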