pythonaesecb

Crypto attack on AES ECB


This is a school challenge. Given an encrypted file (which is an audio file) I need to figure out how I can decrypt it. Don't have a key, of course.

Now the clue is that I have a small part of the original plain audio file, but I do not know from which part of the encrypted file it is aligned to. I also know the byte size used for encryption is 16 bytes. So my questions are, if you can give me the ideas:

  1. I believe trying to break the key by plaintext attack or bit flipping on the full encrypted file would be too much.
  2. How can I compare the 16-byte blocks in the small plain file with the blocks in the encrypted file, to spot the pattern that can help me figure out the key used for encryption? How do I even get the key even if I know "these 16-bytes in the plain file are these 16-bytes in the encrypted file"? How does that even help? But there is some point to it otherwise the challenge would not have given it.
  3. Is there any other way to approach this? Thank you.

Solution

  • Given a window of 20kb, break it into 16 chunks. If the 20kb of plaintext is not exactly a multiple of 16 bytes, truncate the end of it. Generate a vector of length 20kb/16 = 1280 numbers as follows: assign 0 to the first block. Store it in a dictionary keyed by the block and whose value is the index 0. Move on to the next block. If it is the same as the first block -- assign 0 to it. Otherwise assign 1 to it and create a second dictionary entry. Iterate over the blocks. For each block you check if it is in the dictionary. If it is -- assign to it the value of this entry. Otherwise increment the last index used and create a new dictionary entry.

    The result will look something like [0, 1, 2, 3, 1, 4, 2, 5, 6, 7, ...]. This can be thought of as a statistical profile of the plaintext.

    Now have a sliding window of width 20kb move over the ciphertext. For each window, compute its statistical profile (perhaps it can be updated rather that computed from scratch as the window slides). Eventually you will get a window with the same profile as the plaintext. You now know (with high probability) where the plaintext is within the ciphertext. This will give you up to 20kb of known plaintext-ciphertext blocks. Hopefully that will be enough.

    The basic logic can be understood in the context of simple substitution ciphers. The word MISSISSIPPI would have profile 0 1 2 2 1 2 2 1 3 3 1. If it were encrypted with a substitution cipher it might map to e.g. XQTTQTTQKKQ -- which has exactly the same profile. If you knew that a given ciphertext contained an encryption of MISSISSIPPI, you could slide through the ciphertext in windows of width 11 until you found a window with the right profile. At that point you would know the ciphertext equivalents of MISP, which will give a partial break into the system. AES ECB is basically just a substitution cipher (albeit one on a much larger alphabet) so the same basic logic applies.