[SOLVED] PDF and DOCX Magic Numbers


I read the first byte to differentiate file types, but both PDF and DOCX files have a "0x50" magic number. How do I handle this case?

Solution

PDF files don't start with a single "magic" byte. The PDF specification says they must start with "%PDF", but in practice many PDF files do not.
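Part of the confusion is that one byte is too little to compare. In ASCII, "%PDF" is 0x25 0x50 0x44 0x46, so 0x50 is its second byte. A DOCX file is a ZIP archive and normally begins with the ZIP local-file-header signature "PK\x03\x04" (0x50 0x4B 0x03 0x04). Comparing several leading bytes tells the two apart. Here is a minimal Python sketch; the names `sniff_header` and `sniff_file` are hypothetical. It only inspects the header, so it has the same unreliability described in this answer.

```python
# Leading signatures. A spec-conforming PDF starts with "%PDF" (0x25 0x50 0x44 0x46).
# A DOCX is a ZIP container, so it starts with "PK\x03\x04" (0x50 0x4B 0x03 0x04).
PDF_SIGNATURE = b"%PDF"
ZIP_SIGNATURE = b"PK\x03\x04"


def sniff_header(data: bytes) -> str:
    """Guess the container type from the leading bytes only (a heuristic)."""
    if data.startswith(PDF_SIGNATURE):
        return "pdf"
    if data.startswith(ZIP_SIGNATURE):
        # Could be DOCX, XLSX, a plain .zip, etc.; the archive contents must be checked.
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just the first few bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_header(f.read(8))
```

For example, `sniff_header(b"%PDF-1.7\n")` returns `"pdf"` and `sniff_header(b"PK\x03\x04\x14\x00")` returns `"zip"`. A "zip" result still doesn't prove the file is a DOCX.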

Looking only for a %PDF header is a highly unreliable way to identify PDF files. A valid PDF file is one you can parse: at minimum it has a trailer, a cross-reference table, and so forth.
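Using a real PDF parser is the most reliable test. If you need a cheaper check, you can at least look for the end-of-file structure: a `startxref` keyword followed by a byte offset and `%%EOF`, where the offset points at either a classic `xref` table or a cross-reference stream object (`N G obj`, PDF 1.5+). The following is a minimal sketch under these assumptions. The name `has_pdf_trailer` and the 1024-byte tail window are hypothetical choices. It also assumes offsets are counted from the start of the file, which may not hold for files with junk before the header.

```python
import re

# Hypothetical tail window; some files carry trailing bytes after %%EOF.
TAIL_WINDOW = 1024

_STARTXREF_RE = re.compile(rb"startxref\s+(\d+)\s+%%EOF")
_OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj")


def has_pdf_trailer(data: bytes) -> bool:
    """Heuristic: the tail has 'startxref <offset> %%EOF' and the offset
    points at a cross-reference table or cross-reference stream object."""
    tail = data[-TAIL_WINDOW:]
    matches = list(_STARTXREF_RE.finditer(tail))
    if not matches:
        return False
    # With incremental updates there can be several; the last one wins.
    offset = int(matches[-1].group(1))
    if offset >= len(data):
        return False
    target = data[offset:]
    return target.startswith(b"xref") or _OBJ_RE.match(target) is not None
```

This check is still not a full parse, but it rejects many files that merely begin with "%PDF".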
There was once a suggestion that PDF files should contain binary data before the %PDF header, to make sure they were treated as binary files. As a result, PDF readers at one point started accepting a limited number of arbitrary bytes before the %PDF header. Such files cannot be detected by a simple magic number or a fixed sequence of magic bytes.
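One way to tolerate this leading junk is to search for the header within a window instead of matching at offset 0. The window size used here is an assumption: Adobe's PDF Reference implementation notes say Acrobat only requires the header to appear somewhere in the first 1024 bytes, and other readers may behave differently. The name `find_pdf_header` is hypothetical.

```python
# Assumed leniency window, based on Acrobat's documented behavior; not universal.
HEADER_SEARCH_WINDOW = 1024


def find_pdf_header(data: bytes, window: int = HEADER_SEARCH_WINDOW) -> int:
    """Return the offset of the first b"%PDF-" lying entirely within the
    first `window` bytes, or -1 if there is none."""
    return data.find(b"%PDF-", 0, window)
```

For instance, `find_pdf_header(b"\x00" * 100 + b"%PDF-1.7\n")` returns 100. A header found this way is still only a hint that the file might be a PDF.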