php file-io binaryfiles magic-numbers sfx

How do you get chunks of binary files in PHP?


I'm creating a PHP app that at some point will download an SFX archive from a website and needs to extract the data from it.

Since I am running this on a Linux box, I need to chop off the SFX executable portion of the file and save the compressed archive to the filesystem, where I will then run a program to unzip/extract it. (SFX archives are basically an EXE file with the compressed archive tacked on after it. I have tried this manually with a hex editor and whatnot and it works just fine.)

The file type of the compressed archive within the SFX archive will always be the same, and I know what the magic number is for that file type.

In PHP, once the file has been downloaded (let's assume a simple file_get_contents() call with a URL parameter) and is sitting in memory, I need to extract the data from the contents, starting at the magic number of the compressed archive.

I was thinking of using some sort of regex method. However, I need to process this as binary data (the magic number will need to be expressed as hex), not as character data. The magic number itself contains non-printing bytes that do not show up as readable characters.


Solution

  • Regexes are binary-safe. However, since you are searching for a fixed byte sequence, you might be better off with strpos.

    $magicpos = strpos($downloaded_data,"\x1a\x09\x01");
    

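    That assumes the magic number is 0x1A 0x09 0x01; replace it with whatever the number actually is. Note that strpos returns false when the sequence is not found, so check for $magicpos === false before continuing. Then: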

    $archive = substr($downloaded_data,$magicpos);
    

    This will get the archive data from the magic number (inclusive) onwards.
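
Putting the two steps together, a minimal sketch might look like the following. The function name extract_archive, the sample data and the output path are hypothetical, and the magic number is still the 0x1A 0x09 0x01 placeholder from above. The sketch adds a check for the case where the magic number is not found, because strpos returns false then and substr would otherwise return the whole download unchanged.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the original answer).
// Returns everything from the first occurrence of $magic onwards, or null if absent.
function extract_archive(string $data, string $magic): ?string
{
    $pos = strpos($data, $magic);   // byte offset; strpos is binary-safe
    if ($pos === false) {           // strict check: offset 0 is a valid match
        return null;
    }
    return substr($data, $pos);     // magic number included
}

// Placeholder magic number; replace with the real one for your archive type.
$magic = "\x1a\x09\x01";

// Stand-in for the downloaded SFX file: a fake EXE stub followed by the archive.
$downloaded_data = "MZ-fake-exe-stub" . $magic . "payload";

$archive = extract_archive($downloaded_data, $magic);
if ($archive === null) {
    echo "Magic number not found\n";
} else {
    // In the real app you would write it out, e.g. (hypothetical path):
    // file_put_contents('/tmp/archive.bin', $archive);
    echo strlen($archive) . " bytes extracted\n";
}
```

One caveat: strpos returns the first occurrence, so if the same byte sequence happens to appear inside the executable stub, the cut will land too early. A longer or more distinctive magic sequence makes that less likely.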