Tags: php, regex, strtr

Parsing bytes of a binary file in PHP and translating groups into placeholders


I could use some advice. I'm parsing a binary file in PHP; specifically, it's a Sega Genesis ROM file. According to the table I have made, certain bytes correspond to characters or control various things in the game's text engine.

Some bytes are used for characters, while others are "controller" bytes for line breaks, conditions, color and a bunch of other things, so a typical sentence looks something like this:

FC 03 E7 05 D3 42 79 20 64 6F 69 6E 67 20 73 6F 2C BC BE 08 79 6F 75 20 6A 75 73 74 20 61 63 71 75 69 72 65 64 BC BE 04 61 20 74 65 73 74 61 6D 65 6E 74 20 74 6F 20 79 6F 75 72 BC 73 74 61 74 75 73 20 61 73 20 61 20 77 61 72 72 69 6F 72 21 BD BC

which I can translate to:

<FC><03><E7><05><D3>By doing so,<NL><BE><08>you just acquired<NL><BE><04>a testament to your<NL>status as a warrior!<CURSOR>

I want to specify properties for each of these controller-byte sequences, such as its length, and write my own values to certain positions.

See, bytes that translate into characters (00 to 7F) or line breaks (BC) consist of only a single byte, while others consist of two (BE XX). Conditions (FC) even consist of five bytes: FC XXXX YYYY, where XXXX and YYYY are two-byte offsets that I need to calculate while I put my translated strings together.
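The length rules above can be driven by a lookup table instead of a fixed list of strings. A minimal sketch that works on the raw bytes rather than on a hex string: `tokenize()`, `render()` and `$controlLengths` are hypothetical names, only BC, BE and FC are filled in (lengths taken from the description above), and any other byte above 7F is assumed to be a single byte.

```php
<?php
// Hypothetical control-code table: first byte => total length in bytes.
// Lengths for BC, BE and FC follow the question; extend as needed.
$controlLengths = [
    0xBC => 1, // line break
    0xBE => 2, // BE XX
    0xFC => 5, // FC + two 2-byte offsets
];

// Split raw bytes into text runs (00-7F) and control sequences.
function tokenize(string $data, array $lengths): array
{
    $tokens = [];
    $text = '';
    $i = 0;
    $n = strlen($data);
    while ($i < $n) {
        $byte = ord($data[$i]);
        if ($byte <= 0x7F) {           // plain character: extend the text run
            $text .= $data[$i++];
            continue;
        }
        if ($text !== '') {            // flush the pending text run
            $tokens[] = ['text', $text];
            $text = '';
        }
        $size = $lengths[$byte] ?? 1;  // unknown codes: assumed to be 1 byte
        $tokens[] = ['control', substr($data, $i, $size)];
        $i += $size;
    }
    if ($text !== '') {
        $tokens[] = ['text', $text];
    }
    return $tokens;
}

// Render tokens in the question's placeholder style (<NL>, <BE><08>, ...).
function render(array $tokens): string
{
    $out = '';
    foreach ($tokens as [$type, $bytes]) {
        if ($type === 'text') {
            $out .= $bytes;
        } elseif ($bytes === "\xBC") {
            $out .= '<NL>';
        } else {
            foreach (str_split($bytes) as $b) {
                $out .= sprintf('<%02X>', ord($b));
            }
        }
    }
    return $out;
}

$hex = 'FC 03 E7 05 D3 42 79 20 64 6F 69 6E 67 20 73 6F 2C BC BE 08';
echo render(tokenize(hex2bin(str_replace(' ', '', $hex)), $controlLengths)), "\n";
// <FC><03><E7><05><D3>By doing so,<NL><BE><08>
```

Because each control sequence is kept as one token, adding a new code only means adding one entry to the table.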

I want my parser to recognize such sequences and let me write the offset values dynamically. With strtr I can only replace fixed "groups", e.g. when I put the static byte string into an array.
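strtr() with an array only maps fixed strings to fixed strings, so the variable offsets need a callback. A minimal sketch, assuming the two FC offsets are 16-bit big-endian values (the 68000 is big-endian, and the sample FC 03 E7 05 D3 splits into 03E7 and 05D3, but verify this against the ROM). The `<FC:XXXX:YYYY>` placeholder format and the function names are hypothetical.

```php
<?php
// Assumption: FC is followed by two 16-bit big-endian offsets.
function encodeCondition(int $x, int $y): string
{
    // 'n' = unsigned 16-bit big-endian; values must fit in 0..65535
    return "\xFC" . pack('nn', $x, $y);
}

function decodeCondition(string $bytes): array
{
    // Skip the FC byte and read the two 16-bit offsets that follow it.
    $v = unpack('nfirst/nsecond', substr($bytes, 1, 4));
    return [$v['first'], $v['second']];
}

// Hypothetical placeholder <FC:XXXX:YYYY> (hex offsets) in the translated
// script; the callback turns each one back into the 5 raw bytes.
function compile(string $script): string
{
    return preg_replace_callback(
        '/<FC:([0-9A-F]{4}):([0-9A-F]{4})>/',
        function (array $m): string {
            return encodeCondition(hexdec($m[1]), hexdec($m[2]));
        },
        $script
    );
}

echo bin2hex(compile('<FC:03E7:05D3>By')), "\n"; // fc03e705d34279
```

Since the offsets only become bytes inside compile(), they can be written into the placeholders after the final position of each string is known.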

How would you do this while keeping the parser flexible? Thanks!


Solution

  • Assuming your hex values are available as a string (hex digits with no spaces between the bytes), you can use this regex to parse it as you described. If you identify more rules besides FC followed by four bytes or BE followed by one byte, you can add them directly to the regex below so that they are extracted as well.

    (?<fc>FC(\w\w){4})|(?<be>BE(\w\w))|(?<any>(\w\w))
    

    The named groups fc, be and any make it easy to pick out each result set, e.g. with $matches['fc'].

    Regex Demo: https://regex101.com/r/kR9kdP/5

    <?php
    $re = '/(?<fc>FC(\w\w){4})|(?<be>BE(\w\w))|(?<any>(\w\w))/';
    $str = 'FC03E705D3FC0006042842616D20626162612062';
    
    preg_match_all($re, $str, $matches, PREG_PATTERN_ORDER);
    
    // Print each group's matches; array_filter() drops the empty entries
    // left where that group did not take part in a match.
    print_r(array_filter($matches['fc']));  // all FC sequences (FC + 4 bytes)
    print_r(array_filter($matches['be']));  // all BE sequences (BE + 1 byte)
    print_r(array_filter($matches['any'])); // every other single byte
    

    PHP Demo: http://ideone.com/qWUaob

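    Sample results (the fc array is for the $str above; this $str contains no BE sequences, so the be array shown, with its larger match indices, comes from a longer input):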

    Array
    (
        [0] => FC03E705D3
        [1] => FC00060428
    )
    Array
    (
        [50] => BE08
        [59] => BE04
        [113] => BE08
        [132] => BE04
    )
    

    Hope this helps!
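With PREG_PATTERN_ORDER the matches are split per group, so the order of the byte stream is only recoverable through the shared indices. A minimal sketch of the same regex idea using PREG_SET_ORDER instead, which yields one entry per match in stream order. It assumes the input is raw ROM bytes converted with bin2hex(), which produces lowercase digits, so the result is upper-cased to match the FC/BE literals; `hexTokens` is a hypothetical helper name, and `[0-9A-F]` replaces `\w` so only hex digits are accepted.

```php
<?php
function hexTokens(string $binary): array
{
    // bin2hex() returns lowercase hex, the literals below are uppercase.
    $hex = strtoupper(bin2hex($binary));
    $re = '/(?<fc>FC(?:[0-9A-F]{2}){4})|(?<be>BE[0-9A-F]{2})|(?<any>[0-9A-F]{2})/';
    preg_match_all($re, $hex, $matches, PREG_SET_ORDER);

    $tokens = [];
    foreach ($matches as $m) {
        // Trailing groups that did not match may be missing, hence ?? ''.
        if (($m['fc'] ?? '') !== '') {
            $tokens[] = ['fc', $m['fc']];
        } elseif (($m['be'] ?? '') !== '') {
            $tokens[] = ['be', $m['be']];
        } else {
            $tokens[] = ['any', $m['any']];
        }
    }
    return $tokens;
}

foreach (hexTokens(hex2bin('FC03E705D342BCBE08')) as [$group, $value]) {
    echo "$group: $value\n";
}
```

Because the `any` alternative consumes every remaining byte pair, no characters are skipped between matches and the pairs stay aligned to byte boundaries.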