
What language is to binary, as Perl is to text?


I am looking for a scripting (or higher level programming) language (or e.g. modules for Python or similar languages) for effortlessly analyzing and manipulating binary data in files (e.g. core dumps), much like Perl allows manipulating text files very smoothly.

Things I want to do include presenting arbitrary chunks of the data in various forms (binary, decimal, hex), converting data from one endianness to another, etc. That is, things you would normally use C or assembly for, but I'm looking for a language that allows writing tiny pieces of code for highly specific, one-time purposes very quickly.

Any suggestions?


Solution

  • Things I want to do include presenting arbitrary chunks of the data in various forms (binary, decimal, hex), converting data from one endianness to another, etc. That is, things you would normally use C or assembly for, but I'm looking for a language that allows writing tiny pieces of code for highly specific, one-time purposes very quickly.

    Counter-intuitive as it may seem, I found Erlang extremely well suited to this, thanks to its powerful pattern matching, which works even on bytes and bits (the "Erlang Bit Syntax"). That makes it easy to write even very advanced programs that inspect and manipulate data at the byte level and even at the bit level:

    Since 2001, the functional language Erlang comes with a byte-oriented datatype (called binary) and with constructs to do pattern matching on a binary.

    And to quote informIT.com:

    (Erlang) Pattern matching really starts to get fun when combined with the binary type. Consider an application that receives packets from a network and then processes them. The first four bytes in a packet might be a network byte-order packet type identifier. In Erlang, you would just need a single processPacket function that could convert this into a data structure for internal processing. It would look something like this:

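    % The /binary type makes RestOfPacket match all remaining bytes;
    % without it, the segment would match a single 8-bit integer.
    processPacket(<<1:32/big, RestOfPacket/binary>>) ->
        % Process type one packets
        ...
    ;
    processPacket(<<2:32/big, RestOfPacket/binary>>) ->
        % Process type two packets
        ...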
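
    As a rough sketch of how the same bit syntax covers the tasks from the question (hex presentation and endianness conversion), something like the following should work. The module name bindemo, the function names, and the record layout in parse/1 are all made up for illustration; they are not from the quoted article.

```erlang
-module(bindemo).
-export([swap32/1, to_hex/1, parse/1]).

%% Read a 4-byte word as big-endian and write it back little-endian,
%% which amounts to a byte swap.
swap32(<<X:32/big>>) ->
    <<X:32/little>>.

%% Render a binary as an uppercase hex string, two digits per byte.
to_hex(Bin) ->
    lists:flatten([io_lib:format("~2.16.0B", [Byte]) || <<Byte:8>> <= Bin]).

%% Hypothetical record layout (an assumption for this example):
%% 32-bit big-endian type, 16-bit little-endian length, then Len bytes
%% of payload, then whatever is left over.
parse(<<Type:32/big, Len:16/little, Payload:Len/binary, Rest/binary>>) ->
    {Type, Payload, Rest}.
```

    In the shell, bindemo:to_hex(bindemo:swap32(<<16#DEADBEEF:32>>)) should give "EFBEADDE". Note how Len is bound and then reused as a segment size within the same pattern, so a length-prefixed record needs no explicit slicing.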
    

    So Erlang, with its built-in pattern matching and its functional style, is pretty expressive. See, for example, this implementation of uuencode in Erlang:

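    % Current Erlang requires the comprehension body to be a bitstring,
    % hence the inner << >> around each emitted segment.
    uuencode(BitStr) ->
        << <<(X+32):8>> || <<X:6>> <= BitStr >>.
    uudecode(Text) ->
        << <<(X-32):6>> || <<X:8>> <= Text >>.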
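
    To see what the comprehension in uuencode above does step by step, a small sketch along these lines may help. The module name bitsdemo and both function names are invented for this example. It uses the same bitstring generator (<=), but collects the results into a list so they are easy to inspect.

```erlang
-module(bitsdemo).
-export([bits/1, sixes/1]).

%% Present a binary as a string of '0' and '1' characters,
%% taking one bit at a time.
bits(Bin) ->
    [$0 + B || <<B:1>> <= Bin].

%% Split a bitstring into its 6-bit groups, i.e. the values that
%% uuencode sees before it adds 32 to each of them.
sixes(Bits) ->
    [X || <<X:6>> <= Bits].
```

    For example, bitsdemo:bits(<<5>>) should return "00000101", and bitsdemo:sixes(<<"Cat">>) should return [16,54,5,52]; adding 32 to each gives the characters 0V%T. As far as I can tell, the generator simply stops when fewer than 6 bits remain, so input whose length is not a multiple of 6 bits loses its tail unless it is padded first.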
    

    For an introduction, see Bitlevel Binaries and Generalized Comprehensions in Erlang. You may also want to check out some of the following pointers: