Equivalent of a multi-type-struct od output

I can use od when I want to dump the contents of a non-textual file to a terminal (or a text file) as human-readable values: I can peer into files whose elements are signed or unsigned integers, floating-point numbers, or printable ASCII characters. (od can also print the data in various bases like hexadecimal or octal, hence the name "octal dump", but that's not what I care about.)

The limitation is that the input file is assumed to have a single, uniform data type. But what if this is not the case? What if I have triplets of, say, a single-byte unsigned value, then a 4-byte floating-point value, and then a 2-byte signed integer, i.e., in od terms, u1,f4,d2?

I would like to see a sequence of triplets of numbers of these types printed for me, with any reasonable convention for line breaks and field delimiters. Suppose I specify my struct/tuple format as above, i.e., as comma-separated od-style types; but I'm flexible on the specifics of this.

Can I use the shell and common command-line tools to achieve this relatively painlessly?

Solution

The od command accumulates multiple formats in a single -t option (e.g., -t u1f4d2 in your case) and outputs one line per requested type. Since several of your fields share a type, repeating that type in the -t option would only add redundant lines, so listing each type once is enough. Generating some data like you describe gives something like the following, with one line of output for each requested type:

% echo "128 255 12 3.7 -12" | perl -ne 'print pack("CCCfs", split)' | od -An -tu1f4d2
 128 255  12 205 204 108  64 244 255               // u1
  -1.4784717e+08  -6.0981913e+31        3.57e-43   // f4
    -128  -13044   27852   -3008     255           // d2
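The same nine bytes can be produced without perl, which makes the one-line-per-type behaviour easy to check. The sketch below writes them with printf octal escapes (assuming a little-endian host, since od decodes multi-byte values in host byte order; sample.bin is just an illustrative name) and shows that one combined type string and three separate -t options give the same output.

```shell
#!/bin/sh
# The bytes perl's pack("CCCfs", ...) writes for "128 255 12 3.7 -12":
# 128 255 12, then 3.7 as a little-endian float (cd cc 6c 40),
# then -12 as a little-endian short (f4 ff). Little-endian host assumed.
printf '\200\377\014\315\314\154\100\364\377' > sample.bin

# One combined type string...
od -An -tu1f4d2 sample.bin
# ...produces the same three lines as three separate -t options.
od -An -t u1 -t f4 -t d2 sample.bin
```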

Unfortunately, od decodes each requested type from the start of every block, so all three lines interpret the same bytes starting at offset 0. In your example the three unsigned bytes put the float at offset 3, which is not on a word (32-bit) boundary, so no f4 value lines up with it and the float is not decoded correctly; likewise, the short at offset 7 falls between two d2 values.
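Because the trouble is only where od starts each block, one workaround (a sketch, not part of the approach below) is to aim od at a single field's own offset: -j skips bytes and -N limits how many are read, both standard od options. The offsets 3 and 7 are specific to this three-byte prefix, and a little-endian host is assumed.

```shell
#!/bin/sh
# Same nine bytes as pack("CCCfs", ...) above; little-endian host assumed.
printf '\200\377\014\315\314\154\100\364\377' > sample.bin

# The float starts at offset 3: skip 3 bytes (-j) and read 4 (-N),
# so the first f4 value od prints is the real float.
od -An -j 3 -N 4 -t f4 sample.bin
# The short starts at offset 7 and is 2 bytes long.
od -An -j 7 -N 2 -t d2 sample.bin
```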

However, if every field starts at an offset that is a multiple of its own size, you can get pretty close. For example, insert an additional unsigned byte after your triple:

% echo "128 255 12 255 3.7 -12" | perl -ne 'print pack("CCCCfs", split)' | od -An -tu1f4d2
 128 255  12 255 205 204 108  64 244 255
  -1.8741855e+38             3.7      9.1819e-41  // we get the correct float
    -128    -244  -13107   16492     -12          // and signed short

With this layout, a bit more shell magic gets close to what you ask for:

% echo "128 255 12 255 3.7 -12" | perl -ne 'print pack("CCCCfs", split)' | od -An -tu1f4d2 | paste -sd '  \n' | awk '{ print $1, $2, $3, $12, $18 }'

128 255 12 3.7 -12

Decoding that command pipeline a bit:

Command	Description
`echo "128 255 12 255 3.7 -12"`	Create some data in the form requested (four unsigned bytes, a float, and a signed short)
`perl -ne 'print pack("CCCCfs", split)'`	Write them as binary (C = unsigned char, f = float, s = signed short, in native byte order)
`od -An -tu1f4d2`	Decode the binary (-An suppresses the offset column). od writes one line of output per requested type: unsigned bytes, then floats, then signed shorts
`paste -sd '  \n'`	Join the lines: the delimiter list (space, space, newline) is used cyclically, so each group of three lines becomes one line
`awk '{ print $1,$2,$3,$12,$18 }'`	Print the selected fields from the space-separated output: $1 to $3 are the first three bytes, $12 is the float, and $18 is the short

awk is just one option for isolating the fields you're looking for.
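For instance, the shell's own word splitting can do the same job. This sketch loads od's output for the padded 10-byte record into the positional parameters with set -- and echoes the same five fields; it assumes a little-endian host, uses printf instead of perl to write the bytes, and padded.bin is an illustrative name.

```shell
#!/bin/sh
# The padded record from above: 128 255 12 255, then 3.7, then -12.
# printf stands in for perl's pack; a little-endian host is assumed.
printf '\200\377\014\377\315\314\154\100\364\377' > padded.bin

# Split od's three lines into positional parameters:
# $1..${10} are the bytes, ${11}..${13} the floats, ${14}..${18} the shorts.
set -- $(od -An -tu1f4d2 padded.bin)
# Same fields as the awk version; two-digit parameters need braces.
echo "$1 $2 $3 ${12} ${18}"
```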

If you need to do this for multiple structures of the same size, you can combine od's -N (number of bytes to read) and -w (number of bytes to print per output line) options. For this to work, the number of bytes read should be a whole multiple of the width, and the width must be a multiple of the word size (e.g., 4 bytes when f4 is requested). Alternatively, you might use a loop in a shell script that combines -j <n> (skip the first n bytes of the input) with -N to decode one structure at a time.
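The loop approach might look like the following sketch. dump_structs is a hypothetical shell function, not a standard tool: it takes a comma-separated od-style format such as u1,f4,d2, assumes each type is one letter followed by its byte count, and calls od once per field with -j and -N, so fields need no alignment and no padding byte. It starts one od process per field, so it is slow on large files, but it only uses standard od options.

```shell
#!/bin/sh
# Hypothetical helper, not a standard tool: dump_structs FILE FORMAT
# FORMAT is a comma-separated list of od types such as u1,f4,d2; each type is
# assumed to be one letter followed by its size in bytes (so fF or dS won't work).
# Every field gets its own od call at its own offset (-j) and length (-N).
dump_structs() {
    file=$1
    types=$(printf '%s\n' "$2" | tr ',' ' ')

    # Record size = sum of the byte counts after each type letter.
    recsize=0
    for t in $types; do
        recsize=$((recsize + ${t#?}))
    done
    [ "$recsize" -gt 0 ] || return 1

    # $((...)) strips the leading blanks some wc implementations print.
    total=$(($(wc -c < "$file")))
    off=0
    # Print complete records only; a trailing partial record is ignored.
    while [ $((off + recsize)) -le "$total" ]; do
        line=
        for t in $types; do
            size=${t#?}
            # -An drops the offset column; tr removes od's padding spaces.
            val=$(od -An -j "$off" -N "$size" -t "$t" "$file" | tr -d ' ')
            line="$line${line:+ }$val"
            off=$((off + size))
        done
        printf '%s\n' "$line"
    done
}

# Usage with the unpadded bytes from the first example (little-endian host assumed):
printf '\200\377\014\315\314\154\100\364\377' > sample.bin
dump_structs sample.bin u1,u1,u1,f4,d2
```

Each record comes out on its own line with space-separated fields, and multi-byte values are decoded in the host's byte order, as with the plain od commands above.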