
Unix sort utility: use hexadecimal byte value as delimiter


I'm wondering whether I can use a hexadecimal byte value as the field delimiter for the Unix sort utility. Basically, I want to do something like:

sort -t '\x00' <input

But that doesn't work.


Solution

  • If you read the GNU sort manual, you will find:

    -t separator, --field-separator=separator

    Use character separator as the field separator when finding the sort keys in each line. By default, fields are separated by the empty string between a non-blank character and a blank character. By default a blank is a space or a tab, but the LC_CTYPE locale can change this. That is, given the input line foo bar, sort breaks it into fields foo and bar. The field separator is not considered to be part of either the field preceding or the field following, so with sort -t " " the same input line has three fields: an empty field, ‘foo’, and ‘bar’. However, fields that extend to the end of the line, as -k 2, or fields consisting of a range, as -k 2,3, retain the field separators present between the endpoints of the range. To specify ASCII nul as the field separator, use the two-character string \0, e.g., sort -t '\0'.

    This worked with an old sort (GNU CoreUtils 5.97).
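A minimal sketch of the documented form, assuming a GNU sort that honours the two-character string \0 as described in the manual text above. The sample records (b/y and a/z) are made up for illustration; the final tr only makes the NUL bytes visible as colons.

```shell
# Two records, each with two fields separated by a NUL byte.
# printf turns the \0 escape into a real 0x00 byte in the data stream.
# Note that sort receives the literal two characters \0 as its -t argument;
# the NUL byte itself never appears on the command line.
result=$(printf 'b\0y\na\0z\n' | LC_ALL=C sort -t '\0' -k2,2 | tr '\0' ':')

# Print the records ordered by their second field.
printf '%s\n' "$result"
```

If the separator is honoured, the record whose second field is y comes before the one whose second field is z, even though b sorts after a.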


    There does not seem to be a way to do it on Linux. I've tried a number of tricks to get a NUL (0x00) byte into the delimiter, and the sort command complains:

    sort: empty tab
    

    You can't do it with Control-V @ as you are typing the command line; the shell (bash) does not like that.

    I have a program genchar that writes bytes to output, so I tried:

    sort -t "$(genchar 0)" ...
    

    And that did not work either; I got the error from sort.

    $ genchar 0 | od -c
    0000000  \0  \n
    0000002
    $
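The reason the command substitution cannot work can be sketched as follows. Here printf '\0' stands in for the genchar program (which is not generally available): the NUL byte exists in the pipe, but it does not survive into a shell variable or argument, so sort ends up with an empty -t argument, which is consistent with the "empty tab" error.

```shell
# In a pipe the NUL byte really is there: wc counts one byte.
bytes_in_pipe=$(printf '\0' | wc -c | tr -d ' ')

# Captured by command substitution, the NUL is dropped by the shell
# (newer versions of bash may also print a warning about an ignored null byte),
# so the variable ends up empty.
delim=$(printf '\0')

printf 'bytes in pipe: %s, bytes in variable: %s\n' "$bytes_in_pipe" "${#delim}"
```

Command-line arguments are NUL-terminated C strings, so no amount of quoting can put a literal NUL byte inside one.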
    

    If you were able to use control-A instead, then there'd be no problem.

    Note that sort does not expand hex escape sequences in the '-t' option argument; you have to supply the actual byte you want to use. You probably can't use newline as a field delimiter, either; if you did, what would the record delimiter be?
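For a delimiter other than NUL, such as Control-A (0x01), supplying the actual byte might look like the sketch below. It uses printf with an octal escape to generate the byte, which should work in any POSIX shell; in bash, the ANSI-C quoting form $'\x01' is another way to produce the same byte. The sample records are invented for illustration.

```shell
# Generate the single byte 0x01 (Control-A). Unlike NUL, it survives
# command substitution and can be passed as an argument.
soh=$(printf '\001')

# Sort Control-A-delimited records on the second field, then show the
# delimiter as a colon so the result is readable.
result=$(printf 'b\001y\na\001z\n' | LC_ALL=C sort -t "$soh" -k2,2 | tr '\001' ':')

printf '%s\n' "$result"
```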

    GNU 'sort' (from CoreUtils 5.97, at any rate; the current version is 8.12 as of 2011-04-26) does support a -z (--zero-terminated) option, which makes NUL rather than newline the line (record) terminator.

    This is not, sadly, what you are looking for.
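To illustrate the difference, here is a small sketch of what -z does: it changes the record terminator to NUL, leaving the field separator alone. The fruit names are sample data only.

```shell
# Three NUL-terminated records; sort -z reads and writes records that end
# in a NUL byte instead of a newline.
sorted=$(printf 'pear\0apple\0fig\0' | LC_ALL=C sort -z | tr '\0' '\n')

# tr converted the NUL terminators to newlines so the result can be printed.
printf '%s\n' "$sorted"
```

This is useful for input such as the output of find -print0, but it does not help when NUL separates fields within a line.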