
Does the order of GNU sort command arguments matter?


I have a file with the contents:

manufacturer,model,year,mileage,price
plym fury 1970 73 2500
chevy malibu 1999 60 3000
ford mustang 1965 45 10000
volvo s80 1998 102 9850
ford thundbd 2003 15 10500
chevy malibu 2000 50 3500
bmw 315i 1985 115 450
honda accord 2001 30 6000
ford taurus 2004 10 17000
toyota rav4 2002 180 750
chevy impala 1985 85 1550
ford explor 2003 25 9500

I'm asked to sort the file by manufacturer name first and then by price in descending (reverse numeric) order. I implemented this with the command:

(head -n 1 a) && (tail +2 a | sort -k1,1 -k5nr)

which gives me the correct output:

manufacturer,model,year,mileage,price
bmw 315i 1985 115 450
chevy malibu 2000 50 3500
chevy malibu 1999 60 3000
chevy impala 1985 85 1550
ford taurus 2004 10 17000
ford thundbd 2003 15 10500
ford mustang 1965 45 10000
ford explor 2003 25 9500
honda accord 2001 30 6000
plym fury 1970 73 2500
toyota rav4 2002 180 750
volvo s80 1998 102 9850
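
The pipeline above can be reproduced with a short script. This is a sketch that assumes GNU coreutils and `LC_ALL=C`, so text is compared in plain byte order. It writes only a subset of the rows to a file named `a` in a temporary directory. It spells `tail +2` as `tail -n +2`, which is the POSIX form of "print from line 2 onward". The variable name `result` is illustrative.

```shell
#!/bin/sh
# Assumption: GNU coreutils sort; LC_ALL=C gives plain byte-order text comparison.
export LC_ALL=C
cd "$(mktemp -d)" || exit 1

# A subset of the question's rows, header first, in a file named 'a'.
cat > a <<'EOF'
manufacturer,model,year,mileage,price
plym fury 1970 73 2500
chevy malibu 1999 60 3000
ford mustang 1965 45 10000
bmw 315i 1985 115 450
chevy malibu 2000 50 3500
ford taurus 2004 10 17000
EOF

# Print the header unchanged, then sort the remaining lines:
# field 1 ascending as text, then field 5 descending as a number.
result=$(head -n 1 a && tail -n +2 a | sort -k1,1 -k5nr)
printf '%s\n' "$result"
```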

However, if I change sort -k1,1 -k5nr to sort -k1,1 -nrk5, I get:

manufacturer,model,year,mileage,price
ford taurus 2004 10 17000
ford thundbd 2003 15 10500
ford mustang 1965 45 10000
volvo s80 1998 102 9850
ford explor 2003 25 9500
honda accord 2001 30 6000
chevy malibu 2000 50 3500
chevy malibu 1999 60 3000
plym fury 1970 73 2500
chevy impala 1985 85 1550
toyota rav4 2002 180 750
bmw 315i 1985 115 450

How can such a subtle difference change the entire output, and what exactly is different between the two forms?

I tried referring to various sources, but none of them helped.


Solution

  • Let's first examine the relevant paragraphs from the sort man page:

    -k, --key=KEYDEF
           sort via a key; KEYDEF gives location and type

    KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is
    a field number and C a character position in the field; both are origin 1,
    and the stop position defaults to the line's end. If neither -t nor -b is
    in effect, characters in a field are counted from the beginning of the
    preceding whitespace. OPTS is one or more single-letter ordering options
    [bdfgiMhnRrV], which override global ordering options for that key. If no
    key is given, use the entire line as the key. Use --debug to diagnose
    incorrect key usage.
    
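    The rule "OPTS ... override global ordering options for that key" can be seen directly in the following sketch. It assumes GNU coreutils sort and `LC_ALL=C`, and uses a few of the question's rows. The global -r is inherited by -k1,1, which has no ordering letters of its own. It is not inherited by -k5n, which has its own. As a result, manufacturers come out in descending order while prices within each manufacturer stay ascending. The variable names are illustrative.

```shell
#!/bin/sh
# Assumption: GNU coreutils sort; LC_ALL=C for byte-order text comparison.
export LC_ALL=C

rows='ford mustang 1965 45 10000
chevy malibu 1999 60 3000
ford explor 2003 25 9500
chevy malibu 2000 50 3500
bmw 315i 1985 115 450'

# Global -r: picked up by -k1,1 (no ordering letters of its own),
# but not by -k5n, whose own letters replace the global options for that key.
out=$(printf '%s\n' "$rows" | sort -r -k1,1 -k5n)
printf '%s\n' "$out"
```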

    When the options are written as -nrk5, sort parses them as the separate options -n, -r and -k 5, so -n and -r are global options. A global ordering option applies to every key that has no ordering options of its own, and that includes your first key, -k1,1. The manufacturer names contain nothing that can be interpreted as a number, so they all compare as 0 and are therefore equal. In effect, you are sorting only by the 5th field in reverse numerical order.
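    The following sketch checks that equivalence. It assumes GNU coreutils sort and `LC_ALL=C`, with a few rows taken from the question. It compares three forms: -k1,1 -nrk5, the same options spelled out separately, and a sort on the price field alone. All three should produce the same order here because every price is distinct. The variable names are illustrative.

```shell
#!/bin/sh
# Assumption: GNU coreutils sort; LC_ALL=C for byte-order text comparison.
export LC_ALL=C

rows='ford taurus 2004 10 17000
chevy malibu 2000 50 3500
bmw 315i 1985 115 450
volvo s80 1998 102 9850'

# The question's form: -n and -r are bundled in front of k, so they are global.
by_question=$(printf '%s\n' "$rows" | sort -k1,1 -nrk5)
# The same options written separately.
spelled_out=$(printf '%s\n' "$rows" | sort -n -r -k1,1 -k5)
# Names like 'ford' have no digits and compare as 0, so only the price matters.
price_only=$(printf '%s\n' "$rows" | sort -k5,5nr)
printf '%s\n' "$by_question"
```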

    When the options are written as -k5nr, the n and r are part of the KEYDEF and apply only to that key. As a result, -k1,1 is unaffected and the first field is sorted as text in ascending order, which gives you the expected result.
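    A small sketch of the "placement, not order" point, assuming GNU coreutils sort and `LC_ALL=C` with a few of the question's rows. Letters attached to the key behave the same in either order (-k5nr and -k5rn). Moving them in front of k changes their meaning. The variable names are illustrative.

```shell
#!/bin/sh
# Assumption: GNU coreutils sort; LC_ALL=C for byte-order text comparison.
export LC_ALL=C

rows='ford taurus 2004 10 17000
chevy malibu 2000 50 3500
bmw 315i 1985 115 450
ford explor 2003 25 9500'

# Letters after the field number belong to that key only; their order is irrelevant.
key_nr=$(printf '%s\n' "$rows" | sort -k1,1 -k5nr)
key_rn=$(printf '%s\n' "$rows" | sort -k1,1 -k5rn)
# Bundled in front of 'k', the same letters become global options.
global=$(printf '%s\n' "$rows" | sort -k1,1 -nrk5)
printf '%s\n' "$key_nr"
```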

    You can see this more clearly by following the man page's advice and using the --debug option. In the -nrk5 case, the output includes the following (shown for the ford taurus line):

    sort: key 2 is numeric and spans multiple fields
    ford taurus 2004 10 17000
    ^ no match for key
                        _____
    _________________________
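    The diagnostic above can be reproduced on a single line with the following sketch. It assumes a GNU sort recent enough to support --debug, and `LC_ALL=C`. Warnings go to standard error and the key underlines go to standard output, so the two streams are merged here. The variable name `debug_out` is illustrative.

```shell
#!/bin/sh
# Assumption: GNU sort with --debug support; LC_ALL=C.
export LC_ALL=C
line='ford taurus 2004 10 17000'

# Merge stderr (warnings) with stdout (key annotations) to see everything.
debug_out=$(printf '%s\n' "$line" | sort --debug -k1,1 -nrk5 2>&1)
printf '%s\n' "$debug_out"
```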
    

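    The output is not very clear, but it helps you diagnose what is wrong with the options. "no match for key" means that key 1, treated as a number, matched nothing on the line. If you use -k5nr instead, you still get the warning that key 2 spans multiple fields (i.e. the first line above is still there), because -k5 has no stop position and so extends to the end of the line. However, the "no match for key" line is replaced by an underline of the first field, which indicates that the first field is actually used in the sort.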
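    The next sketch compares the -k5nr debug output with a variant that gives key 2 an explicit stop position. It assumes a GNU sort with --debug and `LC_ALL=C`. The variant, -k5,5nr, is an illustrative alternative that is not from the question. Limiting the numeric key to field 5 should remove the "spans multiple fields" warning, and with -k5nr the first field is underlined as a text key. The variable names are illustrative.

```shell
#!/bin/sh
# Assumption: GNU sort with --debug support; LC_ALL=C.
export LC_ALL=C
line='ford taurus 2004 10 17000'

# -k5nr: key 2 runs from field 5 to the end of the line, so the
# "spans multiple fields" warning remains, but field 1 is a text key.
open_end=$(printf '%s\n' "$line" | sort --debug -k1,1 -k5nr 2>&1)
# -k5,5nr: key 2 stops at the end of field 5.
closed_end=$(printf '%s\n' "$line" | sort --debug -k1,1 -k5,5nr 2>&1)
printf '%s\n' "$open_end" "" "$closed_end"
```

The ordering is the same either way, because the price is the last field. The explicit stop position only makes the key's extent match the intent.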

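    So, the answer to the title question is yes, the order does matter. Ordering letters written after the field number in a -k argument apply only to that key. The same letters given as separate options, even when bundled in front of k as in -nrk5, are global.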