I'm trying to sort phrases such as the following:
a12_b7
a12_b11
a5_b3
a5_b30
a12_b10
by the numbers following the letters: the first number first, then the second number as a tie-breaker. For the example above, I expect the result to be:
a5_b3
a5_b30
a12_b7
a12_b10
a12_b11
Reading man sort, I thought I had this figured out, but it does not work like I thought it would:
$ cat | sort --debug --key=1.2 --key=2.2 --field-separator=_
sort: text ordering performed using ‘en_IL.UTF-8’ sorting rules
a12_b10
 ______
     __
_______
a12_b11
 ______
     __
_______
a12_b7
 _____
     _
______
a5_b3
 ____
    _
_____
a5_b30
 _____
    __
______
What have I gotten wrong? And what would be the appropriate sort command-line in this case?
It looks like you want to sort numerically, whereas sort's default order is alphabetical. Compared as text, "12" sorts before "5" because "1" comes before "5". You could do:
$ sort -nt'_' -k1.2 -k2.2 file
a5_b3
a5_b30
a12_b7
a12_b10
a12_b11
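As a sketch of a slightly stricter variant: each key below is bounded to its own field (the `,1` and `,2` end positions) and carries its own `n` flag, instead of relying on a global `-n` and keys that run to the end of the line. The function name `sort_by_field_numbers` is made up for illustration, and the input is assumed to have exactly one letter before each number.

```shell
# Hypothetical helper: sort lines like "a12_b7" by the number in field 1,
# then by the number in field 2.
#   -t_        use "_" as the field separator
#   -k1.2,1n   key runs from the 2nd character of field 1 to the end of field 1, numeric
#   -k2.2,2n   key runs from the 2nd character of field 2 to the end of field 2, numeric
sort_by_field_numbers() {
    sort -t_ -k1.2,1n -k2.2,2n "$@"
}

# Usage: reads the named files, or standard input if none are given.
printf '%s\n' a12_b7 a12_b11 a5_b3 a5_b30 a12_b10 | sort_by_field_numbers
```

Adding `--debug` to the sort call (GNU sort) should show each underline stopping at the end of its own field.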
If the input were any more complicated than that (e.g. not always a single letter before each sort key), then I'd use the Decorate-Sort-Undecorate idiom, e.g.:
$ cat file
phc12_bob7
efg12_bk11
cfad5_xxxx3
df5_chekb30
a12_tg10
$ sed -E 's/([^0-9]+)(.*)(_[^0-9]+)(.*)/\1\t\2\t\3\t\4/' file
phc 12 _bob 7
efg 12 _bk 11
cfad 5 _xxxx 3
df 5 _chekb 30
a 12 _tg 10
$ sed -E 's/([^0-9]+)(.*)(_[^0-9]+)(.*)/\1\t\2\t\3\t\4/' file | sort -k2,2n -k4,4n
cfad 5 _xxxx 3
df 5 _chekb 30
phc 12 _bob 7
a 12 _tg 10
efg 12 _bk 11
$ sed -E 's/([^0-9]+)(.*)(_[^0-9]+)(.*)/\1\t\2\t\3\t\4/' file | sort -k2,2n -k4,4n | tr -d '\t'
cfad5_xxxx3
df5_chekb30
phc12_bob7
a12_tg10
efg12_bk11
The sed command modifies (Decorates) the input so that sort can Sort it on the numeric fields, and tr then removes the extra characters that sed added (Undecorates).
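The three steps can be wrapped in one function, sketched here under a few assumptions: the name `dsu_sort` is hypothetical, the input contains no tabs of its own, and every line has the shape letters, number, `_`, letters, number. A literal tab is built with `printf`, so the sed script does not depend on `\t` being understood in the replacement text, and sort is told explicitly to split on tabs.

```shell
# Hypothetical helper: Decorate-Sort-Undecorate for lines like "phc12_bob7".
# Decorate:   sed splits each line into prefix, number, "_name", number,
#             separated by tabs.
# Sort:       sort compares the 2nd and then the 4th tab-separated field,
#             both numerically.
# Undecorate: tr deletes the tabs that the decorate step added.
dsu_sort() {
    tab=$(printf '\t')   # literal tab character
    sed -E "s/([^0-9]+)(.*)(_[^0-9]+)(.*)/\1${tab}\2${tab}\3${tab}\4/" "$@" |
        sort -t "$tab" -k2,2n -k4,4n |
        tr -d '\t'
}

# Usage: reads the named files, or standard input if none are given.
printf '%s\n' phc12_bob7 efg12_bk11 cfad5_xxxx3 df5_chekb30 a12_tg10 | dsu_sort
```

Because the undecorate step deletes every tab, this sketch would also strip tabs that were part of the original data. A different separator character would be needed for such input.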