sorting ubuntu gnu-sort

-g option breaks multi-key Linux sort


Hi, I just ran into either a bug or, more likely, an error on my part. I am trying to sort a file that has 5 columns by three specific columns.

I am using the -k option.

sort  -k1,1 -k3,3  -k4,4 < abundance_key_60.tsv

SO90    TARA_031_SRF    M00370  0.0004796352593680699   5380.716788521779
SO90    TARA_072_MES    M00370  6.704622779795495   889.5003464019538
WDU TARA_072_MES    M00165  0.00010342611234558623  1372.1512123790574
WDU TARA_046_SRF    M00165  0.00011353279569781544  582.9204804414709
WDU TARA_025_DCM    M00165  0.00028966684296873025  2486.7113286682593

Everything worked fine. Then I realised that one of my columns is numeric, so I added the -g option for that column. At this point sort seems to order the lines by that column only:

sort -k1,1 -k3,3  -gk4,4 <  test_.sort.txt 

SO90    TARA_031_SRF    M00370 0.0004796352593680699    5380.716788521779
WDU TARA_025_DCM    M00165  0.00028966684296873025  2486.7113286682593
WDU TARA_046_SRF    M00165  0.00011353279569781544  582.9204804414709
WDU TARA_072_MES    M00165  0.00010342611234558623  1372.1512123790574
SO90    TARA_072_MES    M00370  6.704622779795495   889.5003464019538

I tried the -s option, but it did not change the results. Any help appreciated!

PS: this is a sample from my file that reproduces the bug.

I am on Ubuntu 16.04 with the default bash and sort for this distribution.


Solution

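  • A standalone -g (as in -gk4,4) is a global option in GNU sort, so it also applies to -k1,1 and -k3,3, which carry no ordering options of their own. You want to specify the g only for -k4,4, by attaching it to the key itself, like this: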

    bash$ sort -k1,1 -k3,3 -k4,4g test_.sort.txt
    SO90    TARA_031_SRF    M00370 0.0004796352593680699    5380.716788521779
    SO90    TARA_072_MES    M00370  6.704622779795495   889.5003464019538
    WDU TARA_072_MES    M00165  0.00010342611234558623  1372.1512123790574
    WDU TARA_046_SRF    M00165  0.00011353279569781544  582.9204804414709
    WDU TARA_025_DCM    M00165  0.00028966684296873025  2486.7113286682593
    

    (Experimentally verified by changing the number to 6.704622779795495E-10 and observing how that changes the sort order. A better test case would contain samples which trivially reveal when you get the correct result.)
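
The difference between a global -g and a per-key g can be reproduced with a smaller test case. The following is a minimal sketch, assuming GNU sort; the two-column sample data and the variable names are made up for illustration and are not taken from the asker's file. LC_ALL=C is set only to keep the comparison independent of the locale.

```shell
# Hypothetical sample: column 1 is a text label, column 2 is a number.
data=$(printf '%s\n' 'b 2e-3' 'a 10' 'a 9' 'b 1e-5')

# Standalone -g is global: key 1 inherits it, the non-numeric labels
# compare as equal, and only the numeric column decides the order.
global_g=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -gk2,2)

# g attached to the key: key 1 stays textual, key 2 is general-numeric.
per_key_g=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2,2g)

echo "global -g:"
echo "$global_g"
echo "per-key g:"
echo "$per_key_g"
```

With the global form the lines come out ordered by the number alone (b 1e-5, b 2e-3, a 9, a 10), while the per-key form groups by label first and sorts numerically within each group (a 9, a 10, b 1e-5, b 2e-3).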