bashsortingawkextractuniq

Sort according to two columns and extract top two based on last column


I have a file with three columns. I would like to extract rows with top two values in column 3 for each unique value in column 2.

cat file.list
run1/xx2/x2c1.txt 21 -190
run1/xx2/x2c2.txt 19 -180
run1/xx2/x2c3.txt 18 -179
run1/xx2/x2c4.txt 19 -162
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162
run2/xx2/x2c2.txt 18 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run2/xx2/x2c5.txt 21 -179
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c2.txt 19 -192
run3/xx2/x2c3.txt 21 -191
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c5.txt 19 -179

expected output

run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190

I feel like some combination of sort, uniq and awk might accomplish but I can't properly execute it. I can sort by columns

sort -nk2 -nk3 file.list 

which gives me an output sorted by -k2 and -k3 as follows,

run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run1/xx2/x2c3.txt 18 -179
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run1/xx2/x2c2.txt 19 -180
run3/xx2/x2c5.txt 19 -179
run1/xx2/x2c4.txt 19 -162
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
run2/xx2/x2c5.txt 21 -179
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162

but then I get stuck on how to extract only the rows with best two scores in the last column for 18, 19 and 20.

I would really appreciate any bash solutions.


Solution

  • Piping the current sort results to awk:

    $ sort -nk2 -nk3 file.list | awk 'a[$2]++ < 2'
    run2/xx2/x2c2.txt 18 -192
    run3/xx2/x2c4.txt 18 -184
    run3/xx2/x2c2.txt 19 -192
    run2/xx2/x2c3.txt 19 -191
    run3/xx2/x2c3.txt 21 -191
    run1/xx2/x2c1.txt 21 -190
    

    Where:

    An alternative where we increment the count first:

    $ sort -nk2 -nk3 file.list | awk '++a[$2] <= 2'
    run2/xx2/x2c2.txt 18 -192
    run3/xx2/x2c4.txt 18 -184
    run3/xx2/x2c2.txt 19 -192
    run2/xx2/x2c3.txt 19 -191
    run3/xx2/x2c3.txt 21 -191
    run1/xx2/x2c1.txt 21 -190