I have a file with three columns. I would like to extract rows with top two values in column 3 for each unique value in column 2.
cat file.list
run1/xx2/x2c1.txt 21 -190
run1/xx2/x2c2.txt 19 -180
run1/xx2/x2c3.txt 18 -179
run1/xx2/x2c4.txt 19 -162
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162
run2/xx2/x2c2.txt 18 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run2/xx2/x2c5.txt 21 -179
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c2.txt 19 -192
run3/xx2/x2c3.txt 21 -191
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c5.txt 19 -179
expected output
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
I feel like some combination of sort, uniq and awk might accomplish but I can't properly execute it. I can sort by columns
sort -nk2 -nk3 file.list
which gives me an output sorted by -k2 and -k3 as follows,
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run1/xx2/x2c3.txt 18 -179
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run1/xx2/x2c2.txt 19 -180
run3/xx2/x2c5.txt 19 -179
run1/xx2/x2c4.txt 19 -162
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
run2/xx2/x2c5.txt 21 -179
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162
but then I get stuck on how to extract only the rows with best two scores in the last column for 18, 19 and 20.
I would really appreciate any bash solutions.
Piping the current sort
results to awk
:
$ sort -nk2 -nk3 file.list | awk 'a[$2]++ < 2'
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
Where:
$2
) is used as the index for array a[]
++
)a[18]
the count is 0, we print the line, and increment the count by 1a[18]
the count is 1, we print the line, and increment the count by 1a[18]
the count is greater than or equal to 2, we do not print the line, and increment the countAn alternative where we increment the count first:
$ sort -nk2 -nk3 file.list | awk '++a[$2] <= 2'
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190