
How to use the uniq command on the first two fields


Here's the full pipeline I'm developing so far:

cut -d ";" -f 1,2,3,5 merge_of_raw_data.csv | sort -t";" -k4 -r | cut -d ";" -f 1-3 | uniq
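Below is a minimal annotated sketch of the pipeline above, wrapped in a hypothetical function `newest_first` and run on a few made-up rows (neither the rows nor the function name come from the question). It assumes the date in field 5 is in `YYYY-MM-DD` form, so a reverse text sort lists the newest rows first. Two details are worth seeing: `cut` keeps `;` as the output delimiter, and `uniq` only collapses whole lines that are identical *and adjacent*.

```shell
#!/bin/sh
# Hypothetical helper wrapping the question's pipeline; reads CSV on stdin.
# Assumes field 5 holds an ISO date (YYYY-MM-DD), so a reverse text sort
# on it lists the newest rows first.
#   1. cut   : keep fields 1,2,3 and the date (field 5 becomes field 4)
#   2. sort  : order by that date field, newest first
#   3. cut   : drop the date again; ";" stays the output delimiter
#   4. uniq  : remove only adjacent lines that are identical in full
newest_first() {
  cut -d ";" -f 1,2,3,5 | sort -t ";" -k4 -r | cut -d ";" -f 1-3 | uniq
}

# Hypothetical sample rows (field 4 is ignored by the pipeline)
printf '%s\n' \
  '1;2;3;a;2021-01-01' \
  '2;3;4;b;2021-03-01' \
  '1;2;5;c;2021-02-01' \
  '1;2;3;d;2021-04-01' |
  newest_first
```

This prints `1;2;3`, `2;3;4`, `1;2;5`, `1;2;3`: the two `1;2;3` lines both survive because, after sorting by date, they are no longer next to each other, and `1;2;5` is kept because `uniq` compares the whole line rather than the first two fields.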

So the last cut command gives something like this:

1 2 3
2 3 4
3 4 5
1 2 3
1 2 5
1 3 3

But then I would like to keep only the lines that are unique based on the first two fields.

Using:

sort -k1,1 -k2,2 --unique

doesn't solve my problem, since I need to keep the first occurrence of each pair, as the lines are already sorted by date.

The expected output for this example would be:

1 2 3
2 3 4
3 4 5
1 3 3
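If you want to stay with `sort`, one sketch (a decorate-sort-undecorate technique, not something from the question) is to carry each line's original position along: number the lines, let a stable `sort -u` keep the first line of each run with equal first two fields, then sort back by the number and strip it. It assumes GNU `sort` (for `-s` and its "first of an equal run" behaviour with `-u`), whitespace-separated fields as in the sample, and a hypothetical function name `first_by_two_fields`.

```shell
#!/bin/sh
# Hypothetical helper: keep the first line for each (field 1, field 2) pair
# while preserving input order, using only nl, sort and cut.
# Assumes GNU sort (-s) and whitespace-separated fields, as in the sample.
#   nl -ba      : prefix every line with "<number><TAB>", so the original
#                 fields 1 and 2 become fields 2 and 3
#   sort -s -u  : stable sort on those two fields (-b skips leading blanks);
#                 -u keeps only the first line of each equal run
#   sort -n     : put the surviving lines back in their original order
#   cut -f 2-   : drop the number and the tab that nl added
first_by_two_fields() {
  nl -ba | sort -s -u -b -k2,2 -k3,3 | sort -n -k1,1 | cut -f 2-
}

printf '%s\n' '1 2 3' '2 3 4' '3 4 5' '1 2 3' '1 2 5' '1 3 3' | first_by_two_fields
```

On the sample lines this prints `1 2 3`, `2 3 4`, `3 4 5`, `1 3 3`, i.e. the expected output in the original order. For the `;`-separated output of `cut`, the field handling would need adjusting (for example `-t` on the first `sort`), which is left out here to keep the sketch short.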

Solution

  • Running your input (the output of the last cut) through the following command keeps the same lines as your expected output:

    sort -k1,1 -k2,2 --unique input.txt
    

    However, the lines come out in a different order from your expected output:

    1 2 3
    1 3 3
    2 3 4
    3 4 5
    

    The output is sorted by the first field (-k1,1) and then by the second field (-k2,2), so the original date order is lost.
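To keep the first occurrence *and* the date order, a common alternative (a different tool from the `sort` command above, offered as a sketch rather than the answerer's method) is awk's "seen array" idiom: a line is printed only the first time its first-two-field pair appears. The function name `uniq_first_two` is hypothetical; awk splits fields on whitespace by default, so for the `;`-separated output of `cut` you would add `-F ';'`.

```shell
#!/bin/sh
# Hypothetical helper: print a line only the first time its (field 1, field 2)
# pair is seen, so the input order (here, the date order) is preserved.
# Fields are split on whitespace; use awk -F ';' for ';'-separated input.
uniq_first_two() {
  # seen[$1, $2]++ evaluates to 0 (false) on the first occurrence, so the
  # negation is true and awk's default action prints the line; later
  # lines with the same pair evaluate to non-zero and are skipped.
  awk '!seen[$1, $2]++' "$@"
}

printf '%s\n' '1 2 3' '2 3 4' '3 4 5' '1 2 3' '1 2 5' '1 3 3' | uniq_first_two
```

This prints `1 2 3`, `2 3 4`, `3 4 5`, `1 3 3`. In the question's pipeline it could replace the final `uniq`, e.g. `... | cut -d ";" -f 1-3 | awk -F ';' '!seen[$1, $2]++'`. Unlike `sort -u`, it needs no sorting, so it runs in a single pass, at the cost of keeping every distinct pair in memory.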