sorting, command, uniq

uniq command - how to get delimiter option and search on the basis of column?


Below is a sorted (on the basis of column one) tab-delimited file named file.txt

barbie  325 social activist
david   214 IT professional
david   457 mathematician
david   458 biologist
john    85  engineer
john    98  doctor
peter   100 statistician

I want to run the uniq command on column one only, using options equivalent to -t (delimiter) and -k (key column) of the sort command.

uniq -d (-t$'\t' -k1,1) file.txt  # this is incorrect syntax in brackets, but I want to run it in similar way

This should be quite easy but I am unable to find my way.

What can I do to get output as:

david   214 IT professional
john    85  engineer

Solution

  • Debian uniq used to have this option, but it was removed for compatibility reasons. You can create your own AWK or Perl script easily. This prints only the lines with the first occurrence of the first field:

    awk -F '\t' '!x[$1]++' file.txt
    

    x is an associative array indexed by the contents of the first field ($1). The entry x[$1] is incremented for each line, but the same expression also serves as the condition that decides whether the current line is printed: with the negation, it is true only if this field value has not been encountered before. (Reminder: an AWK script consists of zero or more condition { action } pairs, and both parts are optional. If { action } is missing, the default action is to print the current line; if the condition is missing, the action is taken unconditionally.)
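
Note that the one-liner above prints the first line for every key, so barbie and peter would appear too. The output requested in the question behaves more like uniq -d restricted to column one: only keys that occur more than once, first occurrence only. The following is a minimal sketch of one way to get that, not the only one. It reads the file twice (first pass counts keys, second pass prints), and the names count, seen, tmp and out are chosen here purely for illustration.

```shell
# Recreate the sample tab-delimited input from the question in a temp file.
tmp=$(mktemp)
printf '%s\t%s\t%s\n' \
  barbie 325 'social activist' \
  david  214 'IT professional' \
  david  457 'mathematician' \
  david  458 'biologist' \
  john   85  'engineer' \
  john   98  'doctor' \
  peter  100 'statistician' > "$tmp"

# Pass 1 (NR == FNR): count how often each first-field value occurs.
# Pass 2: print a line only if its key occurred more than once
#         and this is the first time the key is seen in this pass.
out=$(awk -F '\t' '
  NR == FNR { count[$1]++; next }
  count[$1] > 1 && !seen[$1]++
' "$tmp" "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Because the keys are counted up front, this variant does not depend on the input being sorted. For the sample data it prints the david 214 and john 85 lines.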