Tags: bash, unix, duplicates, utility

Delete consecutive duplicate lines using unix utilities


This sounds simple on its face but has a catch. I would like to use a Unix utility to delete consecutive duplicate lines, keeping only the first line of each run. However, I also want to preserve duplicates that do not occur immediately after the original. For example, given the lines:

O B 
O B 
C D 
T V
O B

I want the output to be:

O B 
C D
T V
O B 

Although the first and last lines are the same, they are not consecutive, so I want to keep both of them.


Solution

  • You can do:

    cat file1 | uniq > file2
    

    or more succinctly:

    uniq file1 file2
    

    assuming file1 contains

    O B
    O B
    C D
    T V
    O B
    

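    For more details, see man uniq. In particular, note that uniq accepts up to two optional file arguments, with the syntax uniq [OPTION]... [INPUT [OUTPUT]]: it reads from INPUT (or standard input if omitted) and writes to OUTPUT (or standard output if omitted).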
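As a quick check, here is a minimal sketch that recreates the sample input and runs the two-argument form on it. The scratch directory and the `workdir` variable are only illustrative assumptions so that nothing in your current directory gets overwritten.

```shell
# Work in a scratch directory (hypothetical setup, not part of the answer).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample input from the question.
printf '%s\n' 'O B' 'O B' 'C D' 'T V' 'O B' > file1

# Collapse each run of adjacent identical lines into a single line.
uniq file1 file2

# Show the result: the final "O B" survives because it is not adjacent
# to the first one.
cat file2
```

Note that uniq compares whole lines, so a line with trailing whitespace ("O B ") is not treated as a duplicate of one without it ("O B").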

    Finally, if you want to remove all duplicates, not just consecutive ones (sorting the file along the way), you could do:

    sort -u file1 > file2
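To make the difference concrete, the sketch below runs `sort -u` on the same sample input. The `awk` line is not part of the answer above; it is a common idiom, included here as an assumption about what you might want if all duplicates should go but the original line order should stay. The output file names (`sorted_unique`, `first_seen`) and the scratch directory are hypothetical.

```shell
# Work in a scratch directory (hypothetical setup).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Same sample input as in the question.
printf '%s\n' 'O B' 'O B' 'C D' 'T V' 'O B' > file1

# sort -u: sort the lines, then keep one copy of each distinct line.
sort -u file1 > sorted_unique

# awk alternative (an assumption, not from the answer): print a line only
# the first time it is seen, which preserves the original order.
awk '!seen[$0]++' file1 > first_seen

cat sorted_unique
cat first_seen
```

In both cases the trailing "O B" is dropped, which is why plain uniq is the right tool for the question as asked.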