[SOLVED] Delete consecutive duplicate lines using unix utilities

This sounds simple but is a little trickier than it looks. I would like to use a Unix utility to delete consecutive duplicate lines, keeping the first line of each run. However, I also want to preserve duplicates that do not immediately follow the original. For example, given the lines:

O B
O B
C D
T V
O B

I want the output to be:

O B
C D
T V
O B

Although the first and last lines are the same, they are not consecutive, so I want to keep both of them.

Solution

You can do:

cat file1 | uniq > file2

or more succinctly:

uniq file1 file2

assuming file1 contains

O B
O B
C D
T V
O B
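
To try this end to end, here is a minimal sketch that recreates the sample input and runs uniq on it. It assumes a POSIX shell and works in a scratch directory created with mktemp (my addition, so that no existing file1 or file2 is overwritten).

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample input from the question.
printf '%s\n' 'O B' 'O B' 'C D' 'T V' 'O B' > file1

# Collapse each run of identical adjacent lines into one line.
# The final "O B" is not adjacent to the first run, so it survives.
uniq file1 file2

cat file2
# O B
# C D
# T V
# O B
```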

For more details, see man uniq. In particular, note that uniq takes an optional input file and an optional output file, with the following syntax: uniq [OPTION]... [INPUT [OUTPUT]]. If neither is given, it reads standard input and writes standard output.
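
One thing to watch for: by default uniq compares whole lines, so two lines that differ only in trailing whitespace are not treated as duplicates. If your real data might contain stray trailing spaces, a sketch like the following may help. The file names padded.txt, raw_result and trimmed_result are hypothetical, and the sed filter is my addition rather than part of the answer above.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The first line ends with a space; the second does not.
printf '%s\n' 'O B ' 'O B' 'C D' > padded.txt

# Plain uniq keeps both "O B" lines because they are not identical.
uniq padded.txt > raw_result

# Strip trailing whitespace first, then collapse adjacent duplicates.
sed 's/[[:space:]]*$//' padded.txt | uniq > trimmed_result

# Compare the line counts of the two results.
wc -l < raw_result
wc -l < trimmed_result
```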

Finally, if you want to remove all duplicates, not just consecutive ones (sorting the file along the way), you could do:

sort -u file1 > file2
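
For comparison, here is a small sketch of what sort -u does to the sample input. It also shows an awk one-liner that drops all duplicates while keeping the original line order. The awk variant is not from the answer above, and the output file names sorted_unique and ordered_unique are hypothetical.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample input from the question.
printf '%s\n' 'O B' 'O B' 'C D' 'T V' 'O B' > file1

# sort -u removes every duplicate, and the lines come out sorted.
# LC_ALL=C pins the collation order so the result is locale-independent.
LC_ALL=C sort -u file1 > sorted_unique

# awk alternative: print a line only the first time it is seen,
# so the original order of first occurrences is preserved.
awk '!seen[$0]++' file1 > ordered_unique

cat sorted_unique
# C D
# O B
# T V

cat ordered_unique
# O B
# C D
# T V
```

Neither of these keeps the trailing O B from the question, so use them only when non-consecutive duplicates should go too.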