I have a utility script in Python:
#!/usr/bin/env python
import sys
unique_lines = []
duplicate_lines = []
for line in sys.stdin:
if line in unique_lines:
duplicate_lines.append(line)
else:
unique_lines.append(line)
sys.stdout.write(line)
# optionally do something with duplicate_lines
This simple functionality (uniq
without needing to sort first, stable ordering) must be available as a simple UNIX utility, mustn't it? Maybe a combination of filters in a pipe?
Reason for asking: needing this functionality on a system on which I cannot execute Python from anywhere.
The UNIX Bash Scripting blog suggests:
awk '!x[$0]++'
This command is telling awk which lines to print. The variable $0
holds the entire contents of a line and square brackets are array access. So, for each line of the file, the node of the array x
is incremented and the line printed if the content of that node was not (!
) previously set.