Need a command to find the count of unique words in a file using grep.
I tried using grep along with sort and uniq, but I need a way that uses only the grep and wc commands. These are the two ways I am able to do it, but I need to do it using only grep:
$ grep -oE '\w+' 'file.txt' | sort | uniq | wc -l
$ grep -oE '\w+' 'file.txt' > temp.txt && awk '!seen[$0]++' temp.txt | wc -l
Sample input file:
one two three four five
two four one six
eight three seven five
Output: unique word count: 8
Is it possible to first extract the words using the grep -oE '\w+' file.txt command, then grep for each word in an initially empty file and append the word to that file only if grep does not find it there already? That way only words not yet present in the new file would get appended. Is it possible to do this using grep?
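The loop you describe can be sketched as below. This is only an illustration, assuming GNU grep; the names file.txt and seen.txt are examples, and the sample file is recreated so the snippet is self-contained. (It answers the "is it possible" part, but it runs one grep per word, so it is slow on big files.)

```shell
# Recreate the sample input from the question
cat > file.txt <<'EOF'
one two three four five
two four one six
eight three seven five
EOF

: > seen.txt                       # start with an empty "seen" file
for w in $(grep -oE '\w+' file.txt); do
    # -q quiet, -x whole-line match, -F fixed string:
    # append the word only if it is not already in seen.txt
    grep -qxF "$w" seen.txt || printf '%s\n' "$w" >> seen.txt
done
wc -l < seen.txt                   # prints 8 for the sample input
```

Note that this also uses wc for the final count; counting with grep alone would need something like grep -c '' seen.txt.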
Since your grep has -o, I shall assume it also has -P and -z:

grep -zPo '(?s)(\b\w+\b)(?!.*\b\1\b)' file.txt |
grep -zc ^
- -z makes grep treat the entire file as a single "line" (since there should be no nulls in it)
- -P enables Perl-compatible regular expressions (PCRE), which allow lookaround assertions
- (?s) tells PCRE that . should also match newlines
- (?! ... ) finds the final occurrence of each word (i.e. a word not followed by anything followed by itself again)
- \b\w+\b and \b\1\b exclude partial-word matches
- -o outputs each match on its own "line" (because of -z, nulls are used as the line-ending character)

This will be very slow on larger files.
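On the question's sample input, the pipeline can be checked like this (the sample file is recreated so the snippet is self-contained; GNU grep is assumed for -P and -z):

```shell
# Recreate the sample input from the question
cat > file.txt <<'EOF'
one two three four five
two four one six
eight three seven five
EOF

# Match only the final occurrence of each distinct word,
# then count the NUL-separated matches
grep -zPo '(?s)(\b\w+\b)(?!.*\b\1\b)' file.txt |
grep -zc ^          # prints 8
```

The second grep counts records rather than printing them: with -z each match is a NUL-terminated record, and ^ matches the start of every record, so -c yields the number of distinct words.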