shell awk grep uniq wc

Get the count of unique words in a file using grep and wc


I need a command that finds the count of unique words in a file using grep.

I tried using grep along with sort and uniq, but I need a way that uses only the grep and wc commands. These are the two ways I have managed so far, but I need to do it using only grep:

$ grep -oE '\w+' 'file.txt' | sort | uniq | wc -l
$ grep -oE '\w+' 'file.txt' > temp.txt && awk '!seen[$0]++' temp.txt | wc -l

Sample input file:

one two three four five
two four one six
eight three seven five

Output: unique word count: 8

Is it possible to first extract the words with grep -oE '\w+' file.txt, then grep for each word in an initially empty file, and append the word to that file only if grep does not find it there? That way, only words not already present in the new file get appended to it. Can this be done using grep?


Solution

  • Since your grep has -o I shall assume it also has -P and -z:

    grep -zPo '(?s)(\b\w+\b)(?!.*\b\1\b)' file.txt |
    grep -zc ^
    

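    The first grep prints only the last occurrence of each word: the negative lookahead rejects any word that appears again later, and (?s) lets . match across newlines. The second grep counts the NUL-terminated matches. This will be very slow on larger files, because -z makes grep treat the whole file as a single record and the lookahead rescans the rest of that record for every word it tests.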
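
  • The append-if-absent idea from the question can also be written as a loop. This is only a minimal sketch, not a tuned solution: the function name count_unique_words and the scratch file name are made up for illustration, \w is assumed to be supported by your grep -E (it is a GNU extension), and rm is used purely for cleanup. It starts one grep per word, so it will also be slow on large inputs.

```shell
#!/bin/sh
# Hypothetical helper: count unique words with grep and wc plus shell built-ins.
count_unique_words() {
    input=$1
    seen="${TMPDIR:-/tmp}/seen_words.$$"   # hypothetical scratch file name
    : > "$seen"                            # create or truncate it with a built-in

    # Extract one word per line, then append each word only if it is not
    # already recorded in the scratch file.
    grep -oE '\w+' "$input" | while IFS= read -r word; do
        # -q: no output, -x: match the whole line, -F: fixed string, not a regex
        grep -qxF -e "$word" "$seen" || printf '%s\n' "$word" >> "$seen"
    done

    wc -l < "$seen"   # one line per unique word
    rm -f "$seen"     # cleanup only; not part of the counting
}
```

    Calling count_unique_words file.txt on the sample input from the question should report 8. The -x and -F flags matter here: without them a word such as "one" would be treated as already seen whenever a longer word containing it had been recorded.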