
Find the most common line in a file in bash


I have a file of strings:

string-string-123
string-string-123
string-string-123
string-string-12345
string-string-12345
string-string-12345-123

How do I retrieve the most common line in bash (string-string-123)?


Solution

  • You could use awk to do this:

    awk '{++a[$0]}END{for(i in a)if(a[i]>max){max=a[i];k=i}print k}' file
    

    The array a keeps a count of each distinct line, keyed by the line's text. Once the whole file has been read, the END block loops through the array and finds the line with the maximum count.
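
    For readability, the same logic can be spread over several lines. This is only a sketch: the function name most_common_line, the longer variable names, and the temporary sample file are illustrative and not part of the original answer.

    ```shell
    # Hypothetical helper: prints one most frequent line of the file given as $1.
    most_common_line() {
        awk '
            { ++count[$0] }                 # tally each distinct line
            END {
                for (line in count)         # traversal order is unspecified
                    if (count[line] > max) {
                        max = count[line]
                        best = line
                    }
                print best
            }
        ' "$1"
    }

    # Sample input from the question, written to a temporary file.
    sample=$(mktemp)
    printf '%s\n' string-string-123 string-string-123 string-string-123 \
        string-string-12345 string-string-12345 string-string-12345-123 > "$sample"

    most_common_line "$sample"    # prints string-string-123
    rm -f "$sample"
    ```

    Because max starts out uninitialized, it compares as 0, so any line that occurs at least once can become the current best.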

    Alternatively, you can skip the loop in the END block by assigning the line during the processing of the file:

    awk 'max < ++c[$0] {max = c[$0]; line = $0} END {print line}' file
    

    Thanks to glenn jackman for this useful suggestion.
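
    One detail of the single-pass version is worth illustrating. Because the comparison is strict, a line only replaces the current winner when its count exceeds the running maximum, so on a tie the line that reached that count first is kept. A small sketch with made-up input (the lines a and b and the variable result are placeholders):

    ```shell
    # b reaches a count of 2 before a does, so b is the line that is kept.
    result=$(printf '%s\n' a b b a |
        awk 'max < ++c[$0] {max = c[$0]; line = $0} END {print line}')
    echo "$result"    # prints b
    ```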


    As has rightly been pointed out, the two approaches above print only one of the most frequently occurring lines when there is a tie. The following version prints all of them:

    awk 'max<++c[$0] {max=c[$0]} END {for(i in c)if(c[i]==max)print i}' file
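
    To see the tie handling in action, here is a hedged sketch using made-up input in which two lines share the highest count. The function name all_most_common_lines and the sample data are assumptions for illustration. Since awk does not specify the order in which for (i in c) visits the array, the output is piped through sort to make it deterministic.

    ```shell
    # Hypothetical helper: prints every line tied for the highest count, sorted.
    all_most_common_lines() {
        awk 'max<++c[$0] {max=c[$0]} END {for(i in c)if(c[i]==max)print i}' "$1" | sort
    }

    # a and b both occur twice, c occurs once.
    ties=$(mktemp)
    printf '%s\n' a a b b c > "$ties"

    all_most_common_lines "$ties"    # prints a, then b
    rm -f "$ties"
    ```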