
How do I count the number of occurrences of a string in an entire file?


Is there a built-in command to do this, or has anyone had any luck with a script that does it?

I am looking to count the number of times a certain string (not a word) appears in a file. A line can contain several occurrences, so every occurrence should be counted, not just 1 for each line that contains the string two or more times.

For example, with this sample file:

    blah(*)wasp( *)jkdjs(*)kdfks(l*)ffks(dl
    flksj(*)gjkd(*
    )jfhk(*)fj (*) ks)(*gfjk(*)

If I count the occurrences of the string (*), I would expect a total of 6: 2 from the first line, 1 from the second line and 3 from the third line. The (* at the end of line 2 followed by the ) at the start of line 3 does not count, because a line feed (LF) character separates them.
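As a quick sanity check of the expected total, here is a small sketch in Python (not part of the original thread) that holds the sample in a string and counts each line separately. str.count counts non-overlapping occurrences, and because lines are counted one at a time, the split (* and ) on lines 2 and 3 can never combine into a match.

```python
# Sample file from the question (line 2 ends with "(*", line 3 starts with ")").
sample = """blah(*)wasp( *)jkdjs(*)kdfks(l*)ffks(dl
flksj(*)gjkd(*
)jfhk(*)fj (*) ks)(*gfjk(*)"""

needle = "(*)"

# Count non-overlapping occurrences on each line separately, then sum them.
per_line = [line.count(needle) for line in sample.splitlines()]
total = sum(per_line)

print(per_line, total)  # [2, 1, 3] 6
```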

Update: great responses so far! Could the script also handle the escaping itself, e.g. converting (*) to \(\*\)? That way I could pass any string as an input parameter without worrying about how it has to be escaped to be matched literally.
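For comparison outside the thread's Perl, Python's re.escape does exactly this conversion: it backslash-escapes regex metacharacters so that a user-supplied string is matched literally, much like Perl's \Q...\E. The helper name count_literal below is hypothetical, and this is only an illustrative sketch.

```python
import re


def count_literal(needle: str, text: str) -> int:
    """Count non-overlapping literal occurrences of needle in text."""
    if not needle:
        raise ValueError("needle must be a non-empty string")
    # re.escape turns "(*)" into r"\(\*\)", so no manual escaping is needed.
    pattern = re.compile(re.escape(needle))
    return len(pattern.findall(text))


print(re.escape("(*)"))                       # \(\*\)
print(count_literal("(*)", "a(*)b(*)c( *)"))  # 2
```

For a purely literal search, text.count(needle) gives the same result without any regex; escaping mainly matters when the string has to be embedded in a larger regular expression.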


Solution

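  • Use Perl's "Eskimo kiss" operator }{ with the -n switch to print a total at the end: -n wraps the code in an implicit loop over the input lines, and }{ closes that loop early so the print runs once, after all lines have been read. Assigning the matches to an empty list, () = /.../g, yields the number of matches on each line when used in scalar context. Use \Q...\E so that any regex metacharacters in the string are matched literally. The || 0 prints 0 instead of an empty line when there are no matches.

    perl -lnwe '$a += () = /\Q(*)\E/g; }{ print $a || 0' file.txt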
    

    Script:

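    use strict;
    use warnings;
    
    # First argument: the literal string to count. Remaining arguments: input
    # files (standard input is read if none are given).
    my $text = shift;
    # Reject an empty string: an empty interpolated pattern would silently
    # reuse the last successfully matched regex.
    die "Usage: $0 STRING [FILE...]\n" unless defined $text && length $text;
    
    my $count = 0;    # initialised so that "0" is printed when nothing matches
    
    while (<>) {
        # \Q...\E escapes regex metacharacters, so "(*)" is matched literally.
        # The list assignment () = ... in scalar context gives the match count.
        $count += () = /\Q$text\E/g;
    }
    
    print "$count\n";
    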

    Usage:

    perl script.pl "(*)" file.txt
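For readers who prefer Python, here is a rough equivalent of the Perl script (a sketch, not from the original thread; count_in_lines and count_in_files are hypothetical names). str.count does a plain substring search, so, like \Q...\E, no escaping is needed, and fileinput reads the listed files in order, or standard input when the list is empty, similar to Perl's <>.

```python
import fileinput
from typing import Iterable, Sequence


def count_in_lines(needle: str, lines: Iterable[str]) -> int:
    """Sum the non-overlapping occurrences of needle over an iterable of lines."""
    if not needle:
        raise ValueError("needle must be a non-empty string")
    # str.count does a plain substring search, so "(*)" needs no escaping.
    return sum(line.count(needle) for line in lines)


def count_in_files(needle: str, paths: Sequence[str]) -> int:
    """Count needle across the given files, read line by line.

    An empty list of paths reads standard input, similar to Perl's <>.
    """
    # Pass the paths explicitly: with files=None, fileinput would fall back
    # to sys.argv[1:], which here would include the needle itself.
    with fileinput.input(files=paths) as lines:
        return count_in_lines(needle, lines)
```

Called as print(count_in_files(sys.argv[1], sys.argv[2:])) from a small script, this mirrors perl script.pl "(*)" file.txt. Matching is non-overlapping in both versions: "aaa" contains "aa" once, not twice.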