php, counting, cpu-word, text-matching

Count whole word matches in a string using multiple search words


I have a paragraph that I have to parse for different keywords. For example:

"I want to make a change in the world. Want to make it a better spot to live. Peace, Love and Harmony. It is all life is all about. We can make our world a good place to live"

And my keywords are:

"world", "earth", "place"

I should report whenever I have a match and how many times.

Output should be like:

"world" 2 times and "place" 1 time

earth isn't mentioned because it was never matched.

Currently, I convert the paragraph to an array of characters and then compare each keyword against the entire array, which wastes resources. Please suggest a more efficient way.


Solution

  • As @CasimiretHippolyte commented, a regex is the better approach because word boundaries (\b) can be used. Case-insensitive matching is also possible with the i flag. Use the return value of preg_match_all:

    Returns the number of full pattern matches (which might be zero), or FALSE if an error occurred.

    The pattern for matching a single word is /\bword\b/i. Build an array whose keys are the search words from $words and whose values are the match counts that preg_match_all returns for each word:

    $words = array("earth", "world", "place", "foo");
    
    $str = "at Earth Hour the world-lights go out and make every place on the world dark";
    
    // Map each search word to its number of whole-word, case-insensitive matches.
    $res = array_combine($words, array_map(function ($w) use ($str) {
        return preg_match_all('/\b' . preg_quote($w, '/') . '\b/i', $str);
    }, $words));
    

    print_r($res); (tested at eval.in) outputs:

    Array ( [earth] => 1 [world] => 2 [place] => 1 [foo] => 0 )

    preg_quote is used to escape the words; this is unnecessary if you know they contain no regex special characters. Anonymous functions require PHP 5.3 or later, and calling preg_match_all without the third ($matches) argument requires PHP 5.4 or later.
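
To produce the report format asked for in the question, the counts can be filtered and joined. The sketch below is one possible variation rather than the answer's own code: it counts all keywords in a single pass using one alternation pattern, then skips words with zero matches. The helper names count_keywords and format_report are hypothetical, the type declarations assume PHP 7.0 or later, and the keywords are assumed to be single words that begin and end with word characters (so that \b behaves as expected).

```php
<?php
// Hypothetical helper (not from the original answer): counts whole-word,
// case-insensitive matches for every search word in one regex pass.
function count_keywords(string $text, array $words): array
{
    // Start every keyword at zero, keyed by its lowercase form.
    $counts = array_fill_keys(array_map('strtolower', $words), 0);
    if ($counts === []) {
        return [];
    }

    // Build one alternation such as /\b(?:world|earth|place)\b/i.
    $escaped = array_map(function ($w) {
        return preg_quote($w, '/');
    }, array_keys($counts));
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

    // $matches[0] holds every full match found in the text.
    preg_match_all($pattern, $text, $matches);
    foreach ($matches[0] as $hit) {
        $counts[strtolower($hit)]++;
    }
    return $counts;
}

// Hypothetical helper: formats only the words that matched at least once.
function format_report(array $counts): string
{
    $parts = [];
    foreach ($counts as $word => $n) {
        if ($n > 0) {
            $parts[] = sprintf('"%s" %d %s', $word, $n, $n === 1 ? 'time' : 'times');
        }
    }
    return implode(' and ', $parts);
}

$text = "I want to make a change in the world. Want to make it a better spot to live. "
      . "Peace, Love and Harmony. It is all life is all about. "
      . "We can make our world a good place to live";

// Prints: "world" 2 times and "place" 1 time
echo format_report(count_keywords($text, ["world", "earth", "place"])), "\n";
```

Whether the single alternation is faster than one preg_match_all call per word depends on the text size and the number of keywords, so it is worth measuring on real input before choosing one over the other.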