php hindi

How to split a Hindi string into an array with PHP and count the letters and vowels in it


I have a string like this:

$a = "आलोक";

In PHP, I want to split it into something like this:

a[0] = आ  
a[1] = लो  
a[2] = क 

I also want the counts as numbers. For example, if I enter the name आलोक, I want the output letter=3 and vowel=2. The first letter is आ, the second is लो and the third is क, so letter=3. The first vowel is ा and the second is ो, so vowel=2.

The name is dynamic, not fixed.


Solution

  • I was going through the other question you had posted, and the accepted answer there suggests a function along the following lines to break the string up into characters:

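     function mbStringToArray ($string) {
       // Start with an empty array so that an empty string returns array() instead of an undefined variable
       $array = array();
       $strlen = mb_strlen($string, "UTF-8");
       while ($strlen) {
         // Take the first code point, then drop it from the front of the string
         $array[] = mb_substr($string, 0, 1, "UTF-8");
         $string = mb_substr($string, 1, $strlen, "UTF-8");
         $strlen = mb_strlen($string, "UTF-8");
       }
       return $array;
     }
      $a = "आलोक";
      print_r(mbStringToArray($a));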
    

    If you run this code, it will give you the following output:

      Array
      (
       [0] => आ
       [1] => ल
       [2] => ो
       [3] => क
      )
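
    As a side note, the same code-point split can be sketched with a single built-in call. This is only an illustration, assuming PHP's PCRE extension is built with UTF-8 support (the `u` modifier); the variable `$parts` is a name chosen here, not part of the original answer. On PHP 7.4 or later, `mb_str_split($a, 1, "UTF-8")` should behave the same way.

    ```php
    <?php
    $a = "आलोक";

    // An empty pattern with the "u" modifier splits between UTF-8 code points;
    // PREG_SPLIT_NO_EMPTY drops the empty strings at the start and end.
    $parts = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);

    print_r($parts); // four elements: आ, ल, ो, क
    ```

    Like the function above, this splits by code point, so the vowel sign ो still ends up in its own element.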
    

    I'm going to build on this function and extend it a little so that you can get the counts of vowels and consonants easily.

    Thankfully, I found this handy guide on the UTF-8 encodings of all the characters in the Devanagari script. Another simple tool to confirm them, and to get the decimal and octal representations of these characters as well, is http://unicodelookup.com/.

    From the table, I looked up 0x093F and cross-referenced it with ि.

    Once you have the hex code, it's just a matter of decoding it into the Unicode character. You can do that with:

    echo json_decode('"\u093F"'); // Outputs ि
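
    If you would rather not go through JSON, there are two other ways to build the same character. This is a small sketch, assuming PHP 7.0+ for the `\u{...}` string escape and PHP 7.2+ with the mbstring extension for `mb_chr`; the variable names are made up for the comparison.

    ```php
    <?php
    // The approach from the answer: let json_decode expand the \uXXXX escape.
    $viaJson = json_decode('"\u093F"');

    // PHP 7.0+: Unicode code point escape inside a double-quoted string.
    $viaEscape = "\u{093F}";

    // PHP 7.2+ with mbstring: build the character from its numeric code point.
    $viaMbChr = mb_chr(0x093F, 'UTF-8');

    // All three should hold the same UTF-8 bytes for ि.
    var_dump($viaEscape === $viaJson, $viaMbChr === $viaJson);
    ```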
    

    I have combined these steps together in a function called countVowels:

     function countVowels ($req){
    
       // Hard-coded Unicode escapes for some Hindi vowels and vowel signs (matras).
       // This does NOT include all the vowels; add more from the table mentioned before as needed.
       // Note: \u094D is the virama (halant), which suppresses a vowel rather than adding one;
       // remove it if you only want to count actual vowel signs.
       // Single quotes keep the escapes as literal text for json_decode to expand.
    
       $hindi = array('\u0906','\u0908','\u093E','\u093F','\u0945','\u0946','\u0947','\u0948','\u0949','\u094A','\u094B','\u094C','\u094D');
       $vowels = array();
       $vowelcount = 0;
       for($i = 0; $i<count($hindi); $i++){
    
         // Decode each escape into its UTF-8 character and push it into the $vowels array
         array_push($vowels,json_decode('"'.$hindi[$i].'"')); 
       }
    
       // Count the elements of $req that appear in the vowel list
       for($j=0;$j<count($req);$j++){
          if(in_array($req[$j], $vowels)) {
            $vowelcount++;
          }
       }
       return $vowelcount;
     }
    

    The input to this function is $req, which can be the array returned by the previously defined function mbStringToArray. Once you have the count of vowels, you can get the count of consonants by subtracting it from the total number of characters. The flow might look something like:

      $a = "आलोक"; 
      $arr = mbStringToArray($a);
      $vows = countVowels($arr); //Number of vowels 
      $cons = count($arr) - $vows; //Number of consonants
    

    So in this case, the code returns 2 consonants and 2 vowels. आ is counted as a vowel because I have hard-coded it in the list inside countVowels. Have a look at the working demo.

    You can modify the array I use there and take care of such discrepancies as per your requirements. I hope this gets you started in the right direction.
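
    If you need exactly the letter=3 and vowel=2 output from the question, one possible approach is to count grapheme clusters instead of code points. This is a rough sketch and not part of the code above: `countLettersAndVowels` is a hypothetical helper, it assumes PCRE with UTF-8 support, and the vowel ranges (independent vowels U+0905 to U+0914, vowel signs U+093E to U+094C) are my assumption about which characters you want to treat as vowels.

    ```php
    <?php
    // Hypothetical helper: counts letters (grapheme clusters) and vowels in a Hindi string.
    function countLettersAndVowels($text) {
        // \X matches one extended grapheme cluster, so a consonant followed by
        // its vowel sign (for example लो) is counted as a single letter.
        $letters = preg_match_all('/\X/u', $text, $clusters);

        // Assumed vowel set: independent vowels U+0905..U+0914 and
        // dependent vowel signs U+093E..U+094C. The virama U+094D is excluded.
        $vowels = preg_match_all('/[\x{0905}-\x{0914}\x{093E}-\x{094C}]/u', $text);

        return array('letter' => $letters, 'vowel' => $vowels, 'parts' => $clusters[0]);
    }

    $result = countLettersAndVowels("आलोक");
    echo "letter=" . $result['letter'] . " vowel=" . $result['vowel'] . "\n"; // letter=3 vowel=2
    print_r($result['parts']); // आ, लो, क
    ```

    Keep in mind that \X also counts spaces and punctuation as clusters, and that conjuncts joined with a virama may be split differently depending on the PCRE version, so test it against the kind of names you expect.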