Tags: php, dictionary, words, pspell

Extract valid words from a text file in PHP


I have written some PHP code that extracts valid words from a text file:

$pspell_link = pspell_new("en");
$handle = fopen("list.txt", "r");

if ($handle) {
    while (($line = fgets($handle)) !== false) {
        $line = str_replace(' ', '', $line);
        $line = preg_replace('/\s+/', '', $line);

        if (pspell_check($pspell_link, $line)) {
            echo $line."<br>";
        }
    }
}

Let's assume that list.txt contains:

ghgh fghy Hello Hellothere

The code above prints only: Hello

What I'm trying to do is print Hellothere as well, since it is made up of two valid words, Hello and there.


Solution

  • (Edited)

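    You can pass the constant PSPELL_RUN_TOGETHER as the mode argument. The three arguments before it are strings, so pass empty strings rather than Null, which is deprecated for these parameters since PHP 8.1:

    $pspell_link = pspell_new( "en", "", "", "", PSPELL_RUN_TOGETHER );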
    

    From PHP documentation:

    The mode parameter is the mode in which spellchecker will work. There are several modes available:

    PSPELL_FAST - Fast mode (least number of suggestions)

    PSPELL_NORMAL - Normal mode (more suggestions)

    PSPELL_BAD_SPELLERS - Slow mode (a lot of suggestions)

    PSPELL_RUN_TOGETHER - Consider run-together words as legal compounds. That is, "thecat" will be a legal compound, although there should be a space between the two words. Changing this setting only affects the results returned by pspell_check(); pspell_suggest() will still return suggestions.

    Furthermore, because you strip every space from the line, you pass one fused string such as "ghghfghyHelloHellothere" to pspell_check() instead of the individual words.
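The effect is easy to reproduce without pspell. The following is a minimal sketch using the sample line from the question; the variable names are illustrative only, and the trailing newline stands in for what fgets() returns.

```php
<?php
// One line as fgets() would return it, including the trailing newline.
$line = "ghgh fghy Hello Hellothere\n";

// Stripping all whitespace fuses the words into a single token.
$collapsed = preg_replace('/\s+/', '', $line);

// Splitting on whitespace keeps the words separate and drops the newline.
$words = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);

echo $collapsed, PHP_EOL;           // ghghfghyHelloHellothere
echo implode(',', $words), PHP_EOL; // ghgh,fghy,Hello,Hellothere
```

Only the second form gives pspell_check() one candidate word per call.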

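    Instead, split each line into words and check them one by one:

    (...)
    // trim() drops the trailing newline that fgets() leaves on the last word
    $words = explode( ' ', trim( $line ) );
    foreach($words as $word)
    {
        if (pspell_check($pspell_link, $word)) 
        {
            echo "---> ".$word.PHP_EOL;
        }
    }
    (...)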
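Putting both suggestions together, a complete script might look like the sketch below. It is not the only way to structure this: the helper extractValidWords() and its $isValid callback are hypothetical names introduced here so the splitting logic can be exercised without a dictionary. The pspell part runs only if the extension is loaded and list.txt is readable, and whether a given run-together word such as Hellothere is accepted depends on the installed Aspell dictionary.

```php
<?php
/**
 * Return every whitespace-separated token in $lines accepted by $isValid.
 * $isValid is a hypothetical hook so the dictionary backend can be swapped.
 */
function extractValidWords(iterable $lines, callable $isValid): array
{
    $valid = [];
    foreach ($lines as $line) {
        // Split on any whitespace run; this also discards the trailing newline.
        $words = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            if ($isValid($word)) {
                $valid[] = $word;
            }
        }
    }
    return $valid;
}

// Only runs when the pspell extension and the input file are available.
if (extension_loaded('pspell') && is_readable('list.txt')) {
    // Run-together mode lets compounds of valid words pass pspell_check().
    $dict = pspell_new('en', '', '', '', PSPELL_RUN_TOGETHER);
    if ($dict !== false) {
        $lines = file('list.txt', FILE_IGNORE_NEW_LINES);
        $check = fn(string $word): bool => pspell_check($dict, $word);
        foreach (extractValidWords($lines, $check) as $word) {
            echo $word, PHP_EOL;
        }
    }
}
```

Keeping the dictionary check behind a callback means the tokenizing step can be tested with a stub, and the spell checker can be replaced later without touching the loop.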