There is an array containing many millions of words. I need to build an associative array of misspelled variants of all these words, with each misspelled variant as the key and the correct version of the word as its value. No misspelled variant may coincide with a correct word in the array, and the misspelled variants must not coincide with each other either. I need all these generated misspellings in order to correct misspelled Cyrillic words (in a language that is neither Russian nor English). As an example, take the words "apple", "lost", "lot" and "microsoft". The array of correct words from which to create the misspelled variants:
<?php
$correct_words = array(
"apple",
"lost",
"lot",
"microsoft"
);
?>
I want the result to look like this:
<?php
$incorrect_variant_words = array(
"aple"=>"apple",
"lst"=>"lost",
"lt"=>"lot",
"microsot"=>"microsoft",
"microsft"=>"microsoft",
"microoft"=>"microsoft",
"micrsoft"=>"microsoft",
"micosoft"=>"microsoft",
"mirosoft"=>"microsoft",
"mcrosoft"=>"microsoft"
);
?>
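Every entry in the desired output above is a single-character deletion of a correct word (for example "aple" and "lst"). A minimal sketch of generating such a map, assuming deletions are the only kind of error to model; utf8_chars() and build_deletion_variants() are hypothetical helper names. preg_split() with the /u modifier splits on characters rather than bytes, so a two-byte Cyrillic letter counts as one character. Variants produced by two different correct words are dropped as ambiguous, which is one possible reading of the rule that variants must not coincide with each other.

```php
<?php
// Hypothetical helper: split a UTF-8 string into characters rather than bytes,
// so a Cyrillic letter such as "ҷ" (two bytes in UTF-8) is one unit.
// Assumes the input is valid UTF-8.
function utf8_chars(string $word): array
{
    return preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: map each single-character deletion of each correct
// word to that word ("misspelling" => "correct word"). A variant is skipped
// if it equals a correct word, and dropped if two different correct words
// produce it.
function build_deletion_variants(array $correct_words): array
{
    $correct = array_flip($correct_words); // fast membership test
    $variants = [];
    $ambiguous = [];

    foreach ($correct_words as $word) {
        $chars = utf8_chars($word);
        foreach (array_keys($chars) as $i) {
            $copy = $chars;
            unset($copy[$i]);                 // delete the i-th character
            $variant = implode('', $copy);

            if ($variant === '' || isset($correct[$variant])) {
                continue;                     // must not coincide with a correct word
            }
            if (isset($variants[$variant]) && $variants[$variant] !== $word) {
                $ambiguous[$variant] = true;  // produced by two different words
            }
            $variants[$variant] = $word;
        }
    }

    // Remove variants that would map to more than one correct word.
    return array_diff_key($variants, $ambiguous);
}

$correct_words = array("apple", "lost", "lot", "microsoft");
print_r(build_deletion_variants($correct_words));
```

For the four sample words this returns every entry of the desired output plus the deletions the example leaves out (such as "pple" and "icrosoft"); "lot", a deletion of "lost", is skipped because it is itself a correct word. With many millions of words the map holds several entries per word, so memory use can be substantial.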
My goal is to correct misspelled words. Please give advice, or if a solution for this task already exists, please tell me. Google Translate, for example, implements such a feature. How can I do this without the PHP Pspell extension? Please help me solve this difficult task. As correct words, I am also adding an array of correctly spelled words:
<?php
$array = array(
"миёнаҳои",
"луғатҳои",
"онандроҷ",
"ганҷинаи",
"ҷамъиятӣ",
"иҷтимоии",
"муҳаммад",
"рӯзмарра",
"ҳамзабон",
"забонҳои",
"ҳамчунин",
"фарҳанге",
"феҳристи",
"зардуштӣ",
"таркибҳо",
"ибораҳои",
"калимаҳо",
"фарҳанги",
"тобишҳои",
"намунаҳо",
"нусхаҳои",
"фирдавсӣ",
"ҳуруфоти",
"мутобиқи",
"тақрибан",
"алоҳидаи",
"тоисломӣ",
"паҳлавик",
"классикӣ",
"мӯътабар",
"қадамҳои",
"баргаҳои"
);
?>
Thank you in advance
Use similar_text(): iterate over the array of correct words, compare each one with the input value, and return the word with the highest similarity percentage. Basic concept:
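$correct_words = array(
    "apple",
    "lost",
    "lot",
    "microsoft"
);
$input = 'lst';
$match = 0;      // best similarity percentage found so far
$result = null;  // best matching correct word (stays null if nothing matches)
foreach ($correct_words as $correct) {
    // similar_text() writes the similarity percentage into $percent
    similar_text($correct, $input, $percent);
    if ($percent > $match) {
        $result = $correct;
        $match = $percent;
    }
}
echo $result;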
The output is: lost
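One caveat: similar_text(), like levenshtein(), compares bytes, and in UTF-8 each Cyrillic letter takes two bytes, so a single wrong letter such as ч instead of ҷ counts as two byte differences. A minimal sketch of a character-level alternative, swapping the similarity percentage for Levenshtein edit distance; utf8_levenshtein() and closest_word() are hypothetical names, not PHP built-ins.

```php
<?php
// Hypothetical helper: character-level (not byte-level) Levenshtein distance
// for UTF-8 strings, using the classic two-row dynamic programming table.
function utf8_levenshtein(string $a, string $b): int
{
    $s = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);
    $t = preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY);
    $m = count($s);
    $n = count($t);

    // $prev[$j] = distance between the first $i-1 chars of $s
    // and the first $j chars of $t.
    $prev = range(0, $n);
    for ($i = 1; $i <= $m; $i++) {
        $cur = [$i];
        for ($j = 1; $j <= $n; $j++) {
            $cost = ($s[$i - 1] === $t[$j - 1]) ? 0 : 1;
            $cur[$j] = min(
                $prev[$j] + 1,         // deletion
                $cur[$j - 1] + 1,      // insertion
                $prev[$j - 1] + $cost  // substitution
            );
        }
        $prev = $cur;
    }
    return $prev[$n];
}

// Hypothetical helper: the correct word with the smallest edit distance to
// $input. On ties the earlier word in the list wins.
function closest_word(string $input, array $correct_words): ?string
{
    $best = null;
    $best_distance = PHP_INT_MAX;
    foreach ($correct_words as $correct) {
        $d = utf8_levenshtein($input, $correct);
        if ($d < $best_distance) {
            $best = $correct;
            $best_distance = $d;
        }
    }
    return $best;
}

echo closest_word('точики', array("тоҷик", "тоҷикӣ", "тоҷики")), PHP_EOL;
```

As in the similar_text() loop, the first word in the list wins a tie, so the order of the correct words matters when two of them are equally close.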
Edit: here is the result for your words:
$correct_words = array(
    "тоҷик",
    "тоҷикӣ",
    "тоҷики"
);
$input = array("тоҷикӣ", "тоҷики", "точик", "точикӣ", "точики", "тоики", "тоикӣ", "тоҷӣкӣ", "тҷикӣ", "тчики", "тҷӣкӣ", "тчик");
foreach ($input as $in) {
    $match = 0;      // reset the best score for every input word
    $result = null;  // reset the best match so it cannot carry over from the previous word
    foreach ($correct_words as $correct) {
        similar_text($correct, $in, $percent);
        if ($percent > $match) {
            $result = $correct;
            $match = $percent;
        }
    }
    echo "$in is corrected to $result" . PHP_EOL;
}
Result is:
тоҷикӣ is corrected to тоҷикӣ
тоҷики is corrected to тоҷики
точик is corrected to тоҷик
точикӣ is corrected to тоҷикӣ
точики is corrected to тоҷики
тоики is corrected to тоҷики
тоикӣ is corrected to тоҷикӣ
тоҷӣкӣ is corrected to тоҷикӣ
тҷикӣ is corrected to тоҷикӣ
тчики is corrected to тоҷики
тҷӣкӣ is corrected to тоҷикӣ
тчик is corrected to тоҷик
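Comparing the input with every correct word, as in the loops above, costs one comparison per dictionary word, which becomes slow with many millions of words. A hedged sketch of an indexed alternative, based on the "symmetric delete" idea used by SymSpell (a different technique from similar_text()) and close to the deletion map described in the question: each correct word and each of its single-character deletions are stored as keys pointing back to the correct word, and a lookup only generates the deletions of the input. utf8_deletions(), build_index() and lookup() are hypothetical names.

```php
<?php
// Hypothetical helper: all distinct single-character deletions of a UTF-8
// word, working on characters rather than bytes (assumes valid UTF-8).
function utf8_deletions(string $word): array
{
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    $variants = [];
    foreach (array_keys($chars) as $i) {
        $copy = $chars;
        unset($copy[$i]);                   // drop the i-th character
        $variants[] = implode('', $copy);
    }
    return array_values(array_unique($variants));
}

// Index every correct word, and each of its deletions, as a key pointing
// back to the correct word(s) it came from.
function build_index(array $correct_words): array
{
    $index = [];
    foreach ($correct_words as $word) {
        $index[$word][$word] = true;
        foreach (utf8_deletions($word) as $variant) {
            $index[$variant][$word] = true;
        }
    }
    return $index;
}

// Look up the input and each of its deletions in the index. This finds
// every correct word within one insertion, deletion or substitution, and
// can also return some words two edits away (e.g. adjacent transpositions).
function lookup(string $input, array $index): array
{
    if (isset($index[$input][$input])) {
        return [$input];                    // already a correct word
    }
    $candidates = [];
    foreach (array_merge([$input], utf8_deletions($input)) as $key) {
        foreach (array_keys($index[$key] ?? []) as $word) {
            $candidates[$word] = true;      // keys deduplicate candidates
        }
    }
    return array_map('strval', array_keys($candidates));
}

$index = build_index(array("apple", "lost", "lot", "microsoft", "тоҷик", "тоҷикӣ", "тоҷики"));
print_r(lookup('точики', $index));
print_r(lookup('lst', $index));
```

Here lookup('lst', $index) returns both "lost" and "lot", since each is one edit away, so the candidates may still need ranking (for example by word frequency or a character-level edit distance). For many millions of words the index holds several keys per word, so it may need to live in a database or key-value store rather than a PHP array.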