I have code that compares a string against the words in an array and only performs a replacement when one of those words is found:
First code (just an example):
$myVar = 'essa pizza é muito gostosa, que prato de bom sabor';
$myWords=array(
array('sabor','gosto','delicia'),
array('saborosa','gostosa','deliciosa'),
);
foreach($myWords as $words){
shuffle($words); // randomize the subarray
// pipe-together the words and return just one match
if(preg_match('/\K\b(?:'.implode('|',$words).')\b/',$myVar,$out)){
// generate "replace_pair" from matched word and a random remaining subarray word
// replace and preserve the new sentence
$myVar=strtr($myVar,[$out[0]=>current(array_diff($words,$out))]);
}
}
echo $myVar;
My question:
I have a second piece of code that does not use rand/shuffle (I do not want randomness; I want precise substitutions, always replacing the column 0 value with the column 1 value). It should always exchange the values:
// wrong output: $myVar = "minha irmã alanné é not aquela blnode, elere é a bom plperito";
$myVar = "my sister alannis is not that blonde, here is a good place";
$myWords=array(array("is","é"),
array("on","no"),
array("that","aquela"),
//array("blonde","loira"),
//array("not","não"),
array("sister","irmã"),
array("my","minha"),
//array("nothing","nada"),
array("myth","mito"),
array("he","ele"),
array("good","bom"),
array("ace","perito"),
// array("here","aqui"), // if "here" is not in the array, he=>ele must not be applied inside it ("elere" is a non-existent word)
);
$replacements = array_combine(array_column($myWords,0),array_column($myWords,1));
$myVar = strtr($myVar,$replacements);
echo $myVar;
// expected output: minha irmã alannis é not aquela blonde, here é a bom place
// avoid replacing slices of words!
Expected output: minha irmã alannis é not aquela blonde, here é a bom place
Always check that the whole word exists in the array before making the substitution. The code should verify that the output is made of real words that exist in the array myWords, which avoids errors like:
alanné, blnode, elere, plperito
Those 4 words do not exist; they are writing errors. How do I do that for the second code?
In short, the exchange must be made on a complete word/key, an existing word, and must not create something strange from slices of keywords!
My previous method was incredibly inefficient. I didn't realize how much data you were processing, but if we are upwards of 4000 lines, then efficiency is vital (I think my brain was stuck thinking about strtr()-related processing based on your previous question(s)). This is my new/improved solution, which I expect to leave my previous solution in the dust.
Code: (Demo)
$myVar = "My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!";
echo "$myVar\n";
$myWords = [
["is", "é"],
["on", "no"],
["that", "aquela"],
["sister", "irmã"],
["my", "minha"],
["myth", "mito"],
["he", "ele"],
["good", "bom"],
["ace", "perito"],
["i", "eu"] // notice I must be lowercase
];
$translations = array_column($myWords, 1, 0); // or skip this step and just declare $myWords as key-value pairs
// length sorting is not necessary
// preg_quote() and \Q\E are not used because dealing with words only (no danger of misinterpretation by regex)
$pattern = '/\b(?:' . implode('|', array_keys($translations)) . ')\b/i'; // non-capturing (not atomic) group: when "my" fails at the trailing \b, the regex can still backtrack and try "myth"
/* echo $pattern;
makes: /\b(?:is|on|that|sister|my|myth|he|good|ace|i)\b/i
demo: https://regex101.com/r/DXTtDf/1
*/
$translated = preg_replace_callback(
$pattern,
function($m) use($translations) { // bring $translations (lookup) array to function
$encoding = 'UTF-8'; // default setting
$key = mb_strtolower($m[0], $encoding); // standardize keys' case for lookup accessibility
if (ctype_lower($m[0])) { // treat as all lower
return $translations[$m[0]];
} elseif (mb_strlen($m[0], $encoding) > 1 && ctype_upper($m[0])) { // treat as all uppercase
return mb_strtoupper($translations[$key], $encoding);
} else { // treat as only first character uppercase
return mb_strtoupper(mb_substr($translations[$key], 0, 1, $encoding), $encoding) // uppercase first
. mb_substr($translations[$key], 1, mb_strlen($translations[$key], $encoding) - 1, $encoding); // append remaining lowercase
}
},
$myVar
);
echo $translated;
Output:
My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!
Minha irmã alannis É not Aquela blonde, here é a bom place. Eu know Ariane é not MINHA IRMÃ!
This method:
- makes only 1 pass through $myVar, not 1 pass for every subarray of $myWords.
- does not bother with length sorting ($myWords/$translations).
- does not bother with regex escaping (preg_quote()) or making pattern components literal (\Q..\E) because only words are being translated.
- declares an $encoding value for stability / maintainability / re-usability.
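To make the difference between substring replacement and whole-word replacement concrete, here is a minimal sketch that runs both on the same sentence. The function name translateWholeWords() is hypothetical and not part of the code above. It assumes the translations are declared directly as key-value pairs with lowercase ASCII keys, and it skips the case-restoring logic for brevity. preg_quote() is added only as a precaution in case a key ever contains a regex metacharacter; it is not needed for plain words.

```php
<?php
// Hypothetical helper for illustration: replace only complete words found in $translations.
// Assumption: keys are lowercase ASCII words; letter case of the match is not restored.
function translateWholeWords(string $text, array $translations): string
{
    // Escape each key defensively (unnecessary for plain words, harmless otherwise).
    $alternatives = array_map(
        function ($word) {
            return preg_quote($word, '/');
        },
        array_keys($translations)
    );

    // \b on both sides means a key only matches as a complete word.
    // A plain non-capturing group lets the regex try "myth" after "my" fails at \b.
    $pattern = '/\b(?:' . implode('|', $alternatives) . ')\b/i';

    return preg_replace_callback(
        $pattern,
        function ($m) use ($translations) {
            // Normalize the matched word to lowercase to find its key.
            return $translations[strtolower($m[0])];
        },
        $text
    );
}

$translations = [
    'is'   => 'é',
    'he'   => 'ele',
    'my'   => 'minha',
    'myth' => 'mito',
    'ace'  => 'perito',
    'good' => 'bom',
];
$input = 'my myth: alannis is here, a good place';

// strtr() replaces substrings, so it cuts into longer words.
$bySubstring = strtr($input, $translations);
// The regex version only touches complete words.
$byWholeWord = translateWholeWords($input, $translations);

echo $bySubstring, "\n"; // minha mito: alanné é elere, a bom plperito
echo $byWholeWord, "\n"; // minha mito: alannis é here, a bom place
```

The first line of output reproduces the sliced words from the question (alanné, elere, plperito). The second line leaves alannis, here and place alone because none of them is a complete key.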