I'm receiving a lot of data from a website, and all of those string values need to be inserted into our database.
During the insert, the database sometimes throws the following error:
Warning: PDOStatement::execute(): SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE)
The database tables are set up to use Latin1.
After encoding my values with json_encode() I found out what causes this error. The Unicode escape sequences, which represent some special characters inside the strings, need to be converted into their actual values:
encoded string: candidate\u00e2\u0080\u0099s individual circumstances
The sequence \u00e2\u0080\u0099 represents an ' (apostrophe) in this example.
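To make the relationship between the stored string and the json_encode() output concrete, here is a minimal sketch. It assumes (this is my reading, not something the data itself confirms) that the value is a double-encoded typographic apostrophe: U+2019 is the bytes E2 80 99 in UTF-8, and each of those bytes was later re-encoded as its own character. The variable names are made up for illustration.

```php
<?php
// Assumption: the three UTF-8 bytes of U+2019 (E2 80 99) were each
// re-encoded as a separate character, giving C3 A2, C2 80, C2 99.
$doubleEncoded = "candidate\xC3\xA2\xC2\x80\xC2\x99s";

// json_encode() escapes every non-ASCII character as \uXXXX by default,
// which is where the \u00e2\u0080\u0099 notation comes from.
$json = json_encode($doubleEncoded);
echo $json, PHP_EOL;

// The string itself contains six raw bytes at that position,
// not the escape text that json_encode() prints.
$rawHex = bin2hex(substr($doubleEncoded, 9, 6));
echo $rawHex, PHP_EOL;
```

The point is that \u00e2\u0080\u0099 only exists in the JSON representation; any replacement has to target the raw bytes instead.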
Anyway, there are only a few different sequences, and I also know the values I want to replace them with, but I am struggling with the conversion.
I've tried several approaches, but none of them worked.
Using str_replace:
str_replace('\\u00e2\\u0080\\u0099', '\'', $string);
This didn't change anything in the string.
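A likely reason str_replace had no effect is that the search string is the literal text backslash-u-0-0-e-2 and so on, which never occurs in the data; it only appears in the json_encode() output. The sketch below contrasts that with a search for the raw bytes. The sample value is an assumption based on the encoded string shown earlier.

```php
<?php
// Assumed sample: the double-encoded apostrophe as raw UTF-8 bytes.
$string = "candidate\xC3\xA2\xC2\x80\xC2\x99s individual circumstances";

// Searches for the literal characters \u00e2\u0080\u0099.
// They are not present in the data, so nothing is replaced.
$unchanged = str_replace('\\u00e2\\u0080\\u0099', "'", $string);

// Searches for the actual byte sequence instead.
$fixed = str_replace("\xC3\xA2\xC2\x80\xC2\x99", "'", $string);

echo $unchanged === $string ? "literal search: no change" : "literal search: changed", PHP_EOL;
echo $fixed, PHP_EOL;
```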
Using the mb_ functions:
$encodedStr = mb_convert_encoding($string, 'ASCII');
This left me with some mysterious ?? instead of the escape sequences. It no longer throws the database error, but it is still not what I need.
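Converting to ASCII throws the characters away, but the mb_ functions can also be used to undo the double encoding. This is only a sketch and assumes the whole string was double-encoded (UTF-8 bytes read as Latin-1 and encoded to UTF-8 again); if only some values are affected, it would damage the others.

```php
<?php
// Assumed sample: the double-encoded apostrophe as raw UTF-8 bytes.
$string = "candidate\xC3\xA2\xC2\x80\xC2\x99s individual circumstances";

// Step 1: convert UTF-8 to ISO-8859-1. Each character U+00E2, U+0080,
// U+0099 becomes the single byte E2, 80, 99, which together form the
// UTF-8 encoding of U+2019 (right single quotation mark).
$repaired = mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8');

// Step 2: Latin-1 has no U+2019, so map it to a plain apostrophe
// before inserting into a Latin1 table.
$latin1Safe = str_replace("\xE2\x80\x99", "'", $repaired);

echo $latin1Safe, PHP_EOL;
```

After step 1 the value is ordinary UTF-8, so any other non-ASCII characters would still need converting to Latin-1 before the insert.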
Using preg_replace:
preg_replace('/\\u00e2\\u0080\\u0099/', '\'', $string);
This threw an error: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1
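PCRE rejects \u, but it does accept \x{...} for a code point when the pattern uses the u (UTF-8) modifier. A minimal sketch of that variant, again assuming the double-encoded sample value from above:

```php
<?php
// Assumed sample: the double-encoded apostrophe as raw UTF-8 bytes.
$string = "candidate\xC3\xA2\xC2\x80\xC2\x99s individual circumstances";

// Single quotes keep \x{...} intact for PCRE; the /u modifier makes the
// pattern and the subject be treated as UTF-8, so each \x{...} matches
// one code point (U+00E2, U+0080, U+0099).
$fixed = preg_replace('/\x{00e2}\x{0080}\x{0099}/u', "'", $string);

echo $fixed, PHP_EOL;
```

With the u modifier preg_replace returns null if the subject is not valid UTF-8, so the result is worth checking before it goes to the database.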
I've tried several more options, but those were the three that came to mind first when I started tackling this problem, and I just can't figure out why those functions, especially str_replace, don't work as expected.
I finally solved the problem. Just in case somebody struggles with the same issue: the solution that worked for me was posted in the question
"I have a string with "\u00a0", and I need to replace it with "" str_replace fails"
private function convert($string) {
    /* Sequences to replace (as shown by json_encode):
     *   \u00a0             = non-breaking space (removed)
     *   \u00e2\u0080\u0099 = '
     */
    $string = str_replace(chr(194).chr(160), '', $string); // removes \u00a0 (UTF-8 bytes C2 A0)
    $string = str_replace('â', '', $string); // removes \u00e2 (assumes this source file is saved as UTF-8)
    $string = str_replace(chr(194).chr(128).chr(194).chr(153), '\'', $string); // replaces \u0080\u0099 with '
    return $string;
}