
PHP conversion from utf8_general_ci to latin1_swedish_ci


I'm receiving heaps of data from a website, and all of those string values need to be added to our database.

During insertion into the database, SQL sometimes throws the following error:

Warning:  PDOStatement::execute(): SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE)

The database tables are actually set up to use Latin1.

After encoding my values with json_encode(), I found out what causes this error. The UTF sequences that represent some special characters inside the strings need to be converted into their actual values:

encoded string: candidate\u00e2\u0080\u0099s individual circumstances

The sequence \u00e2\u0080\u0099 represents an ' in this example.

Anyway, there are only a few different sequences, and I also know the values I want to replace them with, but I am struggling with the conversion.

I've tried several approaches, but none of them worked:
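To make the diagnosis concrete, here is a minimal sketch of how json_encode() and bin2hex() expose what is actually stored. The sample string is an assumption built to match the output above. The bytes E2 80 99 are the UTF-8 encoding of U+2019 (the typographic apostrophe), so a string holding U+00E2 U+0080 U+0099 looks like UTF-8 text that was decoded as Latin-1 and then encoded as UTF-8 a second time.

```php
<?php
// Assumed sample: the code points U+00E2 U+0080 U+0099, each stored as UTF-8.
$string = "candidate\xC3\xA2\xC2\x80\xC2\x99s individual circumstances";

// json_encode() escapes every non-ASCII code point as \uXXXX,
// which makes otherwise invisible characters easy to spot.
$json = json_encode($string);
echo $json, PHP_EOL;

// bin2hex() shows the raw bytes behind the three escaped code points.
$hex = bin2hex("\xC3\xA2\xC2\x80\xC2\x99");
echo $hex, PHP_EOL;
```

Note that the \uXXXX text exists only in the JSON output; the original $string still contains the raw bytes.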

using str_replace:

str_replace('\\u00e2\\u0080\\u0099', '\'', ($string));

This didn't change anything in the string.
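A likely reason nothing changed: inside single quotes, '\\u00e2' is the literal text backslash, u, 0, 0, e, 2. That text appears only in the json_encode() output, not in the string itself. The sketch below assumes PHP 7.0 or later, where double-quoted strings accept "\u{...}" escapes; the sample input is an assumption.

```php
<?php
// Assumed sample input: U+00E2 U+0080 U+0099 stored as UTF-8 bytes.
$string = "candidate\u{00e2}\u{0080}\u{0099}s individual circumstances";

// Searches for the literal backslash text, which the string does not contain.
$unchanged = str_replace('\\u00e2\\u0080\\u0099', '\'', $string);

// "\u{...}" (PHP 7.0+) produces the UTF-8 bytes of each code point, so this matches.
$fixed = str_replace("\u{00e2}\u{0080}\u{0099}", "'", $string);

echo $fixed, PHP_EOL;
```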

using the mb_functions:

$encodedStr = mb_convert_encoding($string, 'ASCII')

This left me with some mysterious ?? in place of the UTF sequences. It doesn't throw the database error, but it is still not what I need.
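The question marks appear because ASCII has no character for code points above 0x7F, so mbstring substitutes a placeholder. As a hedged alternative, the sketch below assumes the input is UTF-8 that was encoded twice (as the \u00e2\u0080\u0099 sequence suggests) and that the mbstring extension is available. Converting to ISO-8859-1 turns each code point back into a single byte, which restores the original UTF-8 sequence. A second conversion targets Windows-1252, which is the character set MySQL documents for latin1. Whether the second step is wanted depends on the connection charset, so treat it as an assumption.

```php
<?php
// Assumed sample input, double-encoded as in the question.
$string = "candidate\xC3\xA2\xC2\x80\xC2\x99s";

// Step 1: undo the double encoding. Code points U+0000..U+00FF become one byte each,
// leaving the bytes E2 80 99, the UTF-8 encoding of U+2019.
$utf8 = mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8');

// Step 2 (optional, assumption): convert the repaired UTF-8 text to Windows-1252
// for storage in a latin1 column.
$latin1 = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');

echo bin2hex($utf8), PHP_EOL;
echo bin2hex($latin1), PHP_EOL;
```

Always passing the source encoding as the third argument avoids relying on mbstring's internal default.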

using preg_replace:

preg_replace('/\\u00e2\\u0080\\u0099/', '\'', $string)


This threw an error: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1

I've tried several more options, but those were the three that came to mind when I started tackling this problem, and I just can't figure out why those functions, especially str_replace, don't work as expected.
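PCRE has no \u escape; its code point escape is \x{...}, used together with the /u modifier so that pattern and subject are treated as UTF-8. A minimal sketch with an assumed sample string:

```php
<?php
// Assumed sample input: U+00E2 U+0080 U+0099 stored as UTF-8 bytes.
$string = "candidate\xC3\xA2\xC2\x80\xC2\x99s individual circumstances";

// Single quotes pass \x{...} through to PCRE unchanged.
// The /u modifier makes PCRE match code points instead of single bytes.
$fixed = preg_replace('/\x{00e2}\x{0080}\x{0099}/u', "'", $string);

echo $fixed, PHP_EOL;
```

With /u, preg_replace() returns null if the subject is not valid UTF-8, so the result is worth checking before it is used.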


Solution

  • I finally solved the problem. Just in case somebody struggles with the same issue, the solution that worked for me was posted in

    I have a string with "\u00a0", and I need to replace it with "" str_replace fails

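    private function convert($string) {
        /* Sequences to replace (as shown by json_encode()):
         *      \u00a0             = non-breaking space, removed
         *      \u00e2\u0080\u0099 = replaced with '
         */
        // chr(194).chr(160) is the UTF-8 byte pair C2 A0, i.e. \u00a0
        $string = str_replace(chr(194).chr(160), '', $string);
        // 'â' is \u00e2; the literal matches only if this file is saved as UTF-8
        $string = str_replace('â', '', $string);
        // C2 80 C2 99 is the UTF-8 encoding of \u0080\u0099; replace it with '
        $string = str_replace(chr(194).chr(128).chr(194).chr(153), '\'', $string);

        return $string;
    }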