
How to support non-standard characters when reading a CSV with PHP's SplFileObject


I have a short script that reads a CSV file. It looks like this:

$csv = new SplFileObject($pathToFile, 'r');

while (!$csv->eof() && ($row = $csv->fgetcsv()) && $row[0] !== null) {
    var_dump($row);
}

This works fine, except that it has trouble with some non-ASCII characters. The CSV contains German words, and my specific problem is with umlauts. Here is an example of a row it outputs:

array(5) {
    [0]=>
        string(6) "J¦rgen"
    [1]=>
        string(8) "Lastname"
    [2]=>
        string(14) "name@domain.de"
    [3]=>
        string(7) "Example"
    [4]=>
        string(7) "Example"
}

The ü in Jürgen is being replaced with a ¦ character.

I've tried adding the following line before the loop:

mb_internal_encoding('UTF-8');

but it has had no effect.
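For context, mb_internal_encoding() only sets the default encoding that the mb_* functions assume when none is passed to them; it does not re-encode the bytes that fgetcsv() returns. The sketch below illustrates this, assuming the mbstring extension is loaded and that the field was stored as ISO-8859-1 (the sample string is hypothetical). Only an explicit conversion such as mb_convert_encoding() changes the bytes.

```php
<?php
// mb_internal_encoding() changes the default for mb_* functions only;
// the bytes in $field below are not touched by it.
mb_internal_encoding('UTF-8');

// Hypothetical sample: "Jürgen" as ISO-8859-1 bytes, where "ü" is the single byte 0xFC.
$field = "J\xFCrgen";

echo strlen($field), "\n";                     // 6
var_dump(mb_check_encoding($field, 'UTF-8'));  // bool(false): not valid UTF-8

// An explicit conversion is what actually rewrites the bytes.
$utf8 = mb_convert_encoding($field, 'UTF-8', 'ISO-8859-1');
echo strlen($utf8), "\n";                      // 7: "ü" is now 0xC3 0xBC
var_dump($utf8 === "J\u{00FC}rgen");           // bool(true)
```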

Opening the CSV file in Vi displays the ü correctly, so the file itself appears to be fine on the server.

Can anyone advise how to get PHP to handle German characters correctly when parsing a CSV?


Solution

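  • The code itself, as shown, should work. I suspect the problem is the character encoding of the CSV file, which appears not to be UTF-8. The var_dump output supports this: "Jürgen" is 7 bytes in UTF-8, where ü takes two bytes, but only 6 bytes in ISO-8859-1, where ü is the single byte 0xFC, and the output reports string(6). You need to find out which encoding your input file uses.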

    Once you've found that out, you can convert the file to UTF-8 using the iconv command. (In the comments, you mentioned that the input encoding is ISO-8859-1.)

    Example:

    iconv -f 'iso-8859-1' -t 'utf-8' input.csv > utf8.csv
    

    Warning: never try to overwrite the file in place like this:

    iconv -f 'iso-8859-1' -t 'utf-8' data.csv > data.csv
    

    This would truncate data.csv and lose all of its contents, because the shell creates and truncates the output file before it runs the command itself.
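If converting the file up front with iconv is not an option, a similar result can be had inside PHP by converting each field as it is read. This is a minimal sketch, not the answer's method: it assumes the mbstring extension is loaded, PHP 7.4+ (for the arrow function), and an ISO-8859-1 source as mentioned in the comments. The function name readCsvAsUtf8 is hypothetical. Fields that already pass mb_check_encoding() as UTF-8 are left alone, which is a heuristic rather than true encoding detection.

```php
<?php
// Hypothetical helper: read a CSV with SplFileObject and return its rows
// with every field converted to UTF-8. Assumes an ISO-8859-1 source by default.
function readCsvAsUtf8(string $path, string $fromEncoding = 'ISO-8859-1'): array
{
    $csv = new SplFileObject($path, 'r');
    $rows = [];

    while (!$csv->eof()) {
        // Pass the escape character explicitly to avoid relying on its default.
        $row = $csv->fgetcsv(',', '"', '\\');

        // Blank lines (including the one after a trailing newline) come back
        // as [null]; a failed read may return false. Skip both.
        if (!is_array($row) || $row === [null]) {
            continue;
        }

        // Heuristic: keep fields that are already valid UTF-8, convert the rest.
        $rows[] = array_map(
            fn ($field) => mb_check_encoding($field, 'UTF-8')
                ? $field
                : mb_convert_encoding($field, 'UTF-8', $fromEncoding),
            $row
        );
    }

    return $rows;
}

// Usage with a throwaway sample file containing ISO-8859-1 bytes.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "J\xFCrgen,Lastname,name@domain.de,Example,Example\n");
$rows = readCsvAsUtf8($path);
unlink($path);

var_dump($rows[0][0]); // string(7) "Jürgen"
```

Converting the file once with iconv keeps the reading code unchanged; converting per field avoids a separate preprocessing step but repeats the work on every read.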