I have a short script that reads a CSV file. It looks like this:
$csv = new SplFileObject($pathToFile, 'r');
while (!$csv->eof() && ($row = $csv->fgetcsv()) && $row[0] !== null) {
    var_dump($row);
}
This works OK, except that it has trouble with some non-ASCII characters. The CSV contains some German words, and my specific problem is with umlauts. Here is an example of a row it outputs:
array(5) {
  [0]=>
  string(6) "J¦rgen"
  [1]=>
  string(8) "Lastname"
  [2]=>
  string(14) "name@domain.de"
  [3]=>
  string(7) "Example"
  [4]=>
  string(7) "Example"
}
The ü in Jürgen is getting replaced with a ¦ character.
I've tried adding the following line before the code above:
mb_internal_encoding('UTF-8');
but it has had no effect.
Opening the CSV file in Vi shows the ü correctly, so the file on the server seems to be fine.
Can anyone advise how to get PHP to handle German characters correctly when parsing a CSV?
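The code itself as shown should work. I guess the problem is caused by the character encoding of the CSV file, which does not seem to be UTF-8. You need to find out what the encoding of your input file is. Note that Vi displaying the ü correctly does not prove the file is UTF-8: Vim detects the encoding when it opens a file, and `:set fileencoding?` shows which one it picked.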
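To check from PHP whether the file is valid UTF-8, a minimal sketch using mbstring's mb_check_encoding() could look like this. The helper name isValidUtf8File and the temporary demo files are hypothetical, and the mbstring extension is assumed to be loaded. It can only rule UTF-8 in or out: a false result is consistent with ISO-8859-1, but also with other single-byte encodings such as Windows-1252.

```php
<?php
// Sketch: report whether a file's bytes form valid UTF-8.
// Assumes the mbstring extension is loaded. A false result only says the file
// is NOT UTF-8; it cannot tell ISO-8859-1 apart from, e.g., Windows-1252.
function isValidUtf8File(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return mb_check_encoding($bytes, 'UTF-8');
}

// Demo with two hypothetical temporary files holding the same name.
$latin1 = tempnam(sys_get_temp_dir(), 'csv');
$utf8   = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($latin1, "J\xFCrgen,Lastname\n");   // ü as the ISO-8859-1 byte 0xFC
file_put_contents($utf8, "J\xC3\xBCrgen,Lastname\n"); // ü as the UTF-8 pair 0xC3 0xBC

var_dump(isValidUtf8File($latin1)); // bool(false)
var_dump(isValidUtf8File($utf8));   // bool(true)

unlink($latin1);
unlink($utf8);
```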
Once you have found that out, you can convert the file to UTF-8 with the iconv command. (In the comments you said that the input encoding was ISO-8859-1.)
Example:
iconv -f 'iso-8859-1' -t 'utf-8' input.csv > utf8.csv
Warning: never try to overwrite the file in place like this:
iconv -f 'iso-8859-1' -t 'utf-8' data.csv > data.csv
This truncates data.csv and destroys all of its contents, because the shell opens (and truncates) the output file before it runs the command.
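If you do want to replace the original, write to a temporary file and rename it afterwards (e.g. `iconv -f 'iso-8859-1' -t 'utf-8' data.csv > data.tmp && mv data.tmp data.csv`). If converting the file is not an option at all, a minimal sketch is to re-encode each field while reading it, leaving the file untouched. This assumes the input really is ISO-8859-1 and that the mbstring extension is loaded; readLatin1Csv is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Sketch: read an ISO-8859-1 CSV and return UTF-8 rows, without rewriting the file.
// Assumes the input really is ISO-8859-1 and the mbstring extension is loaded.
// readLatin1Csv() is a hypothetical helper, not part of PHP.
function readLatin1Csv(string $pathToFile): array
{
    $rows = [];
    $csv = new SplFileObject($pathToFile, 'r');
    // Same loop condition as in the question; separator, enclosure and escape
    // are passed explicitly (they are the documented defaults).
    while (!$csv->eof() && ($row = $csv->fgetcsv(',', '"', '\\')) && $row[0] !== null) {
        // Re-encode every field from ISO-8859-1 to UTF-8.
        $rows[] = array_map(
            fn (string $field): string => mb_convert_encoding($field, 'UTF-8', 'ISO-8859-1'),
            $row
        );
    }
    return $rows;
}

// Demo with a temporary file whose "ü" is the single ISO-8859-1 byte 0xFC.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "J\xFCrgen,Lastname,name@domain.de\n");
$rows = readLatin1Csv($path);
echo $rows[0][0], "\n"; // Jürgen
unlink($path);
```

Converting the file once with iconv is usually simpler if the file is yours to change; the per-field approach only makes sense when you must keep the original file as delivered.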