Our PHP web application (PHP 5.6.30 running on Windows Server 2008 R2) uses UTF-8 encoding but needs to import data from files that are encoded using Windows-1252. When the data is imported, it is converted to UTF-8 as follows:
iconv('Windows-1252', 'UTF-8', $value);
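To check the conversion step in isolation, something like the following sketch may help. The helper name `importFieldToUtf8` is made up for illustration and is not part of our code base. It converts a single Windows-1252 byte and prints the result as hex, so the check does not depend on how the page or console renders the character.

```php
<?php
// Hypothetical helper (not from the original code): convert one imported
// field from Windows-1252 to UTF-8 and fail loudly if iconv rejects it.
function importFieldToUtf8($value)
{
    $converted = iconv('Windows-1252', 'UTF-8', $value);
    if ($converted === false) {
        throw new RuntimeException('iconv could not convert the value');
    }
    return $converted;
}

// "\xE0" is the single Windows-1252 byte for the character à.
$utf8 = importFieldToUtf8("\xE0");

// Print the raw bytes as hex instead of relying on how the output is rendered.
echo bin2hex($utf8), PHP_EOL;
```

If iconv is doing its job, this should print `c3a0`, the two-byte UTF-8 encoding of à. If it does, the corruption is happening somewhere after the conversion.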
When we import the following sample data, the conversion works correctly for most of the Windows-1252 characters, but the à character in line 8 below is not converted correctly.
1;€
2;é
3;è
4;ë
5;ï
6;ä
7;á
8;à
9;ç
10;ß
11;ø
12;í
13;ì
14;ñ
15;@
16;û
Here is a screenshot showing the result of displaying this data on the website.
Does anyone know why PHP's iconv is not correctly converting the à character?
I resolved this issue, and contrary to what I initially thought, it had nothing to do with iconv. The required change was tiny, just one character, but it took me ages to hunt down. The offending statement was actually the following:
preg_replace('/\s+/', ' ', $columnvalue)
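The purpose of this regular expression is to collapse runs of white space in the value into a single space. Without the u modifier, however, the pattern is matched byte by byte rather than character by character. In UTF-8, à is encoded as the two bytes C3 A0, and the byte A0 on its own is the no-break space in single-byte encodings such as Windows-1252, so \s most likely matched that second byte and replaced it with a space, corrupting the à character. I resolved this by adding u (the Unicode modifier) to the end of the regular expression. The expression became: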
preg_replace('/\s+/u', ' ', $columnvalue)
And then the encoding of the page was correct.
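For anyone who wants to see the effect at the byte level, here is a rough sketch. Whether a byte-oriented \s actually matches A0 depends on the platform and locale, so the corruption is only simulated here with str_replace; that line is my assumption about what happened on our server, not a reproduction of it. The helper `isValidUtf8` and the sample value are made up for illustration.

```php
<?php
// Hypothetical helper: preg_match with an empty pattern and the u modifier
// returns 1 only when the subject is valid UTF-8.
function isValidUtf8($s)
{
    return preg_match('//u', $s) === 1;
}

// The character à encoded as UTF-8 is the two bytes C3 A0.
$value = "8;\xC3\xA0  end";

// Simulation only (assumption): replace the lone byte A0 with a space, which
// is what a byte-oriented \s does when the active character tables treat A0
// (the no-break space in Windows-1252) as white space.
$simulated = str_replace("\xA0", ' ', $value);

// With the u modifier the subject is read as UTF-8, so C3 A0 stays together
// and only the real run of spaces is collapsed.
$fixed = preg_replace('/\s+/u', ' ', $value);

echo bin2hex($simulated), ' valid UTF-8: ', var_export(isValidUtf8($simulated), true), PHP_EOL;
echo bin2hex($fixed), ' valid UTF-8: ', var_export(isValidUtf8($fixed), true), PHP_EOL;
```

The simulated value is left with a dangling C3 byte followed by a space, which is no longer valid UTF-8, while the value processed with the u modifier keeps its C3 A0 pair intact.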