xml encoding utf-8 xml-validation google-product-search

XML validation issue with é (xE9) character


I am having an issue with an XML file that I am generating from data from my database.

I am specifying an encoding type of UTF-8.

I have some text that appears to contain an é character when I view it in a browser or in the database. However, when I view the XML file in Notepad++, it shows as [xE9].

This is the definition at the top of my XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version ="2.0" xmlns:g="http://base.google.com/ns/1.0">

This excerpt from my XML file shows the character that is causing the issue. I'm confused as to why it shows up as a non-UTF-8 character, as it does below, but it is the reason my XML is not valid.

<description><![CDATA[work appliqu[xE9]dress. Picco three-quarter sleeved style. Cutwork appliqu[xE9]features fitted, with side pockets.]]></description>

In my PHP script I am using the htmlspecialchars function, but it doesn't appear to deal with this character:

<description><![CDATA[<?php echo htmlspecialchars($product['product-description']) ?: 'CRMPicco Online'; ?>]]></description>

Unfortunately, there are a number of instances in the file where this character is present so I can't just remove that one character from the database.

Should I be able to clean this up in PHP?


Solution

  • This can be done using the iconv function in PHP:

    $text = iconv("UTF-8","UTF-8//IGNORE",$text);
    

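    I have changed the code to use this, and it works. Note that //IGNORE discards the invalid byte, so the é is dropped from the output rather than converted.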
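
If the accent should survive, one alternative is to convert the text instead of filtering it. The sketch below assumes the stray bytes are Windows-1252 (where 0xE9 is é), which the original post does not confirm. `to_utf8` is a hypothetical helper name, and the code needs PHP 7 or later with the mbstring extension enabled.

```php
<?php
// Hypothetical helper, not from the original answer.
// Assumption: any input that is not valid UTF-8 is Windows-1252 text.
function to_utf8(string $text): string
{
    // Leave valid UTF-8 alone; converting it again would double-encode it.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Re-encode legacy single-byte text, e.g. 0xE9 becomes 0xC3 0xA9 (é).
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

$raw = "Cutwork appliqu\xE9 features"; // \xE9 is the stray byte from the question

// Show the two bytes that now stand where the single 0xE9 byte was.
echo bin2hex(substr(to_utf8($raw), 15, 2)), "\n"; // prints c3a9
```

In the template from the question, this would mean passing `$product['product-description']` through `to_utf8()` before it is echoed. If the data can be fixed at the source (the database connection charset, for example), that is likely the cleaner place to do it.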