I'm getting the following exception when trying to parse some XML:
org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 0: not well-formed (invalid token)
This has only happened on Android 2.2 and 2.3 devices. The weirdest thing is that the first time I parse the response it works, but every subsequent attempt throws the parsing exception.
My code is as follows:
URL url = new URL("http://m.ideasmusik.com/rss/?ct=mx");
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
//InputSource is = new InputSource("http://m.ideasmusik.com/rss/?ct=mx");
//is.setEncoding(HTTP.UTF_8);
// Parse content
MusicRSSParser parser = new MusicHandler.MusicRSSParser(); // extends DefaultHandler
XMLReader xr = sp.getXMLReader();
xr.setContentHandler(parser);
InputSource in = new InputSource(url.openStream()); // also tried: is.getByteStream()
in.setEncoding(HTTP.UTF_8);
xr.parse(in);
The XML is UTF-8 (I've read that an incorrect encoding is a common cause of this error).
Any idea what is going wrong? I thought it could be something in my handler, but it crashes before any of my logic runs, right after the startDocument() callback.
I have also tried passing the URL instead of an InputStream, with the same result.
EDIT
If I go to Application Management and clear the app's cache, it works again, but only the first time. How can the cache affect the parsing?
Got it!
The problem is in the RSS feed itself!
Not every browser shows it (browsers that pretty-print the feed with syntax colors hide the problem), but the raw source begins like this:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Top Canciones</title>
<link>m.ideasmusik.com/rss/?ct=mx&</link> ...
The problem is that XML doesn't allow a bare & character; it has to be escaped as &amp;.
Every other special character in the document was escaped, but I think they missed this one because it is inside the <link> tag rather than in the main content.
Somehow the SAX parser ignores it on the first run.
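A minimal sketch to confirm that the bare & alone is enough to break parsing. It assumes desktop Java and the JDK's default SAX parser rather than Android's Expat-based one, so the exception text will differ from the one above; the class name BareAmpersandDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BareAmpersandDemo {

    // Returns true if the SAX parser accepts the document, false if it reports
    // a well-formedness error (SAXParseException is a subclass of SAXException).
    public static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not what we are testing.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Same shape as the broken feed: a bare & at the end of the link text.
        String broken = "<rss version=\"2.0\"><channel>"
                + "<link>m.ideasmusik.com/rss/?ct=mx&</link></channel></rss>";
        // The same document with the ampersand escaped as &amp;.
        String fixed = broken.replace("ct=mx&<", "ct=mx&amp;<");
        System.out.println("broken: " + isWellFormed(broken));
        System.out.println("fixed:  " + isWellFormed(fixed));
    }
}
```

Running it should report the broken document as not well-formed and the escaped one as fine, which points at the feed rather than at the handler or the encoding.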
As a workaround until the feed is fixed, I read the response into a string and remove that & manually before parsing the XML. I know it's a horrible solution, but it's the quickest and easiest one for now.
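A variant of the same workaround, sketched below, escapes bare ampersands as &amp; instead of deleting them, so the link text keeps its original characters. The class and method names are hypothetical, and the regular expression is an assumption: it leaves named entity references (&amp;, &lt;, ...) and numeric character references (&#38;, &#x26;) alone and escapes every other &. It is a stopgap for this one feed, not a general XML repair tool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class FeedSanitizer {

    // Matches an & that does NOT start a named entity reference, a decimal
    // character reference (&#38;) or a hexadecimal one (&#x26;).
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Reads the whole stream as UTF-8, the encoding the feed declares.
    public static String readUtf8(InputStream stream) throws IOException {
        Reader reader = new InputStreamReader(stream, Charset.forName("UTF-8"));
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            reader.close();
        }
    }

    // Escapes bare ampersands as &amp; and leaves valid references untouched.
    public static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    // Sanitizes the string, then parses it with the given handler.
    public static void parse(String xml, DefaultHandler handler) throws Exception {
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler);
        xr.parse(new InputSource(new StringReader(escapeBareAmpersands(xml))));
    }

    public static void main(String[] args) throws Exception {
        String raw = "<rss version=\"2.0\"><channel><title>A &amp; B</title>"
                + "<link>m.ideasmusik.com/rss/?ct=mx&</link></channel></rss>";
        System.out.println(escapeBareAmpersands(raw));
        // Parses without a SAXParseException once the bare & is escaped.
        parse(raw, new DefaultHandler());
    }
}
```

In the question's code this would roughly mean calling parse(readUtf8(url.openStream()), parser) instead of building the InputSource directly from the stream.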