java android xml android-xmlpullparser

XmlResourceParser.getText() drops text after single quote char, ignores double quotes


I'm currently implementing an Android version of my iOS application and am running into issues parsing XML whose text contains a single-quote or double-quote character (it's a dictionary app for a foreign language).

All of my app's data is loaded from an XML resource file. Here's an example of that file:

<entry>
    <word>afa'i fā</word>
    <definition>See under "afa". Figurative (especially in poetry), king or queen: "hotau afa'i fā".</definition>  
</entry>

I retrieve an XmlResourceParser by calling:

XmlResourceParser parser = getResources().getXml(R.xml.data);
parse(parser);

Here's my parsing code:

public void parse(XmlResourceParser parser) throws XmlPullParserException, IOException {
    int eventType = parser.getEventType();
    while (eventType != XmlPullParser.END_DOCUMENT) {
        switch (eventType) {
            case XmlPullParser.START_TAG:
                startTag(parser.getName(), parser);
                break;
            case XmlPullParser.END_TAG:
                endTag(parser.getName(), parser);
                break;
            case XmlPullParser.TEXT:
                foundText(parser.getText());
                break;
            default:
                break;
        }
        eventType = parser.next();
    }
}
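The startTag, endTag and foundText handlers aren't shown above. As a rough sketch of what they might look like, assuming each field is a simple leaf element: the class name EntryCollector and the single-argument signatures are hypothetical and not taken from my app. It buffers text between a start tag and its matching end tag, since a parser may deliver one text node in several pieces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical handler class: collects the text of leaf elements by tag name.
final class EntryCollector {
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder buffer = new StringBuilder();
    private String currentTag; // null while we are between leaf elements

    public void startTag(String name) {
        // A new element starts: remember it and discard any earlier text.
        currentTag = name;
        buffer.setLength(0);
    }

    public void foundText(String text) {
        // Text may arrive in several chunks, so append rather than overwrite.
        if (currentTag != null && text != null) {
            buffer.append(text);
        }
    }

    public void endTag(String name) {
        // Only store text when the closing tag matches the element just opened.
        if (name.equals(currentTag)) {
            fields.put(name, buffer.toString().trim());
        }
        currentTag = null;
    }

    public Map<String, String> fields() {
        return fields;
    }

    public static void main(String[] args) {
        EntryCollector collector = new EntryCollector();
        collector.startTag("word");
        collector.foundText("afa'i f\u0101");
        collector.endTag("word");
        System.out.println(collector.fields());
    }
}
```

With handlers like these, whatever getText() returns goes straight into the stored field, so any text lost by the parser is lost in the result too.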

When parsing the text, XmlResourceParser's getText() method drops everything after the ' and resumes with the text inside the next node. It also silently strips the double quotes. My result looks like this:

(word) 
afa

(definition)
See under afa. Figurative (especially in poetry), king or queen: hotau afa

I've scoured the docs and can't find any mention of how single and double quotes are handled. The only thing I can think of is that the XmlResourceParser doesn't like the literal characters and expects entity codes instead, but I've tried swapping them in and it still ignores them.


Solution

  • It looks like the XmlResourceParser returned by getResources().getXml() does some extra processing, according to the docs:

    https://developer.android.com/reference/android/content/res/Resources.html#getXml(int)

    Return an XmlResourceParser through which you can read a generic XML resource for the given resource ID.

    The XmlPullParser implementation returned here has some limited functionality. In particular, you can't change its input, and only high-level parsing events are available (since the document was pre-parsed for you at build time, which involved merging text and stripping comments).

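    While the docs don't say anything explicitly about single or double quotes, the build-time pre-parse is apparently altering the text. I get the desired output by moving the file from res/xml to res/raw and initializing my own XmlPullParser. The parsing logic itself is unchanged; the only adjustment is that parse() has to accept an XmlPullParser (XmlResourceParser extends that interface, so the same method works for both). The setup is the following: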

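    // Files in res/raw are packaged in their raw form, so the XML is parsed at runtime.
    InputStream in = getResources().openRawResource(R.raw.data);
    XmlPullParser parser = Xml.newPullParser();
    // Passing null as the encoding lets the parser detect it from the document.
    parser.setInput(in, null);
    parse(parser);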
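
To check outside Android that literal quotes are ordinary character data for a runtime XML parser, here's a minimal sketch using the JDK's StAX pull parser (javax.xml.stream). This is a stand-in, not the android.util.Xml parser used above, and the class name QuoteCheck and the method collectText are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: pulls every non-blank text node out of an XML string.
final class QuoteCheck {
    public static List<String> collectText(String xml) {
        List<String> texts = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Merge adjacent character events so each text node arrives as one string.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // Same pull-loop shape as parse(): advance, then inspect the event.
                    if (reader.next() == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                        texts.add(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Rethrow unchecked to keep the sketch easy to call.
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        String xml = "<entry><word>afa'i f\u0101</word>"
                + "<definition>See under \"afa\".</definition></entry>";
        for (String text : collectText(xml)) {
            System.out.println(text);
        }
    }
}
```

If the quotes survive here but not through getResources().getXml(), that points at the build-time processing of res/xml rather than at the XML itself.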