java xml xml-parsing ampersand

Unable to parse text containing &amp; from XML with an XMLInputFactory parser


I have an XML file that I need to parse using XMLInputFactory (javax.xml.stream). The XML looks like this:

<SACL>
<Criteria>Dinner</Criteria>
<Value> Rice &amp; Beverage </Value>
</SACL>

I am parsing it with an XMLEventReader in Java, and my code is:

// xmlEventReader is a javax.xml.stream.XMLEventReader
xmlEventReader = XMLInputFactory.newInstance().createXMLEventReader(new FileInputStream(file.getPath()));

if (xmlEvent.isStartElement()
        && xmlEvent.asStartElement().getName().getLocalPart().equals("Value")) {
    xmlEvent = xmlEventReader.nextEvent();
    value = xmlEvent.asCharacters().getData().trim();  // the issue is only inside this if block
}

But it only parses "Rice" (the "& Beverage" part is missing). Expected output: Rice & Beverage

Can someone explain what the problem with "&amp;" is and how it can be fixed?


Solution

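  • I worked on a project that did XML parsing recently, so I have a good idea of what's happening here: the parser reports &amp; as a separate event (XMLStreamConstants.ENTITY_REFERENCE, or with some parsers a separate CHARACTERS event), so the single asCharacters() call in your if block only sees the text before it, "Rice".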

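    Try setting the property XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES to true on the XMLInputFactory before creating the reader. If the parser implements this property properly, each entity reference is replaced by its text (here, "&") and reported as character data instead of as a separate entity reference event.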

    Keep in mind that the parser is still allowed to split the text into multiple CHARACTERS events, especially for large pieces of text. Setting XMLInputFactory.IS_COALESCING to true should prevent that by merging adjacent character data into a single event.
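A minimal sketch of the two properties above applied to the question's XML. To keep it self-contained it reads from a String through a StringReader instead of the question's FileInputStream; the class name CoalescingValueReader and the method readValue are hypothetical, and the result assumes the StAX implementation in use honors both properties.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class CoalescingValueReader {

    // Hypothetical helper: returns the trimmed text of the first <Value> element,
    // "" if the element is empty, or null if there is no <Value> element at all.
    static String readValue(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Replace entity references such as &amp; with their text...
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        // ...and merge adjacent character data into a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        try {
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isStartElement()
                            && event.asStartElement().getName().getLocalPart().equals("Value")) {
                        XMLEvent next = reader.nextEvent();
                        // With coalescing on, " Rice & Beverage " should arrive as one event.
                        return next.isCharacters() ? next.asCharacters().getData().trim() : "";
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<SACL><Criteria>Dinner</Criteria>"
                + "<Value> Rice &amp; Beverage </Value></SACL>";
        System.out.println(readValue(xml));
    }
}
```

With the question's XML, main should print Rice & Beverage instead of just Rice.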
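A different approach that does not depend on IS_COALESCING is to keep appending character data until the closing </Value> tag, so it no longer matters how many CHARACTERS events the parser produces. This sketch assumes <Value> contains only text (no child elements) and relies on IS_REPLACING_ENTITY_REFERENCES keeping its default value of true so that &amp; arrives as character data; AccumulatingValueReader and readValue are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class AccumulatingValueReader {

    // Hypothetical helper: concatenates all character events inside the first
    // <Value> element, or returns null if there is no <Value> element.
    static String readValue(String xml) {
        // Default factory: entity references are replaced, but text may still be split.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isStartElement()
                            && event.asStartElement().getName().getLocalPart().equals("Value")) {
                        StringBuilder text = new StringBuilder();
                        // Append every character event until the </Value> end tag
                        // (assumes <Value> has no child elements).
                        for (XMLEvent inner = reader.nextEvent(); !inner.isEndElement();
                                inner = reader.nextEvent()) {
                            if (inner.isCharacters()) {
                                text.append(inner.asCharacters().getData());
                            }
                        }
                        return text.toString().trim();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<SACL><Criteria>Dinner</Criteria>"
                + "<Value> Rice &amp; Beverage </Value></SACL>";
        System.out.println(readValue(xml));
    }
}
```

The trade-off is a small StringBuilder per element in exchange for not caring how the parser chunks the text.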