I have an XML file that I need to parse using XMLInputFactory (javax.xml.stream). The XML looks like this:
<SACL>
<Criteria>Dinner</Criteria>
<Value> Rice &amp; Beverage </Value>
</SACL>
I am parsing this with an XMLEventReader in Java, and my code is:
if (xmlEvent.asStartElement().getName().getLocalPart().equals("Value")) {
    xmlEvent = xmlEventReader.nextEvent();
    value = xmlEvent.asCharacters().getData().trim(); // the issue is inside this if block only
}
xmlEventReader = XMLInputFactory.newInstance().createXMLEventReader(new FileInputStream(file.getPath())); // using javax.xml.stream.XMLEventReader
But it parses the data only as "Rice" (the "& Beverage" part is missing). Expected output: Rice & Beverage
Can someone explain what the issue with "&amp;" is and how it can be fixed?
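I recently worked on a project that did XML parsing, so I'm fairly sure what's happening here: the parser reports &amp; as a separate event (XMLStreamConstants.ENTITY_REFERENCE, or, depending on the implementation, a separate Characters event), so the first Characters event after the start tag contains only "Rice".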
Try setting the property XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES to true in your XML parser's options. If the parser is properly implemented, the entity is replaced and made part of the text.
Keep in mind that the parser is still allowed to split the text into multiple Characters events, especially if you have large pieces of text. Setting the property XMLInputFactory.IS_COALESCING to true should prevent that.
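For illustration, here is a minimal sketch of how both properties could be set on the factory before the reader is created. The class name CoalescingExample and the helper readValue are made up for this example, and it reads the XML from a String rather than from your file, so adapt it to your own code.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class CoalescingExample {

    // Hypothetical helper: returns the trimmed text of the first <Value>
    // element, or null if there is none or it has no text.
    static String readValue(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to replace entity references with their text...
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        // ...and to merge adjacent character data into a single Characters event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals("Value")) {
                    XMLEvent next = reader.nextEvent();
                    // Guard against an empty element, where the next event is the end tag.
                    return next.isCharacters() ? next.asCharacters().getData().trim() : null;
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<SACL><Criteria>Dinner</Criteria>"
                + "<Value> Rice &amp; Beverage </Value></SACL>";
        System.out.println(readValue(xml));
    }
}
```

The properties have to be set on the XMLInputFactory before createXMLEventReader is called, since they only apply to readers created afterwards. In your code, the same two setProperty calls would go in front of the existing createXMLEventReader(new FileInputStream(...)) line.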