I am using a SAX parser to read a large, complex XML file. I do not want to create model classes because I do not know in advance exactly what data the XML will contain, so I am looking for a generic way to read the XML data using some sort of context object.
I have used a similar approach for JSON with Jackson, which worked very well for me. Since I am new to SAX, I do not fully understand how to achieve the same thing. For complex nested elements, I am unable to establish a parent-child relationship, and I am unable to build relationships between tags and their attributes.
Following is the code I have so far:
ContextNode is my generic class to store all XML information using parent-child relationships:
@Getter
@Setter
@ToString
@NoArgsConstructor
public class ContextNode {
protected String name;
protected String value;
protected ArrayList<ContextNode> children = new ArrayList<>();
protected ContextNode parent;
//Constructor 1: To store the simple field information.
public ContextNode(final String name, final String value) {
this.name = name;
this.value = value;
}
//Constructor 2: To store the complex field which has inner elements.
public ContextNode(final ContextNode parent, final String name, final String value) {
this(name, value);
this.parent = parent;
}
}
Following is my method to parse the XML using SAX, within the EventReader class:
public class EventReader{
//Method to read XML events and create pre-hash string from it.
public static void xmlParser(final InputStream xmlStream) {
final SAXParserFactory factory = SAXParserFactory.newInstance();
try {
final SAXParser saxParser = factory.newSAXParser();
final SaxHandler handler = new SaxHandler();
saxParser.parse(xmlStream, handler);
} catch (ParserConfigurationException | SAXException | IOException e) {
e.printStackTrace();
}
}
}
Following is my SaxHandler:
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class SaxHandler extends DefaultHandler {
private final List<String> XML_IGNORE_FIELDS = Arrays.asList("person:personDocument","DocumentBody","DocumentList");
private final List<String> EVENT_TYPES = Arrays.asList("person");
private Map<String, String> XML_NAMESPACES = null;
private ContextNode contextNode = null;
private StringBuilder currentValue = new StringBuilder();
@Override
public void startDocument() {
//Initialise the handler's own map; it is the one written to in startElement.
XML_NAMESPACES = new HashMap<>();
}
@Override
public void startElement(final String uri, final String localName, final String qName, final Attributes attributes) {
//For every new element in XML reset the StringBuilder.
currentValue.setLength(0);
if (qName.equalsIgnoreCase("person:personDocument")) {
// Add the attributes and name-spaces to Map
for (int att = 0; att < attributes.getLength(); att++) {
if (attributes.getQName(att).contains(":")) {
//Find all Namespaces within the XML Header information and save it to the Map for future use.
XML_NAMESPACES.put(attributes.getQName(att).substring(attributes.getQName(att).indexOf(":") + 1), attributes.getValue(att));
} else {
//Find all other attributes within XML and store this information within Map.
XML_NAMESPACES.put(attributes.getQName(att), attributes.getValue(att));
}
}
} else if (EVENT_TYPES.contains(qName)) {
contextNode = new ContextNode("type", qName);
}
}
@Override
public void characters(char ch[], int start, int length) {
currentValue.append(ch, start, length);
}
@Override
public void endElement(final String uri, final String localName, final String qName) {
if (!XML_IGNORE_FIELDS.contains(qName)) {
if (!EVENT_TYPES.contains(qName)) {
System.out.println("QName : " + qName + " Value : " + currentValue);
contextNode.children.add(new ContextNode(qName, currentValue.toString()));
}
}
}
@Override
public void endDocument() {
System.out.println(contextNode.getChildren().toString());
System.out.println("End of Document");
}
}
Following is my test case, which calls the method xmlParser:
@Test
public void xmlReader() throws Exception {
final InputStream xmlStream = getClass().getResourceAsStream("/xmlFileContents.xml");
EventReader.xmlParser(xmlStream);
}
Following is the XML I need to read using a generic approach:
<?xml version="1.0" ?>
<person:personDocument xmlns:person="https://example.com" schemaVersion="1.2" creationDate="2020-03-03T13:07:51.709Z">
<DocumentBody>
<DocumentList>
<Person>
<bithTime>2020-03-04T11:00:30.000+01:00</bithTime>
<name>Batman</name>
<Place>London</Place>
<hobbies>
<hobby>painting</hobby>
<hobby>football</hobby>
</hobbies>
<jogging distance="10.3">daily</jogging>
<purpose2>
<id>1</id>
<purpose>Dont know</purpose>
</purpose2>
</Person>
</DocumentList>
</DocumentBody>
</person:personDocument>
I am posting the answer in case it helps someone in the future.
First, we need to create a ContextNode class that can hold the information:
@Getter
@Setter
public class ContextNode {
protected String name;
protected String value;
protected ArrayList<ContextNode> attributes = new ArrayList<>();
protected ArrayList<ContextNode> children = new ArrayList<>();
protected ContextNode parent;
protected Map<String, String> namespaces;
public ContextNode(final ContextNode parent, final String name, final String value) {
this.parent = parent;
this.name = name;
this.value = value;
this.namespaces = parent.namespaces;
}
public ContextNode(final Map<String, String> namespaces) {
this.namespaces = namespaces;
}
}
Then we can read the XML and store the information in the context node:
import lombok.Getter;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import java.security.NoSuchAlgorithmException;
import java.util.*;
public class SaxHandler extends DefaultHandler {
//Variables needed to store the required information during the parsing of the XML document.
private final Deque<String> path = new ArrayDeque<>();
private final StringBuilder currentValue = new StringBuilder();
private ContextNode currentNode = null;
private ContextNode rootNode = null;
private Map<String, String> currentAttributes;
private final HashMap<String, String> contextHeader = new HashMap<>();
@Override
public void startElement(final String uri, final String localName, final String qName, final Attributes attributes) {
//Put every XML tag within the stack at the beginning of the XML tag.
path.push(qName);
//Reset attributes for every element
currentAttributes = new HashMap<>();
//Get the path from Deque as / separated values.
final String p = path();
//If the XML tag contains the Namespaces or attributes then add to respective Namespaces Map or Attributes Map.
if (attributes.getLength() > 0) {
//Loop over every attribute and add them to respective Map.
for (int att = 0; att < attributes.getLength(); att++) {
//Attributes whose qualified name starts with xmlns: are namespace declarations; everything else is a regular attribute.
if (attributes.getQName(att).startsWith("xmlns:")) {
contextHeader.put(attributes.getQName(att).substring(attributes.getQName(att).indexOf(":") + 1), attributes.getValue(att));
} else {
currentAttributes.put(attributes.getQName(att), attributes.getValue(att).trim());
}
}
}
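if (rootNode == null) {
rootNode = new ContextNode(contextHeader);
currentNode = rootNode;
rootNode.children.add(new ContextNode(rootNode, "type", qName));
} else if (currentNode != null) {
ContextNode n = new ContextNode(currentNode, qName, (String) null);
currentNode.children.add(n);
currentNode = n;
}
//Attach the regular (non-namespace) attributes of this element to its node.
for (final Map.Entry<String, String> attribute : currentAttributes.entrySet()) {
currentNode.attributes.add(new ContextNode(currentNode, attribute.getKey(), attribute.getValue()));
}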
}
@Override
public void characters(char[] ch, int start, int length) {
currentValue.append(ch, start, length);
}
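@Override
public void endElement(final String uri, final String localName, final String qName) {
//Store the trimmed text of the element that just closed and step back up to its parent.
if (currentNode != null && currentNode != rootNode) {
final String text = currentValue.toString().trim();
if (!text.isEmpty()) {
currentNode.value = text;
}
currentNode = currentNode.parent;
}
//At the end of the XML element tag reset the value for next element.
currentValue.setLength(0);
//After completing the particular element reading, remove that element from the stack.
path.pop();
}
@Override
public void endDocument() {
//The complete tree is available only once the whole document has been read.
System.out.println("completed reading");
System.out.println(rootNode);
}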
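private String path() {
//push() adds to the front of the Deque, so iterate from the back to get root/.../current order.
return String.join("/", (Iterable<String>) this.path::descendingIterator);
}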
}
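The handler above collects namespaces by looking for attributes that start with xmlns:. As an alternative, here is a minimal sketch that lets the parser report namespace declarations through the standard startPrefixMapping callback. It assumes the factory is switched to namespace-aware mode; the class name NamespaceCollector and the helper collect are hypothetical and only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original answer.
public class NamespaceCollector extends DefaultHandler {

    // prefix -> namespace URI, kept in declaration order
    final Map<String, String> namespaces = new LinkedHashMap<>();

    // Called by a namespace-aware parser for each namespace declaration it encounters.
    @Override
    public void startPrefixMapping(final String prefix, final String uri) {
        namespaces.put(prefix, uri);
    }

    // Parses the given XML text and returns the declared prefixes with their URIs.
    static Map<String, String> collect(final String xml) throws Exception {
        final SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required for startPrefixMapping events
        final NamespaceCollector handler = new NamespaceCollector();
        factory.newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.namespaces;
    }

    public static void main(final String[] args) throws Exception {
        final String xml = "<person:personDocument xmlns:person=\"https://example.com\" "
                + "schemaVersion=\"1.2\"/>";
        System.out.println(collect(xml));
    }
}
```

With this approach the attribute loop no longer needs to separate namespace declarations from regular attributes by hand, at the cost of running the parser in namespace-aware mode.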
This is only a sample to illustrate the idea; you may need to adapt it to your particular requirements.
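For anyone who wants to try the idea without Lombok, here is a self-contained sketch of the same stack-based technique. It is not the exact code from above: the class GenericXmlTree and its nested Node and TreeHandler types are hypothetical stand-ins for ContextNode and SaxHandler, and the stack holds the open nodes themselves, so the top of the stack is always the parent of whatever comes next.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class GenericXmlTree {

    // Hypothetical, Lombok-free stand-in for ContextNode.
    static final class Node {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        Node(final String name) {
            this.name = name;
        }

        // Text content of this element with surrounding whitespace removed.
        String value() {
            return text.toString().trim();
        }
    }

    static final class TreeHandler extends DefaultHandler {
        private final Deque<Node> open = new ArrayDeque<>(); // currently open elements
        Node root;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            final Node node = new Node(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                node.attributes.put(atts.getQName(i), atts.getValue(i));
            }
            if (open.isEmpty()) {
                root = node;                      // first element is the root
            } else {
                open.peek().children.add(node);   // top of the stack is the parent
            }
            open.push(node);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!open.isEmpty()) {
                open.peek().text.append(ch, start, length); // text belongs to the innermost open element
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            open.pop(); // the parent becomes the current element again
        }
    }

    static Node parse(final String xml) throws Exception {
        final TreeHandler handler = new TreeHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.root;
    }

    // Prints one line per element, indented by depth.
    static void print(final Node node, final String indent) {
        System.out.println(indent + node.name + " " + node.attributes + " '" + node.value() + "'");
        for (final Node child : node.children) {
            print(child, indent + "  ");
        }
    }

    public static void main(final String[] args) throws Exception {
        final String xml = "<Person><name>Batman</name>"
                + "<hobbies><hobby>painting</hobby><hobby>football</hobby></hobbies>"
                + "<jogging distance=\"10.3\">daily</jogging></Person>";
        print(parse(xml), "");
    }
}
```

Keeping the nodes on the stack, rather than only the tag names, is what makes the parent-child link trivial: no separate currentNode bookkeeping is needed, and each element's text and attributes end up on the node they belong to.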