I have a 5 MB XML file with a flat structure, and I need to look up data in it repeatedly. I parse it with the XOM parser in Java, but I don't want to loop over the whole tree every time I retrieve a value, because that is slow for a file of this size.
The XML looks like this:
<TypeDesc Type="Person" Id="1" PKey="X0" xml:lang="EN" ShDes="t1" LongDes="test 1"/>
<TypeDesc Type="Person" Id="2" PKey="X1" xml:lang="EN" ShDes="t2" LongDes="test 2"/>
<TypeDesc Type="Person" Id="3" PKey="X3" xml:lang="EN" ShDes="t3" LongDes="test 2"/>
...
<TypeDesc Type="Person" Id="n" PKey="PAYMN" xml:lang="EN" ShDes="PAYMN" LongDes="payment"/>
<TypeDesc Type="Student" Id="1" PKey="X0" xml:lang="EN" ShDes="t1" LongDes="good"/>
<TypeDesc Type="Student" Id="2" PKey="X1" xml:lang="EN" ShDes="t2" LongDes="bad"/>
<TypeDesc Type="Student" Id="3" PKey="X3" xml:lang="EN" ShDes="t3" LongDes="fair"/>
...
<TypeDesc Type="Student" Id="n" PKey="PAYMN" xml:lang="EN" ShDes="PAYMN" LongDes="fair"/>
In my logic I want to retrieve the LongDes of the node where PKey = SOMESTUFF and Type = OtherStuff.
Looping over the whole document and returning LongDes only when the other attributes match is very expensive.
How can I store my data so that I can access it in O(1) instead of O(n)? I want to loop over the whole XML only once and then use the data structure for all later lookups.
I used a hash table to store the data, with one hash table per Type. The key of each table is the concatenation of all the attributes I want to match on, and the stored value is the attribute I want to retrieve. Lookups are very efficient: O(1) on average, after a single O(n) pass to build the tables.
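
The approach above can be sketched roughly as follows. This is only a minimal illustration, not the exact code I used: the class name `TypeDescIndex` and its methods `put` and `longDes` are made up for the example, and it assumes that Type and PKey together identify one entry. The XOM parsing pass is left out; the idea is to call `put` once per TypeDesc element while walking the document a single time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical index over TypeDesc entries: one inner hash table per Type. */
public class TypeDescIndex {
    // Outer key: Type attribute. Inner key: PKey attribute. Value: LongDes.
    private final Map<String, Map<String, String>> byType = new HashMap<>();

    /** Registers one TypeDesc element; intended to be called once per element during the single parse pass. */
    public void put(String type, String pKey, String longDes) {
        byType.computeIfAbsent(type, t -> new HashMap<>()).put(pKey, longDes);
    }

    /** Average-case O(1) lookup: two hash probes, no scan over the document. */
    public Optional<String> longDes(String type, String pKey) {
        Map<String, String> byKey = byType.get(type);
        return byKey == null ? Optional.empty() : Optional.ofNullable(byKey.get(pKey));
    }

    public static void main(String[] args) {
        TypeDescIndex index = new TypeDescIndex();
        // Sample values taken from the XML in the question.
        index.put("Person", "X0", "test 1");
        index.put("Student", "X0", "good");
        index.put("Student", "X1", "bad");

        System.out.println(index.longDes("Student", "X0").orElse("<none>")); // prints "good"
        System.out.println(index.longDes("Person", "X9").orElse("<none>"));  // prints "<none>"
    }
}
```

If more attributes have to take part in the match (for example xml:lang), the inner key can be a concatenation of their values. In that case it is safer to join them with a separator that cannot occur in the values, so that two different attribute combinations never produce the same key.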