I have a simple model of nodes with interrelationships, and the RDF file that defines it is a flat list of XML objects, one for each Node. But when I use rdflib to parse the file, operate on it and serialize it out to a new XML file, it doesn't keep the nice, flat format. It starts nesting XML objects within other XML objects. Is there any way I can keep it from doing that?
Here is a simple example. If I have a simple tree as my knowledge graph
      A
    /   \
   B     C
  / \   / \
 D   E F   G
which I define like
<rdf:RDF>
<me:Node rdf:about="me:A"/>
<me:Node rdf:about="me:B">
<me:parent rdf:resource="me:A"/>
</me:Node>
<me:Node rdf:about="me:C">
<me:parent rdf:resource="me:A"/>
</me:Node>
<me:Node rdf:about="me:D">
<me:parent rdf:resource="me:B"/>
</me:Node>
<me:Node rdf:about="me:E">
<me:parent rdf:resource="me:B"/>
</me:Node>
<me:Node rdf:about="me:F">
<me:parent rdf:resource="me:C"/>
</me:Node>
<me:Node rdf:about="me:G">
<me:parent rdf:resource="me:C"/>
</me:Node>
</rdf:RDF>
when I do a parse() and then a serialize(), the output looks like
<rdf:RDF>
<me:Node rdf:about="me:F">
<me:parent>
<me:Node rdf:about="me:C">
<me:parent>
<me:Node rdf:about="me:A"/>
</me:parent>
</me:Node>
</me:parent>
</me:Node>
<me:Node rdf:about="me:G">
<me:parent rdf:resource="me:C"/>
</me:Node>
<me:Node rdf:about="me:E">
<me:parent>
<me:Node rdf:about="me:B">
<me:parent rdf:resource="me:A"/>
</me:Node>
</me:parent>
</me:Node>
<me:Node rdf:about="me:D">
<me:parent rdf:resource="me:B"/>
</me:Node>
</rdf:RDF>
I realize this is perfectly valid and equivalent RDF, but it makes the files harder to parse with other, non-rdflib tools. Is there any way to force all references to use an "rdf:resource" attribute instead of nesting the referenced node inside the XML of the referring node?
(Note: the example is only there to explain my problem. I'm pretty sure that this simple example would not be reordered and nested if it were just parsed and serialized, but a more complicated graph that is manipulated between the parse and the serialize is.)
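Turns out it's a simple answer. When using the "pretty-xml" format you can pass a max_depth argument to serialize(). Setting it to 1 keeps every node at the top level, so references are written as "rdf:resource" attributes instead of nested nodes.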
graph.serialize(destination=out_file, format='pretty-xml', max_depth=1)