[SOLVED] How to get XML Attributes via rdflib

I have an RDF file with the following content:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>
    <rdf:Description rdf:about="http://someurl.com/def/elementtype/projectState">
        <rdfs:domain rdf:nodeID="projectState_0" />
    </rdf:Description>
</rdf:RDF>

which is parsed by the following code:

import rdflib

g = rdflib.Graph()

# Graph.load() is deprecated (removed in rdflib 6); Graph.parse() reads the file directly
g.parse("problem/err.rdf", format="application/rdf+xml")

for s, p, o in g:
    print(f"subject:{s}")
    print(f"predicate:{p}")
    print(f"object:{o}")
    print()

I'd expect the predicate to expose the nodeID attribute, but I couldn't find a way to get it. The documentation also doesn't mention XML attributes on BNodes (blank nodes without content).

Solution

Blank node identifiers generally aren't guaranteed to be preserved when importing graphs (some graph databases, like GraphDB, do offer the option to preserve them). When I run the code the first time, the output is

subject:http://someurl.com/def/elementtype/projectState
predicate:http://www.w3.org/2000/01/rdf-schema#domain
object:N4ae82de375104726a1a2e5344ee6a44e

When I run it a second time, the output is

subject:http://someurl.com/def/elementtype/projectState
predicate:http://www.w3.org/2000/01/rdf-schema#domain
object:N79f7d744f68f439388484f02a9367be5

So, regarding exposing the nodeID: the blank node is exposed as the object; rdflib just doesn't keep the identifier you gave it and generates a new label on every parse. See this issue for more information.
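The behaviour above can be pictured with a simplified, stdlib-only model. This is an assumption for illustration, not rdflib's actual parser code: the hypothetical helper `relabel_node_ids` maps each distinct `rdf:nodeID` in a document to a freshly generated label once, so repeated references inside one parse still point to the same node, while every new parse produces new labels. The `N` + 32 hex digits format only imitates the labels printed above.

```python
import uuid


def relabel_node_ids(node_ids):
    """Hypothetical model of the parser behaviour, not rdflib's code.

    Each distinct nodeID gets one fresh label per call (i.e. per parse), so
    repeated references agree within a parse but differ between parses.
    """
    mapping = {}
    labels = []
    for node_id in node_ids:
        if node_id not in mapping:
            # "N" + 32 hex digits, imitating the labels printed above
            mapping[node_id] = "N" + uuid.uuid4().hex
        labels.append(mapping[node_id])
    return labels


first_run = relabel_node_ids(["projectState_0", "projectState_0"])
second_run = relabel_node_ids(["projectState_0"])
print(first_run[0] == first_run[1])   # True: same nodeID, same node within one parse
print(first_run[0] == second_run[0])  # False: a new parse yields a new label
print("projectState_0" in first_run)  # False: the original identifier is gone
```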

I would suggest one of the following:

i. Use a different graph database that supports blank node preservation

ii. Use an XML parser

iii. Elevate the blank node to a named resource (reference it with rdf:resource instead of rdf:nodeID)
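Options ii and iii can be sketched with the standard library's `xml.etree.ElementTree`, without rdflib. This is a minimal sketch assuming the file uses plain `rdf:Description` elements like the example; typed node elements and other RDF/XML abbreviations are not handled. The hypothetical helper `node_id_references` reads the `rdf:nodeID` attribute directly, and the hypothetical `elevate_node_ids` rewrites it into `rdf:about`/`rdf:resource` IRIs built from a base IRI you choose, so a parser would then keep them as named nodes. The base IRI `http://someurl.com/def/elementtype/` is likewise only an assumption.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_DESCRIPTION = f"{{{RDF_NS}}}Description"
RDF_ABOUT = f"{{{RDF_NS}}}about"
RDF_NODE_ID = f"{{{RDF_NS}}}nodeID"
RDF_RESOURCE = f"{{{RDF_NS}}}resource"

# The file from the question, inlined so the sketch runs on its own
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>
    <rdf:Description rdf:about="http://someurl.com/def/elementtype/projectState">
        <rdfs:domain rdf:nodeID="projectState_0" />
    </rdf:Description>
</rdf:RDF>"""


def node_id_references(root):
    """Option ii (hypothetical helper): list (subject, predicate, nodeID) tuples."""
    results = []
    for description in root.iter(RDF_DESCRIPTION):
        subject = description.get(RDF_ABOUT)
        for prop in description:
            node_id = prop.get(RDF_NODE_ID)
            if node_id is not None:
                # ElementTree tags look like "{namespace}localname"; join them into the predicate IRI
                namespace, _, local = prop.tag[1:].partition("}")
                results.append((subject, namespace + local, node_id))
    return results


def elevate_node_ids(root, base_iri):
    """Option iii (hypothetical helper): turn every rdf:nodeID into an IRI under base_iri."""
    for description in root.iter(RDF_DESCRIPTION):
        # A nodeID on the node element names the subject, so it becomes rdf:about
        node_id = description.attrib.pop(RDF_NODE_ID, None)
        if node_id is not None:
            description.set(RDF_ABOUT, base_iri + node_id)
        for prop in description:
            # A nodeID on a property element names the object, so it becomes rdf:resource
            node_id = prop.attrib.pop(RDF_NODE_ID, None)
            if node_id is not None:
                prop.set(RDF_RESOURCE, base_iri + node_id)
    return root


# For the real file: root = ET.parse("problem/err.rdf").getroot()
root = ET.fromstring(SAMPLE.encode("utf-8"))
print(node_id_references(root))
# [('http://someurl.com/def/elementtype/projectState', 'http://www.w3.org/2000/01/rdf-schema#domain', 'projectState_0')]

# Keep the familiar rdf:/rdfs: prefixes when serializing
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rdfs", RDFS_NS)
elevate_node_ids(root, "http://someurl.com/def/elementtype/")  # hypothetical base IRI
print(ET.tostring(root, encoding="unicode"))
```

The rewritten document from `ET.tostring(...)` can then be passed to rdflib with `g.parse(data=..., format="xml")`; the former blank node arrives as a URIRef ending in `projectState_0`, which rdflib does not relabel.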