
How to get XML Attributes via rdflib


I have an RDF file with the following content:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>
    <rdf:Description rdf:about="http://someurl.com/def/elementtype/projectState">
        <rdfs:domain rdf:nodeID="projectState_0" />
    </rdf:Description>
</rdf:RDF>

which is parsed by the following code:

import rdflib

g = rdflib.Graph()

# Graph.parse() reads the file directly; Graph.load() is deprecated in its favor.
g.parse("problem/err.rdf", format="application/rdf+xml")

for s, p, o in g:
    print(f"subject:{s}")
    print(f"predicate:{p}")
    print(f"object:{o}")
    print()

I'd expect the predicate to expose the nodeID attribute, but I could not find a way to get it. The documentation also doesn't mention XML attributes on BNodes (blank nodes without content).


Solution

  • Blank node identifiers generally aren't guaranteed to be preserved when importing graphs (some graph databases, like GraphDB, do offer an option to preserve them). When I run the code the first time, the output is

    subject:http://someurl.com/def/elementtype/projectState
    predicate:http://www.w3.org/2000/01/rdf-schema#domain
    object:N4ae82de375104726a1a2e5344ee6a44e
    

    When I run it a second time, the output is

    subject:http://someurl.com/def/elementtype/projectState
    predicate:http://www.w3.org/2000/01/rdf-schema#domain
    object:N79f7d744f68f439388484f02a9367be5
    

    So, regarding the question of exposing the nodeID: it is exposed, but rdflib does not keep the identifier that you gave it. See this issue for more information.

    I would suggest one of the following:

    i. Using a different graph database that supports blank node preservation

    ii. Using an XML parser to read the rdf:nodeID attribute directly

    iii. Elevating the blank node to a named resource referenced with rdf:resource
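
As a rough illustration of the XML-parser suggestion, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the file has the simple shape shown in the question (rdf:Description elements whose child property elements carry rdf:nodeID); it is not a general RDF/XML reader. The names RDF_XML and node_ids are made up for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Same content as the file in the question, inlined to keep the sketch self-contained.
RDF_XML = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>
    <rdf:Description rdf:about="http://someurl.com/def/elementtype/projectState">
        <rdfs:domain rdf:nodeID="projectState_0" />
    </rdf:Description>
</rdf:RDF>
"""


def node_ids(xml_bytes):
    """Collect (subject IRI, property tag, nodeID) for property elements with rdf:nodeID.

    Hypothetical helper: it only looks at direct children of top-level
    rdf:Description elements, which is all the question's file contains.
    """
    root = ET.fromstring(xml_bytes)
    found = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree exposes namespaced attributes as "{namespace}localname".
            node_id = prop.get(f"{{{RDF_NS}}}nodeID")
            if node_id is not None:
                found.append((subject, prop.tag, node_id))
    return found


# Parse from bytes so the encoding declaration in the XML prolog is honored.
results = node_ids(RDF_XML.encode("utf-8"))
for subject, prop, node_id in results:
    print(f"subject:{subject}")
    print(f"predicate:{prop}")
    print(f"nodeID:{node_id}")
```

This reads the label as written in the file, which is the part rdflib replaces with a generated identifier. The trade-off is that you are working with the XML serialization rather than the RDF graph, so anything beyond this simple layout would need extra handling.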