
Meaning of garbled-looking identifiers in parsed RDF/XML triples?


I used the code below to parse an RDF document and export its triples to Excel.

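```python
import rdflib
import pandas as pd

g = rdflib.Graph()
# Graph.load() is deprecated (and removed in newer rdflib versions); parse() fetches and parses the URL.
g.parse('https://standards.buildingsmart.org/IFC/DEV/IFC4/ADD2_TC1/OWL/', format='xml')

# Each triple is (subject, predicate, object); str() turns URIRef/BNode/Literal into plain text.
rows = [(str(s), str(p), str(o)) for s, p, o in g]

df = pd.DataFrame(rows, columns=['s', 'p', 'o'])
df.to_excel("ifc owl.xlsx")  # writing .xlsx requires openpyxl
```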

In the output, some cells contain garbled-looking values like this: (screenshot)

I then looked at the data in N-Triples format, where the highlighted triple from the spreadsheet above appears as follows: (screenshot)

The garbled part seems to be the "genid2542?" identifier. What do these identifiers mean? Are they caused by parsing errors, or do they have some meaning? Thank you!


Solution

  • Just re-posting some points from the comments as an answer that can be accepted:

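    Those identifiers refer to RDF blank nodes (https://w3.org/TR/rdf11-concepts/#section-blank-nodes), which have no global identity (no IRI) in the real world. A blank node identifier is only unique within the document that contains it, so labels like "genid2542" are local names rather than the result of a parsing error.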

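    Keep in mind that every tool that touches a blank node is not only free to, but in some cases must, rewrite it with a different identifier (for example when merging graphs). So do not rely on a particular label such as "genid2542" staying the same across tools or runs.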