rdf named-entity-extraction

RDF representation of entity references in text


Consider a sentence like:

John Smith travelled to Washington.

On a good day, a name tagger would identify 'John Smith' as a person and 'Washington' as a place. Without other evidence, however, it can't tell which of all the possible John Smiths in the world it has found, or even which of the various Washingtons.

Eventually, some resolution process might decide which entity each name refers to, based on other evidence. Until that point, however, what is good practice for representing these references in RDF? Assign them made-up unique identifiers in some namespace? Use blank nodes (e.g. 'Some person named John Smith was referenced in Document d')? Some other alternative? A book I have gives an example involving anonymous weather stations, but I can't quite see how that example fits in with everything else it says about RDF.


Solution

  • Assign them unique identifiers in your own namespace. If you later discover that this "Washington" is the same as http://dbpedia.org/resource/Washington,_D.C., or whatever, you can add an owl:sameAs statement to assert that.
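
As a rough illustration of that approach, here is a minimal sketch in plain Python that mints an identifier per mention and emits N-Triples lines. The `http://example.org/mention/` namespace, the hashing scheme, the helper names `mint_uri` and `triple`, and the choice of `foaf:Person` and `rdfs:label` are all assumptions made for the example, not something the answer prescribes. An RDF library would normally do the serialization for you.

```python
import hashlib

# Hypothetical namespace for locally minted mention identifiers (an assumption).
EX = "http://example.org/mention/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def mint_uri(doc_id, start, surface):
    """Mint a stable identifier for one mention from its document, offset and text."""
    key = f"{doc_id}:{start}:{surface}".encode("utf-8")
    return EX + hashlib.sha1(key).hexdigest()[:12]


def triple(s, p, o, literal=False):
    """Serialize one triple as an N-Triples line."""
    if literal:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{o}>"
    return f"<{s}> <{p}> {obj} ."


sentence = "John Smith travelled to Washington."
john = mint_uri("d", sentence.index("John Smith"), "John Smith")
washington = mint_uri("d", sentence.index("Washington"), "Washington")

# What the name tagger knows before any resolution has happened.
triples = [
    triple(john, RDF_TYPE, FOAF_PERSON),
    triple(john, RDFS_LABEL, "John Smith", literal=True),
    triple(washington, RDFS_LABEL, "Washington", literal=True),
]

# Later, if resolution decides which Washington this is, add the link.
triples.append(
    triple(washington, OWL_SAME_AS, "http://dbpedia.org/resource/Washington,_D.C.")
)

print("\n".join(triples))
```

Because the identifier is derived from the document, offset and surface text, re-running the tagger on the same input yields the same URI, and the earlier triples never need to be rewritten when the owl:sameAs link is added.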