Tags: rdf, freebase, knowledge-graph

Why is an opaque code used instead of a readable string to identify an entity in RDF?


For example:

entity:f06574 rdfs:label "Orioles" .

Or such a format:

:tt0268252 a :Movie .

In either case, f06574 and tt0268252 are codes, not the actual name of the entity or instance. One reason might be that the same string can refer to different things. But in the RDF world an identifier is always prefixed with its unique namespace URI, so even if a string were used, it wouldn't cause ambiguity, and it would be more readable than an opaque code.

What's the real reason for such a representation? The triples in Freebase are similar.


Solution

  • This is similar to surrogate keys in relational database theory. Surrogate keys are not derived from the application data and thus carry no semantic meaning, in contrast to natural keys, which are derived from the application data.

    The main advantage of surrogate keys is that when the application data changes, references to that data do not have to change. With natural keys, a change in the application data changes the key itself, so every foreign key that refers to it must be updated accordingly.

    In the semantic web, triples that refer to tt0268252 do not need to be updated if all we want is for the label to change, say from Movie to Film. If we instead used a name-bearing IRI such as http://awesome/movie and the name needed to change to film, we would either have to change the IRI to http://awesome/film, which goes against the semantic web principle that IRIs should not change, or live with http://awesome/movie rdfs:label "Film". The latter could cause more confusion than an opaque code would.

    As an aside, this is why some prefer persistent uniform resource locators (PURLs), which provide resilience when the underlying web resources change. In a similar way, these "codes" provide resilience when the application data changes.
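
To make the surrogate-key argument concrete, here is a minimal Python sketch that counts how many triples must be touched for the Movie to Film change under each naming scheme. It assumes a toy in-memory triple store (a set of tuples) rather than a real RDF library. The IRIs http://awesome/c001 and http://awesome/tt0000001 and both helper functions are hypothetical, made up for illustration.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# All IRIs and helper names below are illustrative assumptions.

def relabel_opaque(triples, subject, new_label):
    """Opaque IRI: only the rdfs:label triple of `subject` is replaced."""
    kept = {t for t in triples
            if not (t[0] == subject and t[1] == "rdfs:label")}
    changed = len(triples) - len(kept)
    return kept | {(subject, "rdfs:label", new_label)}, changed

def rename_natural(triples, old_iri, new_iri):
    """Name-bearing IRI: every triple that mentions it must be rewritten."""
    result, changed = set(), 0
    for s, p, o in triples:
        new = (new_iri if s == old_iri else s, p,
               new_iri if o == old_iri else o)
        if new != (s, p, o):
            changed += 1
        result.add(new)
    return result, changed

# Scheme 1: the class has an opaque code (hypothetical c001).
opaque = {
    ("http://awesome/c001", "rdfs:label", "Movie"),
    ("http://awesome/tt0268252", "rdf:type", "http://awesome/c001"),
    ("http://awesome/tt0000001", "rdf:type", "http://awesome/c001"),
}

# Scheme 2: the class IRI embeds its name.
natural = {
    ("http://awesome/movie", "rdfs:label", "Movie"),
    ("http://awesome/tt0268252", "rdf:type", "http://awesome/movie"),
    ("http://awesome/tt0000001", "rdf:type", "http://awesome/movie"),
}

opaque_after, opaque_changed = relabel_opaque(
    opaque, "http://awesome/c001", "Film")
natural_after, natural_changed = rename_natural(
    natural, "http://awesome/movie", "http://awesome/film")

print(opaque_changed, natural_changed)  # prints "1 3"
```

With the opaque code, one triple changes regardless of how many instances exist. With the name-bearing IRI, the count grows with every triple that references the class, and any external dataset linking to the old IRI breaks as well.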