url uri rdf rdfs

What URI scheme should be used for a local concept in RDF?


Consider the case where you have some knowledge you want to name and you want to put it into a knowledge graph format like the resource description framework (RDF). However, you don't have an email, a web domain, or access to a namespace authority to generate a URI for the RDF knowledge graph.

This rules out tag URIs, cool URIs, and most other schemes, respectively.

Some possible options that I am not entirely happy with for the mentioned reasons:

  1. http://localhost/myConcept, but this implies a resolvable location. It might also imply that the concept is identical for every interpreter of your knowledge graph.
  2. file:///myConcept (file scheme), but this implies there is a resolvable physical location.
  3. urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6 (uuid scheme), but this doesn't let you put a human-readable component in the URI. It would be great if the uuid scheme allowed urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6/myConcept.
  4. Magnet URIs were envisioned to help communicate between local machines and the web. But they remain a draft, aren't well defined, and the examples reuse other schemes that depend on naming authorities.
  5. data:,myConcept (data scheme), but this depends on registering a MIME type, and as far as I can tell there aren't any MIME types for abstract concepts. It also fails to encode any kind of uniqueness, as would be the case with encoded files, or to communicate that this concept is only locally unique.
  6. Informal schemes like urn:sha1:, but these imply that there is some content to be hashed, and concepts with identical names but different meanings would be assigned the same hash.

What I am looking for is a scheme that identifies a concept uniquely on a local machine and that, when communicated to others, implies that the concept name is unique only within that single communication and must be altered to be globally unique before it is integrated with other knowledge graphs. It should also not rely on any namespace authorities or emails (which also require registration). Does such a scheme exist (maybe informally)? What would you do given the constraints?

Edit: Just want to clarify my view on emails and web domains. Emails are easy and the registration process is completely automated - you can sign up for one immediately. However, you are dependent on that organization to maintain the email registry, not kick you out (like if your email account is inactive), and not go out of business. Personal web domains require a subscription and it should not be required that publishers of data also pay an upkeep fee. This would likely lead to deregistration when they no longer want to pay the fee and the data can now become ambiguous if another user reuses those URIs for other purposes. Free web domains like yourName.github.io have the same issues as email addresses.


Solution

  • A very interesting question! Yes, there is indeed a standardized URI scheme for local, communication-specific resources ‒ cid: ‒ but I would strongly recommend against using it for RDF, since you can't really expect software designed to work with RDF to understand it. You could use it only internally and convert it to mid: when the URI leaves your system, but why? That's not how RDF should work; it's better to pick a stable URI from the very beginning.

    There are several factors that play a role in how a URI is chosen:

    1. Should the URI be dereferenceable?
    2. Is it possible that another person may pick the same URI for a different thing?
    3. Is it desirable that another person may pick the same URI for the same thing?
    4. Should the URI be "obvious", i.e. used for the kind of thing its scheme is meant for?

    Let's see what options there are based on these criteria:

    http(s):

    1. Yes, http URIs are expected to point somewhere. For abstract resources, you can't find the resource itself, but at least its description. http://localhost/ makes sense only if you run an HTTP server there.
    2. As long as you have control over the authority, no other user is entitled (for some meaning of the word) to use it for their own purposes. However, all users are equally entitled to use http://localhost/myConcept, since it is a local URI. It is not immediately obvious what this should mean in the world of RDF, where URIs are supposed to be globally unique; you could assign a meaning to it, but you still wouldn't have full control over it.
    3. & 4. are not really applicable here, since HTTP is used for any resource/concept in general.

    For http, you should have a usable authority, and you should run a web server that can resolve these URIs, preferably for as long as you intend them to be used, ideally forever. You can get an authority for free, but can you still use http URIs if you don't want to? Yes!

    file:

    1. file URIs should point to files. There is no resolution mechanism, but it is expected that you can open any file:/// URI and if the authority is known, use some platform-defined mechanism to talk to the target computer.
    2. The same thing as for http applies here. file:///myConcept could globally mean only something like "the file myConcept in everyone's file system", which is something you cannot use for your own entities.
    3. Yes, it might make sense to use file:///C:/Windows/notepad.exe to denote a particular commonly used file, but this is not of much use in your situation.
    4. file URIs are used for files. Using it for something else will likely lead only to confusion.

    Using file URIs for abstract things other than files is a stretch, and even then you would have to solve the same issues as for http URIs, so it's better to use those anyway.
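To illustrate point 2 for both http://localhost/ and file:///, here is a minimal Python sketch that extracts the authority component using only the standard library. The helper name `authority_of` is made up for this example.

```python
from urllib.parse import urlsplit


def authority_of(uri: str) -> str:
    """Return the authority (host) component of a URI, or '' if it has none."""
    return urlsplit(uri).netloc


# file:///myConcept has an empty authority: nothing in the URI says whose
# file system is meant, so every reader resolves it against their own machine.
print(repr(authority_of("file:///myConcept")))           # ''

# http://localhost/myConcept does name an authority, but "localhost" is
# whichever machine happens to be reading the URI.
print(repr(authority_of("http://localhost/myConcept")))  # 'localhost'
```

Neither URI carries anything that ties the name to you, which is why both can only mean "myConcept on everyone's machine".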

    tag:

    1. Not really, but it is good to be able to contact the authority and find information about the tag somewhere. WebFinger seems like a standardized option.
    2. No. In addition to the guarantees of http URIs, the time component of a tag URI means you don't risk anyone else redefining what it means if you later lose control of the authority.
    3. & 4. again not relevant, like for http URIs.

    All in all, these URIs are much like http URIs but without the need to maintain a server that keeps them alive and well-defined. I'd consider this the best solution for the purpose of locally-used identifiers. A domain or email address that you have held at any single point in time is enough to mint them. You can use an .onion domain here too, so you'd have tag:ofj09pokr8fypnybesnuuc62ygw12abxe2lapry3zgi2si8rvt61r2yv.onion,2023:myConcept.
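A minimal sketch of minting such identifiers in Python, assuming the general tag:authority,date:specific layout of RFC 4151. The function name `mint_tag_uri` and the authority `example.org` are placeholders for illustration only; the sketch does not validate the authority itself.

```python
import re
from urllib.parse import quote

# A tag date is a year, optionally followed by -MM and then -DD.
_TAG_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def mint_tag_uri(authority: str, date: str, specific: str) -> str:
    """Build a tag URI from an authority you held on `date` and a local name."""
    if not _TAG_DATE.fullmatch(date):
        raise ValueError("date must look like YYYY, YYYY-MM or YYYY-MM-DD")
    # Percent-encode characters in the local part that are unsafe in a URI.
    return "tag:" + authority + "," + date + ":" + quote(specific, safe="/")


# example.org is a placeholder authority, not a recommendation.
print(mint_tag_uri("example.org", "2023", "myConcept"))
# tag:example.org,2023:myConcept
```

The date is what decouples the identifier from future ownership of the authority: whoever controls the domain later would have to mint tags with a later date.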

    urn:uuid:

    1. No, URNs are not dereferenceable, they are names.
    2. The UUID would have to be the same, which is extremely unlikely if you pick random UUIDs.
    3. This may happen only if you use hash-based UUIDs, where it is desirable, but likely not applicable to your situation.
    4. uuid URNs can be used for any resource in general.

    You are actually incorrect about not being able to have a human-readable component in a UUID URI ‒ you forget the #! Say you pick urn:uuid:e45f8769-8f64-4301-8c3a-12272eaa3f75 to denote your vocabulary; then there is really nothing wrong with using urn:uuid:e45f8769-8f64-4301-8c3a-12272eaa3f75#myConcept for anything you wish. Individual URI schemes do not have control over the fragment, so if the base URI is "yours", pick any fragment with any hierarchy you want. This is the second-best solution; it is shorter but doesn't convey the "local" meaning you wanted, and you only have control over the fragment.
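As a rough sketch of the fragment approach in Python: the standard `uuid` module can mint the base, and terms are formed by appending a fragment. The helper names `mint_vocabulary_base` and `term` are invented for this example.

```python
import uuid


def mint_vocabulary_base() -> str:
    """Mint a fresh random (version 4) UUID URN to stand for a vocabulary."""
    return uuid.uuid4().urn


def term(base: str, local_name: str) -> str:
    """Name a concept inside the vocabulary using a human-readable fragment."""
    return base + "#" + local_name


# Using the fixed base from the text above so the output is reproducible.
base = "urn:uuid:e45f8769-8f64-4301-8c3a-12272eaa3f75"
print(term(base, "myConcept"))
# urn:uuid:e45f8769-8f64-4301-8c3a-12272eaa3f75#myConcept

# Name-based (version 5) UUIDs are deterministic: the same namespace and name
# always give the same UUID, which is the situation point 3 refers to.
a = uuid.uuid5(uuid.NAMESPACE_URL, "http://localhost/myConcept")
b = uuid.uuid5(uuid.NAMESPACE_URL, "http://localhost/myConcept")
print(a == b)  # True
```

In practice you would call `mint_vocabulary_base()` once, store the result, and reuse it for every term in the same vocabulary.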

    data:

    1. Yes, a data URI can be locally dereferenced to the data it stores.
    2. No, the given combination of media type and data is what the URI represents and encodes.
    3. Absolutely, for the same reason as above.
    4. Yes, like for file URIs, data: should be only used for data, i.e. a sequence of bytes intended to be interpreted using a certain media type.

    Don't use this for abstract things which are not data. While you may get some uniqueness from a URI like data:application/x.my.very.special.namespace,myConcept, it's still fundamentally a sequence of 9 bytes.
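To make the "sequence of 9 bytes" point concrete, here is a small Python sketch that decodes a data URI by hand. The function name `decode_data_uri` is made up, and the parser is deliberately simplified: it only distinguishes plain and base64 payloads.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri: str):
    """Split a data: URI into (media type, payload bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, comma, payload = uri[len("data:"):].partition(",")
    if not comma:
        raise ValueError("data URI has no comma separating header and data")
    if header.endswith(";base64"):
        media_type = header[: -len(";base64")]
        data = base64.b64decode(payload)
    else:
        media_type = header
        data = unquote_to_bytes(payload)
    # An omitted media type defaults to plain US-ASCII text.
    return media_type or "text/plain;charset=US-ASCII", data


media_type, data = decode_data_uri(
    "data:application/x.my.very.special.namespace,myConcept"
)
print(media_type)  # application/x.my.very.special.namespace
print(data)        # b'myConcept'
print(len(data))   # 9
```

Whatever media type you invent, the thing the URI identifies is still just those bytes, not the concept they happen to spell.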

    magnet:, urn:sha1:, ni: etc.

    These are based on hashes, so if you don't have content to hash, there is no point in using them.
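A short Python sketch of why hashing just the name does not help. It assumes the Base32 rendering of the SHA-1 digest that urn:sha1: values commonly use in magnet links; the helper name `sha1_urn` is invented for this example.

```python
import base64
import hashlib


def sha1_urn(content: bytes) -> str:
    """Render the SHA-1 digest of `content` as a urn:sha1: identifier.

    Assumption: Base32 encoding of the 20-byte digest, as commonly seen
    in magnet links.
    """
    digest = hashlib.sha1(content).digest()
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


# Two people hashing the same concept *name* get the same URN, even if
# they mean entirely different things by it.
mine = sha1_urn(b"myConcept")
theirs = sha1_urn(b"myConcept")
print(mine == theirs)  # True
```

The hash identifies the bytes that were hashed, so it is only a meaningful identifier when those bytes are the resource itself.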

    There are also some other types of URIs that I am aware of, so let's take a look at them too:

    urn:oid:

    OIDs have a hierarchy and you can get one for free to create your own hierarchy, and you could even use it for temporary identifiers with counters and other things. Otherwise it is the same as urn:uuid:.

    urn:urn-5:

    This is a lesser known but cool informal URN namespace which allows you to use random bytes as identification but permits a local part, so (as far as I understand it), you can use something like urn:urn-5:9Q8Vb+6gDOz6IpWyNnfKdVmA6gQ:myConcept. The specification calls the part after : a "counter", but permits an arbitrary string if the resulting URN is valid, so you could use any path and any fragment too, of course (or ?= for a sort of query). I'd say this is the 3rd best solution ‒ it is completely decentralized and seems to be made for this purpose, has the parts you require, but it is not very well known.

    urn:publicid:

    This is another cool URN namespace, intended to be formed from SGML/XML PUBLIC identifiers. These don't have any particular mandatory structure (there is the FPI though if you want to use it), and I assume the system is "first come, first served" since you can pretty much use anything to produce these identifiers (or you can use this tool). I'd consider this the 4th best solution, but again it's a bit obscure, and it's a bit "wild west" since (unless you use a verified FPI) you have to come up with your own sufficiently strong way of making sure the names you create are only your own.
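For illustration, a Python sketch of turning a public identifier into a urn:publicid: URN. The character mapping below is the transcription table of RFC 3151 as I recall it, so treat it as an assumption and check it against the RFC before relying on it. The function name `public_id_to_urn` is made up.

```python
import re

# Transcription of public identifier characters to URN characters
# (assumed from RFC 3151; verify against the RFC).
_TRANSCRIPTION = {
    "//": ":", "::": ";", " ": "+",
    "+": "%2B", ":": "%3A", "/": "%2F", ";": "%3B",
    "'": "%27", "?": "%3F", "#": "%23", "%": "%25",
}
# Two-character sequences come first so they win over single characters.
_TOKEN = re.compile(r"//|::|[ +:/;'?#%]")


def public_id_to_urn(public_id: str) -> str:
    """Transcribe an SGML/XML public identifier into a urn:publicid: URN."""
    # Trim the identifier and collapse runs of whitespace to one space.
    normalized = " ".join(public_id.split())
    body = _TOKEN.sub(lambda m: _TRANSCRIPTION[m.group(0)], normalized)
    return "urn:publicid:" + body


print(public_id_to_urn("-//OASIS//DTD DocBook XML V4.1.2//EN"))
# urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN
```

A free-form identifier such as "my vocabulary myConcept" goes through the same function, which is where the "wild west" caveat applies: nothing stops someone else from choosing the same string.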