
Is it possible to use RDF storage also as a document-oriented database?


Suppose I have a large amount of heterogeneous JSON documents (i.e. named key-value mappings) and a hierarchy of classes (i.e. named sets) to which these documents are attached. I need to set up a data structure that allows:

  1. CRUD operations on JSON documents.
  2. Retrieving JSON documents by ID really quickly.
  3. Retrieving all JSON documents that are attached to a certain class really quickly.
  4. Editing class hierarchy: adding/deleting classes, rearranging them.

I initially came up with the idea of storing the JSON documents in a document-oriented database (like CouchDB or MongoDB) and the class hierarchy in an RDF storage (like 4store). Requirements 1, 2 and 4 are then handled naturally, and 3 is solved by maintaining a list of attached document IDs for every class in the RDF storage.
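The two-store design described above can be sketched in memory to make the four requirements concrete. This is only an illustration under stated assumptions: plain Python dicts stand in for the document database and for the RDF storage (no real CouchDB, MongoDB or 4store API is used), the class `HybridStore` and its method names are hypothetical, and treating a class query as also covering subclasses is an assumption.

```python
from collections import defaultdict


class HybridStore:
    """In-memory sketch of the two-store design (hypothetical names).

    `docs` stands in for the document database; `parent` and `members`
    stand in for the class hierarchy kept in the RDF storage.
    """

    def __init__(self):
        self.docs = {}                   # doc ID -> JSON-like dict
        self.parent = {}                 # class -> parent class (None for roots)
        self.members = defaultdict(set)  # class -> IDs of attached documents

    # Requirement 1: CRUD on documents.
    def put(self, doc_id, doc, classes=()):
        """Create or replace a document and attach it to the given classes
        (existing attachments are kept)."""
        self.docs[doc_id] = doc
        for cls in classes:
            self.members[cls].add(doc_id)

    def delete(self, doc_id):
        """Remove a document and detach it from every class."""
        self.docs.pop(doc_id, None)
        for ids in self.members.values():
            ids.discard(doc_id)

    # Requirement 2: lookup by ID is a single hash-table access.
    def get(self, doc_id):
        return self.docs.get(doc_id)

    # Requirement 4: editing the class hierarchy.
    def add_class(self, cls, parent=None):
        self.parent[cls] = parent

    def move_class(self, cls, new_parent):
        self.parent[cls] = new_parent

    def delete_class(self, cls):
        """Delete a class; its children move up to its parent and its
        documents are simply detached (one possible policy)."""
        grandparent = self.parent.pop(cls, None)
        for child, p in list(self.parent.items()):
            if p == cls:
                self.parent[child] = grandparent
        self.members.pop(cls, None)

    # Requirement 3: documents attached to a class or any of its subclasses.
    def subclasses(self, cls):
        result, frontier = {cls}, [cls]
        while frontier:
            current = frontier.pop()
            for child, p in self.parent.items():
                if p == current and child not in result:
                    result.add(child)
                    frontier.append(child)
        return result

    def docs_in_class(self, cls):
        ids = set()
        for c in self.subclasses(cls):
            ids |= self.members.get(c, set())
        return sorted(ids)
```

Keeping the per-class ID sets separate from the documents is what turns requirement 3 into set lookups instead of a scan over all documents; the price is that every CRUD operation has to keep those sets in sync.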

But then I realized that an RDF storage could actually handle the document-oriented part too, i.e. retrieving JSON documents by ID. At first glance this seems true, but I'm still concerned about 2 and 3. Is there an RDF storage that can retrieve documents (nodes) as fast as document-oriented databases serve documents? How fast will it answer queries like 3? I've heard a little about RDF storages being slow, the reification problem, etc.

Is there an RDF storage that is as convenient as, for example, CouchDB for casually retrieving objects by ID? What is the difference between using a document-oriented database and an RDF storage for storing, retrieving and editing JSON-like objects?


Solution

  • The closest equivalent in RDF databases is the named graph. In a named graph you can put a set of RDF triples. This set of triples can be asserted from one or many RDF documents, depending on your needs. Let's say you want one named graph per RDF document. You could name the graph with a URI (a URL or an IRI) that reflects the file location. For instance ...

    http://yourdomain/files/rdf_file_1
    

    or

    file:///home/myrdffiles/file1
    

    4store is a quad store. Quad stores support named graphs, and 4store is specifically designed to handle them.

    With 4store you can run the following command to assert triples into a named graph:

    curl -T your_file.rdf http://your_4store_database/data/http://yourdomain/files/rdf_file_1
    

    After /data/ you put the graph identifier (IRI) into which the triples are asserted. See the 4store SPARQL server and 4store Client Libs documentation for more details.

    Once your data is asserted, you can use the named graph in SPARQL to direct a query at that graph:

    SELECT * WHERE {
        GRAPH <http://yourdomain/files/rdf_file_1> {
            ?s ?p ?o .    # replace with your own triple patterns
        }
    }
    

    Moreover, 4store can return SPARQL result sets as JSON, so you can retrieve query results directly in JSON.

    If you decide to use 4store you'll find valuable support here: http://4store.org/contact
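The one-named-graph-per-document idea can be sketched with a toy in-memory quad store. This is not 4store's API: `QuadStore`, `doc_to_triples`, `graph_to_doc`, `graphs_in_class`, the `VOCAB` namespace and the `SCHEMA` graph holding the class hierarchy are all hypothetical. The sketch assumes flat JSON documents with hashable scalar values and reuses the graph IRI as the document's subject IRI; nested objects, arrays and literal datatypes would need extra modelling in a real RDF store. The loop in `graphs_in_class` plays the role that a SPARQL query using `GRAPH ?g { ... }` would play.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
VOCAB = "http://yourdomain/vocab#"   # hypothetical namespace for JSON keys and classes
SCHEMA = "http://yourdomain/schema"  # hypothetical graph holding the class hierarchy


class QuadStore:
    """Toy quad store: triples grouped by the named graph they live in."""

    def __init__(self):
        self.graphs = defaultdict(set)  # graph IRI -> {(subject, predicate, object)}

    def put_graph(self, graph, triples):
        """Replace the entire contents of a named graph."""
        self.graphs[graph] = set(triples)

    def delete_graph(self, graph):
        self.graphs.pop(graph, None)

    def quads(self, graph=None):
        """Yield (graph, s, p, o) from one named graph, or from all of them."""
        names = [graph] if graph is not None else list(self.graphs)
        for g in names:
            for s, p, o in self.graphs.get(g, ()):
                yield g, s, p, o


def doc_to_triples(graph, doc, classes):
    """Flatten a flat JSON object into triples whose subject is the graph IRI."""
    triples = {(graph, VOCAB + key, value) for key, value in doc.items()}
    triples |= {(graph, RDF_TYPE, cls) for cls in classes}
    return triples


def graph_to_doc(store, graph):
    """Rebuild the JSON object from one named graph (the lookup by ID)."""
    return {p[len(VOCAB):]: o for _, _, p, o in store.quads(graph)
            if p.startswith(VOCAB)}


def subclasses(store, cls):
    """cls plus every class reachable through rdfs:subClassOf in SCHEMA."""
    result, frontier = {cls}, [cls]
    while frontier:
        parent = frontier.pop()
        for _, s, p, o in store.quads(SCHEMA):
            if p == SUBCLASS_OF and o == parent and s not in result:
                result.add(s)
                frontier.append(s)
    return result


def graphs_in_class(store, cls):
    """IRIs of document graphs typed with cls or one of its subclasses."""
    wanted = subclasses(store, cls)
    return sorted({g for g, _, p, o in store.quads()
                   if p == RDF_TYPE and o in wanted})
```

Here `put_graph` replaces a graph's whole contents, loosely mirroring the `curl -T` (HTTP PUT) upload shown above, so updating a document means rewriting its graph. The class query scans every graph in this toy version; a real quad store would answer the equivalent SPARQL query from its indexes.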