fuseki

Fuseki configuration


As outlined in http://wiki.bitplan.com/index.php/Apache_Jena#Script_to_start_Fuseki_server, I have been avoiding the complexity of Fuseki configuration files and have started the server from a script for my use cases, in which I only need one dataset/endpoint. For multiple datasets/endpoints I simply used multiple servers.

Descriptions like:

and questions like:

have been intimidating me, since there seem to be so many options and no straightforward way to simply say: please use these datasets from the following directories, as the command line version can do for one dataset.

Just look at:

where the user expectation:

java -jar fuseki-0.1.0-server.jar --update --loc=data /dataset
--loc=data2 /dataset2

can be seen, which is unfortunately not fulfilled. Instead:

was the answer at the time, which is now an outdated link.

So obviously there are people out there getting Fuseki to work with multiple datasets. But how do they do it?

I know how to load a TDB store from a triple file via the command line. I know that I could use the web GUI to set up datasets and load data, but that won't work for my multi-million (and partly multi-billion) triple files.

What is a (hopefully simple) example of loading multiple triple files and making the results available from the same Fuseki server as different datasets, with the SPARQL endpoints running (partly read-only)?


Solution

  • https://jena.apache.org/documentation/fuseki2/fuseki-layout.html gives a hint on the layout of files.

    Using the script to start Fuseki, I inspected the run directory, which in my case was found at:

    apache-jena-fuseki-3.16.0/run
    

    There are two subdirectories, configuration and databases, which are initially empty and stay so if you run things from the command line.

    By adding a dataset via the web GUI at http://localhost:3030

    [screenshot: adding a dataset]

    a directory with the name of the dataset, in this case:

    databases/cr
    

    and a configuration file

    configuration/cr.ttl
    

    are created. For smaller datasets, data can now be added via the web GUI. For bigger datasets, a copy or symlink of the originally loaded TDB data is necessary in the databases directory.

    Example symlinks:

    zeus:databases wf$ ls -l
    total 48
    drwxr-xr-x  4 wf  admin  136 Sep 14 07:43 cr
    lrwxr-xr-x  1 wf  admin   27 Sep 15 11:53 dblp -> /Volumes/Torterra/dblp/data
    lrwxr-xr-x  1 wf  admin   26 Sep 14 08:10 gnd -> /Volumes/Torterra/gnd/data
    lrwxr-xr-x  1 wf  admin   42 Sep 14 07:55 wikidata -> /Volumes/Torterra/wikidata2020-08-15/data/
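
    The symlink step could be scripted roughly as follows. This is only a sketch under assumptions: link_dataset is a hypothetical helper (not part of Fuseki), the dataset is assumed to have been created via the web GUI first so that configuration/<name>.ttl exists, the server is assumed to be stopped while directories are swapped, and the linked directory is assumed to hold a TDB store of the same type the dataset was created with.

    ```shell
    #!/bin/sh
    # link_dataset RUN_DIR NAME TDB_DIR
    # Hypothetical helper: point RUN_DIR/databases/NAME at an already
    # loaded TDB directory by means of a symlink.
    link_dataset() {
      run_dir=$1
      name=$2
      tdb_dir=$3
      target="$run_dir/databases/$name"

      # Refuse to link to something that is not a directory.
      [ -d "$tdb_dir" ] || { echo "no such TDB directory: $tdb_dir" >&2; return 1; }

      # Assumption: the web GUI created this file when the dataset was added.
      [ -f "$run_dir/configuration/$name.ttl" ] ||
        echo "warning: $run_dir/configuration/$name.ttl is missing" >&2

      mkdir -p "$run_dir/databases"
      if [ -L "$target" ]; then
        rm "$target"                  # replace an older symlink
      elif [ -d "$target" ]; then
        mv "$target" "$target.bak"    # keep the directory created by the GUI
      fi
      ln -s "$tdb_dir" "$target"
    }
    ```

    With the paths from the listing above, a call might look like: link_dataset apache-jena-fuseki-3.16.0/run dblp /Volumes/Torterra/dblp/data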
    

    By restarting the server without a --loc option

    nohup java -jar fuseki-server.jar&
    

    the configurations are automatically picked up.

    [screenshot: Fuseki with multiple datasets]

    The good news is that you do not have to bother with the details of the config files this way, as long as you do not have any special needs.
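
To see which datasets a restarted server should pick up, one can list the files in the configuration directory. The following is a minimal sketch, not a Fuseki feature: list_endpoints is a hypothetical helper, and it assumes that the dataset name equals the configuration file name and that each dataset exposes a query service named sparql (check the generated .ttl file, for example configuration/cr.ttl, to confirm).

```shell
#!/bin/sh
# list_endpoints RUN_DIR [BASE_URL]
# Hypothetical helper: print one assumed SPARQL query URL for every
# configuration/*.ttl file found below RUN_DIR.
list_endpoints() {
  run_dir=$1
  base_url=${2:-http://localhost:3030}
  for cfg in "$run_dir"/configuration/*.ttl; do
    [ -f "$cfg" ] || continue          # the glob matched nothing
    name=$(basename "$cfg" .ttl)
    echo "$base_url/$name/sparql"      # assumed service name: sparql
  done
}

# Possible use against a running server (an ASK query stays cheap even
# on very large stores):
#   list_endpoints apache-jena-fuseki-3.16.0/run | while read -r url; do
#     curl -s -G "$url" --data-urlencode 'query=ASK { ?s ?p ?o }'
#   done
```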