Fuseki configuration

As outlined in http://wiki.bitplan.com/index.php/Apache_Jena#Script_to_start_Fuseki_server, I have been avoiding the complexity of Fuseki configuration files and started the server from a script for my use cases, in which I only need one dataset/endpoint. For multiple datasets/endpoints I simply used multiple servers.

Descriptions like:

https://jena.apache.org/documentation/fuseki2/fuseki-config-endpoint.html

and questions like:

fuseki Multiple services found exception

have been intimidating me, since there seem to be so many options and no straightforward way to simply say: please use these datasets from the following directories, as the command line version can do for one dataset.

Just look at:

https://users.jena.apache.narkive.com/MNZHLT25/multiple-datasets-on-fuseki

where the user expectation:

java -jar fuseki-0.1.0-server.jar --update --loc=data /dataset
--loc=data2 /dataset2

can be seen, which is unfortunately not fulfilled. Instead:

http://jena.apache.org/documentation/serving_data/index.html#fuseki-configuration-file

was the answer at the time, and that link is now outdated.

So obviously there are people out there getting Fuseki to work with multiple datasets. But how do they do it?

I know how to load a TDB store from a triple file via the command line. I know that I could use the web GUI to set up datasets and load data, but that won't work for my multi-million (and partly multi-billion) triple files.

What is a (hopefully simple) example for loading multiple triple files, making the results available from the same Fuseki server as different datasets, and having the SPARQL endpoints running (partly read-only)?

Solution

https://jena.apache.org/documentation/fuseki2/fuseki-layout.html gives a hint on the layout of files.

Using the script to start Fuseki, I inspected the run directory, which in my case was found at:

apache-jena-fuseki-3.16.0/run

There are two subdirectories which are initially empty and stay so if you run things from the command line:

configuration
databases

By adding a dataset via the web GUI at http://localhost:3030,

a directory with the name of the dataset, in this case:

databases/cr

and a configuration file

configuration/cr.ttl

are created. For smaller datasets, data can now be added via the web GUI. For bigger datasets, a copy or symlink of the originally loaded TDB data is needed in the databases directory.

Example symlinks:

zeus:databases wf$ ls -l
total 48
drwxr-xr-x  4 wf  admin  136 Sep 14 07:43 cr
lrwxr-xr-x  1 wf  admin   27 Sep 15 11:53 dblp -> /Volumes/Torterra/dblp/data
lrwxr-xr-x  1 wf  admin   26 Sep 14 08:10 gnd -> /Volumes/Torterra/gnd/data
lrwxr-xr-x  1 wf  admin   42 Sep 14 07:55 wikidata -> /Volumes/Torterra/wikidata2020-08-15/data/
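
A minimal sketch of how such symlinks could be created in one go, assuming the server is stopped while the links are changed. The variable FUSEKI_BASE (defaulting to the run directory shown above) and the helper name link_dataset are assumptions of this sketch, not something Fuseki provides. A directory that the web GUI already created under the same name is moved aside rather than deleted.

```shell
#!/usr/bin/env bash
# Sketch only: link already loaded TDB directories into Fuseki's databases directory.
# FUSEKI_BASE and the helper name link_dataset are assumptions of this sketch.
FUSEKI_BASE="${FUSEKI_BASE:-apache-jena-fuseki-3.16.0/run}"

link_dataset() {
  local name="$1"
  local tdb_dir="$2"
  local target="$FUSEKI_BASE/databases/$name"
  mkdir -p "$FUSEKI_BASE/databases"
  # A real (non-symlink) directory created by the web GUI is moved aside, not deleted.
  if [ -d "$target" ] && [ ! -L "$target" ]; then
    mv "$target" "$target.orig"
  fi
  # -s: symbolic link, -f: replace an existing link, -n: do not follow an existing link
  ln -sfn "$tdb_dir" "$target"
}

# The dataset names and source paths are the ones from the listing above.
link_dataset dblp /Volumes/Torterra/dblp/data
link_dataset gnd /Volumes/Torterra/gnd/data
link_dataset wikidata /Volumes/Torterra/wikidata2020-08-15/data/

ls -l "$FUSEKI_BASE/databases"
```

Each linked name still needs its configuration file in the configuration directory, which is what creating the dataset via the web GUI takes care of.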

By restarting the server without a --loc option

nohup java -jar fuseki-server.jar&

the configurations are automatically picked up.
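
To check that every dataset really came up after the restart, something like the following sketch could be used. It sends a cheap ASK query to each dataset given on the command line. The variable FUSEKI_URL, the helper names and the service name "sparql" are assumptions here; the service name has to match whatever the generated configuration file defines for the dataset.

```shell
#!/usr/bin/env bash
# Sketch only: probe each dataset's query endpoint after the restart.
# FUSEKI_URL, the helper names and the "sparql" service name are assumptions.
FUSEKI_URL="${FUSEKI_URL:-http://localhost:3030}"

# Build the query endpoint URL for a dataset name.
query_url() {
  printf '%s/%s/sparql\n' "$FUSEKI_URL" "$1"
}

# Ask whether the default graph of the dataset contains at least one triple.
has_triples() {
  # ASK can stop at the first match, so it stays cheap even on very large stores.
  curl --silent --fail --get "$(query_url "$1")" \
    --header 'Accept: application/sparql-results+json' \
    --data-urlencode 'query=ASK { ?s ?p ?o }'
}

# Probe every dataset name passed as an argument; with no arguments nothing is done.
for dataset in "$@"; do
  echo "== $dataset: $(query_url "$dataset")"
  has_triples "$dataset" || echo "no answer from $dataset"
  echo
done
```

Saved under a hypothetical name such as check-datasets.sh, it would be called as: bash check-datasets.sh cr dblp gnd wikidata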

The good news is that, this way, you do not have to bother with the details of the config files as long as you do not have any special needs.