sesame openrdf rdf4j

Loading big RDF file into Sesame


I'm trying to create a SPARQL endpoint based on Sesame. I installed Tomcat and PostgreSQL, and deployed the Sesame web application. I created a repository backed by the PostgreSQL RDF store. Now I need to load a big Turtle (.ttl) file (540M triples, several GB in size) into the repository. Loading a file this big through the Workbench is not a good solution: it would take several days. What is the best non-programming solution to load the data? Are there tools like a "console" for loading data? For example, Virtuoso has the isql tool for bulk loading...


Solution

  • There is no ready-made bulk loading tool available for Sesame that I am aware of, though vendors of Sesame-compatible triplestores do ship such tooling as part of their own databases. Programming a bulk-upload solution is not particularly hard, but we somehow never got around to including such a tool in the Sesame core distribution.

    540M triples, by the way, is probably too large for any of Sesame's default stores: the Native Store only scales to about 150M, and loading such a large dataset into the memory store is just too unwieldy (even if you had the available RAM). So you probably need to look into using a Sesame-compatible database provided by a third party. There are many choices available, both commercial and free/open-source; see this overview on the Sesame website for a list of some suggestions.
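
To give an idea of what a programmed upload could look like, here is a minimal sketch that streams a Turtle file to a repository's `statements` endpoint over HTTP, using only the Java standard library. It is not a true bulk loader and does nothing about the store size limits mentioned above. The class name `SesameUpload`, its two helper methods, the command-line arguments, and the example server URL and repository ID are all hypothetical. It also assumes the server accepts `text/turtle` on a POST to `/repositories/<id>/statements`; check this against your server version.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of Sesame.
class SesameUpload {

    // Builds the "statements" endpoint URL for a repository.
    // Assumption: the server exposes /repositories/<id>/statements.
    static String statementsUrl(String serverUrl, String repositoryId) {
        String base = serverUrl.endsWith("/")
                ? serverUrl.substring(0, serverUrl.length() - 1)
                : serverUrl;
        return base + "/repositories/" + repositoryId + "/statements";
    }

    // Streams the file to the endpoint with POST and returns the HTTP status.
    static int upload(String endpoint, Path turtleFile) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // Assumption: the server understands this MIME type for Turtle.
        conn.setRequestProperty("Content-Type", "text/turtle;charset=UTF-8");
        // Chunked mode stops the client from buffering the whole body in memory.
        conn.setChunkedStreamingMode(64 * 1024);

        try (InputStream in = Files.newInputStream(turtleFile);
             OutputStream out = conn.getOutputStream()) {
            byte[] buffer = new byte[64 * 1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        // A 2xx status means the server accepted the data.
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: SesameUpload <server-url> <repository-id> <file.ttl>");
            return;
        }
        String endpoint = statementsUrl(args[0], args[1]);
        int status = upload(endpoint, Paths.get(args[2]));
        System.out.println("POST " + endpoint + " returned HTTP " + status);
    }
}
```

It could be run as, for example, `java SesameUpload http://localhost:8080/openrdf-sesame myrepo data.ttl`. The server still receives everything as a single request, so for hundreds of millions of triples it may be worth splitting the data into smaller files first and uploading them one at a time.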