Tags: sparql, freebase, virtuoso, openlink-virtuoso

How to build an N-hop SPARQL (Virtuoso) query that returns all paths (1 to N hops) starting from a given topic entity?


I am trying to build a subgraph of Freebase around a given topic entity, because querying the full Freebase data takes too long.

My first attempt at building a 3-hop subgraph was as follows:

PREFIX ns:<http://rdf.freebase.com/ns/>
SELECT ?r0, ?e1, ?r1, ?e2, ?r2, ?e3 WHERE 
{
    ns:m.034rd ?r0 ?e1.
    ?e1 ?r1 ?e2.
    ?e2 ?r2 ?e3.
}

This does not work, because it returns only complete 3-hop paths and omits every path that ends 1 or 2 hops away from the topic entity.

My next attempt was as follows:

    PREFIX ns:<http://rdf.freebase.com/ns/>
    SELECT ?r0, ?e1, ?r1, ?e2, ?r2, ?e3 WHERE 
    {
        ns:m.034rd ?r0 ?e1.
        OPTIONAL{
            ?e1 ?r1 ?e2.
        }
        OPTIONAL{
            ?e2 ?r2 ?e3.
        }
    }

This did not work either, although I admittedly don't know why, or whether I am even using the OPTIONAL keyword correctly.

After failing to build a single SPARQL query, I tried querying Freebase iteratively and building the graph from the results. I have tried two things:

(1):

PREFIX ns:<http://rdf.freebase.com/ns/>
SELECT ?r0, ?e1 WHERE 
{
    ns:m.034rd ?r0 ?e1.
}

and

PREFIX ns:<http://rdf.freebase.com/ns/>
SELECT ?r0, ?e1, ?r1, ?e2 WHERE 
{
    ns:m.034rd ?r0 ?e1.
    ?e1 ?r1 ?e2.
}

and

PREFIX ns:<http://rdf.freebase.com/ns/>
SELECT ?r0, ?e1, ?r1, ?e2, ?r2, ?e3 WHERE 
{
    ns:m.034rd ?r0 ?e1.
    ?e1 ?r1 ?e2.
    ?e2 ?r2 ?e3.
}

I had assumed that running these three queries would give me all paths (1-, 2-, and 3-hop) stemming from the topic entity.

(2):

PREFIX ns:<http://rdf.freebase.com/ns/>
SELECT ?r0, ?e1 WHERE 
{
    e0 ?r0 ?e1.
}

where e0 was initially set to the topic entity. The query was then re-run with e0 set to each e1 returned by the previous round, and this process was repeated 3 times (3 hops).

I am still no closer to finding the correct way to build the subgraph and any help would be greatly appreciated.


Solution

  • Based on the comments, I'll give some pointers here. The question as asked is not suited to a specific answer. The OpenLink Community Forum is usually better than StackOverflow for deeper dives on specific products like Virtuoso.

    First and often foremost, make sure you're running the latest build of Virtuoso, whether Open Source Edition (a/k/a VOS), now 7.2.6.1, or Enterprise/Commercial Edition (a/k/a VEE or VCE), now 8.3+, both of which shipped in July 2021.

    Next, take a look at the basic Performance Tuning settings, and ensure that Virtuoso is set to make use of as much RAM and other system resources as intended -- as default settings are intended to minimize Virtuoso's load on the system, not to maximize query or other performance.

    Then, there is a server-side timeout, MaxQueryExecutionTime, set in the [SPARQL] section of the Virtuoso INI file, as discussed in the product documentation. Note: This timeout has no effect on SPARQL queries that are run through an iSQL session. To run a query there, prepend the sparql keyword and append a semicolon to the SPARQL query you would otherwise submit through the SPARQL form; e.g., sparql SELECT ?s ... ORDER BY ?s ;
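
    As an illustration of the iSQL form described above, here is the 1-hop query from the question wrapped for an iSQL session. Only the leading SPARQL keyword and the trailing semicolon are new. The variable list is separated by whitespace, as standard SPARQL expects. This is a sketch and has not been run against a Freebase load.

    ```sparql
    SPARQL
    PREFIX ns: <http://rdf.freebase.com/ns/>
    SELECT ?r0 ?e1
    WHERE
    {
        ns:m.034rd ?r0 ?e1 .
    }
    ;
    ```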

    There are some additional Anytime Query settings that may be relevant to adjusting this feature for your deployment.

    If these hints don't prove sufficient, the OpenLink Community Forum should be your next port of call for assistance.
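
    On the query itself, one common way to get 1-, 2-, and 3-hop paths in a single result set is to nest the OPTIONAL blocks rather than place them side by side. With sibling OPTIONAL blocks, whenever the second hop finds no match, ?e2 stays unbound and the third pattern is free to match any triple in the store, which is probably why that form appears not to work. The following is a minimal sketch under standard SPARQL semantics, reusing the prefix and topic entity from the question. It has not been tested against a Freebase load, and the result set may still be large enough to hit the timeouts discussed above.

    ```sparql
    PREFIX ns: <http://rdf.freebase.com/ns/>
    SELECT ?r0 ?e1 ?r1 ?e2 ?r2 ?e3
    WHERE
    {
        # Hop 1 is required: every row starts at the topic entity.
        ns:m.034rd ?r0 ?e1 .
        OPTIONAL
        {
            # Hop 2 is attempted only from an ?e1 bound by hop 1.
            ?e1 ?r1 ?e2 .
            OPTIONAL
            {
                # Hop 3 is attempted only when hop 2 matched, so ?e2 is always bound here.
                ?e2 ?r2 ?e3 .
            }
        }
    }
    ```

    Each row is the longest path found along one branch, with the trailing variables left unbound where the path stops early. A shorter path that can be extended appears only as the prefix of its longer rows, so the 1- to 3-hop subgraph is recovered by collecting the (subject, relation, object) edges from every bound prefix of every row.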