
Loading Notation3 into a Database


How do you parse and load the triples represented by a Notation3 file into a database? I'm somewhat familiar with Jena and Sesame, but they seem geared towards processing RDF or Turtle, not full Notation3.

I've found relatively few robust tools for dealing with N3. The few I've found are listed here, and they consist only of rough Python scripts that handle basic command-line actions, with no apparent standard packaging, distribution or maintenance. The default Python library appears to be notation3.py, but I couldn't find a single homepage for it, and I found dozens of differing versions scattered around the Internet.

For example, say I have the following N3 representing a botanical classification:

{
   []
       :genus "Abies" ;
       :species "alba" ;
       :name [:value "Silver Fir" ; :usage "common" ; :language "English" ] ;
       :name [:value "European Silver Fir" ; :usage "common" ; :language "English" ] ;
       :name [:value "abeto blanco" ; :usage "common" ; :language "Spanish" ] ;
       :name [:value "abeto plateado" ; :usage "common" ; :language "Spanish" ] ;
       :name [:value "Edeltanne" ; :usage "common" ; :language "German" ] ;
       :name [:value "Silbertanne" ; :usage "common" ; :language "German" ] ;
       :name [:value "Weißtanne" ; :usage "common" ; :language "German" ] ;
       :stem!:type :erect ;
       :stem!:height [ :value!:start 30.0 ; :value!:end 50.0 ; :value!:units "m" ] ;
       :bark!:color :grey ;
       :bark!:ridges :irregular ;
       :foliage!:seasonality :evergreen ;
       :foliage!:type :needle ;
       :foliage!:arrangement :alternate ;
       :foliage!:length [ :value!:start 1.0 ; :value!:end 3.0 ; :value!:units "cm" ] ;
       :foliage!:width [ :value!:start 0.2 ; :value!:end 0.3 ; :value!:units "cm" ] ;
       :foliage!:color :green ;
       :foliage!:spiney :FALSE ;
       :flower [ :gender :male ; :inflorescence :catkin ; :sense :straight ; :color :brown ] ;
       :flower [ :gender :male ; :inflorescence :catkin ; :sense :straight ; :color :yellow ] ;
       :flower [ :gender :female ; :inflorescence :catkin ; :sense :straight ; :color :pink ] ;
       :fruit [ :kind :cone ; :color :brown ; ] ;
}
:is-a :botanical-classification ;
:source [
   :uri <http://originating/site> ;
   :name "John Doe" ;
   :data-collection-date "2005-01-01" ;
] ;
:transcribed-by "Al Nonymous" ;
:transcription-date "2010-09-01" .

I want to be able to load this (and potentially thousands of similar records) into a database so I can run arbitrary queries like, "Who transcribed records containing common Spanish names in the year 2010?" or "What's the average flower color associated with the genus X?"

Is this practical with current semantic web tools and N3?


Solution

The basic problem is that N3 was always something of an experimental notation: the full language was never widely implemented. The diagram in this document is quite informative: your sample uses graph literals (N3 formulae, the `{ ... }` blocks), and these lie outside any of the widely implemented N3 subsets. Now that named graphs are more widely used, it would be possible to express the same information in most RDF systems, including Jena, but not by directly parsing your input file.
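To make the named-graph idea concrete: once each `{ ... }` formula is given a graph name, a record reduces to quads (graph, subject, predicate, object), and the transcription metadata sits in the default graph with the graph name as its subject. The stdlib-only Python sketch below models that with plain tuples rather than any Jena or Sesame API. The graph name `urn:example:graph:1` and the blank node labels are hypothetical, and only a few of the sample's triples are included. It answers the first example query from the question by joining the inner graph with its metadata.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    graph: Optional[str]  # None stands for the default graph
    s: str
    p: str
    o: str


# Hypothetical result of converting part of the Abies alba record: the formula
# became graph urn:example:graph:1 and its blank nodes were labelled _:r1, _:n3.
QUADS = [
    Quad("urn:example:graph:1", "_:r1", ":genus", '"Abies"'),
    Quad("urn:example:graph:1", "_:r1", ":name", "_:n3"),
    Quad("urn:example:graph:1", "_:n3", ":value", '"abeto blanco"'),
    Quad("urn:example:graph:1", "_:n3", ":usage", '"common"'),
    Quad("urn:example:graph:1", "_:n3", ":language", '"Spanish"'),
    # Metadata about the record lives in the default graph, keyed by graph name.
    Quad(None, "urn:example:graph:1", ":transcribed-by", '"Al Nonymous"'),
    Quad(None, "urn:example:graph:1", ":transcription-date", '"2010-09-01"'),
]


def transcribers_of_spanish_common_names(quads, year):
    """Who transcribed records containing common Spanish names in `year`?"""

    def objs(g, s, p):
        # All objects of (s, p, ?) inside graph g.
        return {q.o for q in quads if q.graph == g and q.s == s and q.p == p}

    # Named graphs containing a node with :language "Spanish" and :usage "common".
    matching = {
        q.graph
        for q in quads
        if q.graph is not None
        and q.p == ":language"
        and q.o == '"Spanish"'
        and '"common"' in objs(q.graph, q.s, ":usage")
    }
    result = set()
    for g in matching:
        dates = objs(None, g, ":transcription-date")
        if any(d.strip('"').startswith(f"{year}-") for d in dates):
            result |= objs(None, g, ":transcribed-by")
    return result


print(transcribers_of_spanish_common_names(QUADS, 2010))  # → {'"Al Nonymous"'}
```

In Jena or Sesame the same join would be written as a SPARQL query, matching the inner triples inside a `GRAPH ?g { ... }` pattern and the transcription metadata on `?g` in the default graph.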

If it were me, I would write a front-end transformation stage, probably in a language that's good at strings and templates, such as Ruby. You could then translate your input files into a form that standard RDF processors can handle. For example, the graph literal that denotes the statements made by "Al Nonymous" could be transformed into a bNode denoting the action of Al asserting that classification. Alternatively, you could extract each graph from its literal and save it to a file with a synthesised graph name, thus preserving the nested-graph structure you currently have. Similarly, the property!path notation could easily be rewritten into standard RDF, at the expense of being slightly more verbose.
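The graph-extraction and path-rewriting steps are mechanical enough to sketch. The stdlib-only Python below is illustrative (the Ruby suggestion would work just as well) and rests on stated assumptions. `extract_graphs` lifts each top-level `{ ... }` formula out and replaces it with a synthesised graph name such as `<urn:example:graph:1>` (a hypothetical naming scheme). `to_trig` then writes the result as TriG, a named-graph syntax that current versions of Jena and Sesame can read. `expand_paths` rewrites predicate paths into nested blank nodes, assuming that `:stem!:type :erect` is meant as `:stem [ :type :erect ]` and that paths sharing a prefix on one subject refer to the same node. Strict N3 gives `!` a different meaning (`x!p` denotes the object of `x p`). `expand_paths` takes already tokenised (path, object) pairs, so a real front end would still need a tokenizer.

```python
from typing import Dict, List, Tuple


def extract_graphs(text: str, base: str = "urn:example:graph:") -> Tuple[str, Dict[str, str]]:
    """Replace each top-level { ... } formula with a synthesised graph IRI.

    Returns the remaining default-graph text and a map of graph IRI -> body.
    Assumes braces appear only as formula delimiters or inside "..." strings;
    comments and quotes inside long strings are not handled, and nested
    formulas are left verbatim inside the body.
    """
    out: List[str] = []
    graphs: Dict[str, str] = {}
    depth = start = last = 0
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                out.append(text[last:i])  # copy the text before the formula
                start = i + 1
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                name = f"<{base}{len(graphs) + 1}>"
                graphs[name] = text[start:i].strip()
                out.append(name)  # the formula is replaced by its graph name
                last = i + 1
        i += 1
    out.append(text[last:])
    return "".join(out), graphs


def to_trig(default_text: str, graphs: Dict[str, str], prefix_decls: str) -> str:
    """Emit TriG: prefixes, one wrapped block per extracted graph, then the default graph."""
    blocks = [prefix_decls.strip()]
    blocks += [f"{name} {{\n{body}\n}}" for name, body in graphs.items()]
    blocks.append(default_text.strip())
    return "\n\n".join(blocks) + "\n"


def expand_paths(pairs: List[Tuple[str, str]]) -> str:
    """Rewrite (p1!p2!..., object) pairs into nested blank-node syntax."""
    tree: Dict[str, dict] = {}
    for path, obj in pairs:
        *head, last_step = path.split("!")
        node = tree
        for step in head:  # walk or create one shared blank node per prefix
            node = node.setdefault(step, {"values": [], "children": {}})["children"]
        node.setdefault(last_step, {"values": [], "children": {}})["values"].append(obj)

    def render(children: Dict[str, dict]) -> str:
        parts = []
        for pred, entry in children.items():
            parts += [f"{pred} {v}" for v in entry["values"]]
            if entry["children"]:
                parts.append(f"{pred} [ {render(entry['children'])} ]")
        return " ; ".join(parts)

    return render(tree)


# The original file declares no prefixes; this ':' namespace is hypothetical.
record = '{ [] :genus "Abies" ; :species "alba" } :transcribed-by "Al Nonymous" .'
default_text, graphs = extract_graphs(record)
print(to_trig(default_text, graphs, "@prefix : <http://example.org/botany#> ."))

print(expand_paths([(":bark!:color", ":grey"), (":bark!:ridges", ":irregular")]))
# → :bark [ :color :grey ; :ridges :irregular ]
```

Grouping shared prefixes keeps `:bark!:color` and `:bark!:ridges` on one bark node rather than two unrelated ones. Whether that matches the data provider's intent is an assumption worth checking.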

Alternatively, ask your data provider to give you output that's in a more readily processable form!