How to replace underscores in a .ttl file only in triple objects


I have a file of RDF triples (subject-predicate-object) in Turtle syntax (a .ttl file) in which I need to replace every _ with a space, but only in the triple objects; subjects and predicates must stay unchanged. Here is an example (in my case every object is enclosed in double quotes):

<http://dbpedia.org/resource/Animalia_(book)> <http://dbpedia.org/property/author> "Graeme_Base" .
<http://dbpedia.org/resource/Animalia_(book)> <http://dbpedia.org/property/illustrator> "Graeme_Base" .

I would like to get:

<http://dbpedia.org/resource/Animalia_(book)> <http://dbpedia.org/property/author> "Graeme Base" .
<http://dbpedia.org/resource/Animalia_(book)> <http://dbpedia.org/property/illustrator> "Graeme Base" .

What is the easiest and fastest way to do this? The files are very large, so replacing the underscores one by one is not an option. I tried regular expressions in Notepad++, but I don't understand how to exclude the subject and the predicate.

Thanks a lot for the help!


Solution

  • You could use this pattern in the Notepad++ Replace dialog (Ctrl+H):

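    (?:^<[^\n<>]+>\h+<[^<>\n]+>\h+"|\G(?!^))[^_\n]*\K_(?=[^"\n]*")

    Explanation

    (?:                 Non-capture group for the two alternatives
      ^<[^\n<>]+>\h+    At the start of a line, match the subject <...> and horizontal whitespace
      <[^<>\n]+>\h+"    Match the predicate <...>, horizontal whitespace and the opening "
      |                 Or
      \G(?!^)           Continue at the end of the previous match, but not at the start of a line
    )                   Close the group
    [^_\n]*             Match zero or more characters other than _ or a newline, so consecutive underscores (a__b) are all found
    \K                  Forget what has been matched so far
    _                   Match the underscore that will be replaced
    (?=[^"\n]*")        Assert that a " follows on the same line with no other " in between, so the _ is still inside the object

    In the replacement use a single space, with Search Mode set to Regular expression. Use Replace All: with \K in the pattern, stepping through with a single Replace may not work as expected.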
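If you would rather run a script than open a very large file in the editor, the same idea can be sketched in Python. This is a minimal sketch, assuming every line has the shape shown in the question, <subject> <predicate> "object" ., with subject and predicate as IRIs in angle brackets and the object as a double-quoted literal that contains no escaped quotes. Python's built-in re module supports neither \G nor \K, so instead of the pattern above this swaps in a different technique: it captures the quoted object in a group and replaces underscores only inside that group. The script name fix_objects.py and the functions fix_line and fix_file are hypothetical names chosen for illustration.

```python
import re
import sys

# Assumption (taken from the example in the question): each line looks like
#   <subject> <predicate> "object" .
# where subject and predicate are IRIs in angle brackets and the object is a
# double-quoted literal without escaped quotes inside it.
# Group 1: subject, predicate and the opening quote
# Group 2: the object text (the only part that gets modified)
# Group 3: the closing quote
TRIPLE_WITH_LITERAL = re.compile(r'^(<[^<>\n]+>[ \t]+<[^<>\n]+>[ \t]+")([^"\n]*)(")')


def fix_line(line: str) -> str:
    """Replace underscores with spaces inside the quoted object only.

    Lines that do not match the assumed shape are returned unchanged.
    """
    return TRIPLE_WITH_LITERAL.sub(
        lambda m: m.group(1) + m.group(2).replace("_", " ") + m.group(3),
        line,
        count=1,
    )


def fix_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path one line at a time, fixing each line."""
    # newline="" leaves the original line endings (e.g. \r\n) untouched.
    with open(src_path, encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(fix_line(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python fix_objects.py input.ttl output.ttl
    fix_file(sys.argv[1], sys.argv[2])
```

Run it as python fix_objects.py input.ttl output.ttl (both names are placeholders). Because the file is streamed line by line, memory use stays flat no matter how large the input is, and the subject and predicate are copied through verbatim, so Animalia_(book) keeps its underscore while "Graeme_Base" becomes "Graeme Base".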