I have a file containing RDF triples (subject-predicate-object) in Turtle syntax (.ttl file) in which I need to replace every _ with a space, but only in triple objects (subjects and predicates must remain unchanged). An example is the following (in my case each object is enclosed in double quotes "):
<http://dbpedia.org/resource/Animalia_(book)> <http://dbpedia.org/property/author> "Graeme_Base" .
<http://dbpedia.org/resource/Animalia_(book)> <http://dbpedia.org/property/illustrator> "Graeme_Base" .
I would like to get:
<http://dbpedia.org/resource/Animalia_(book)> <http://dbpedia.org/property/author> "Graeme Base" .
<http://dbpedia.org/resource/Animalia_(book)> <http://dbpedia.org/property/illustrator> "Graeme Base" .
What is the easiest and fastest way to achieve this? The files are very large, so I can't replace underscores one at a time. I've tried using regular expressions in Notepad++, but I don't understand how to exclude the subject and predicate.
Thanks a lot for the help.
You might use:
(?:^<[^\n<>]+>\h+<[^<>\n]+>\h+"|\G(?!^))[^_\n]+\K_(?=[^"\n]*")
Explanation
(?:
Non-capturing group
^
Assert the start of the line
<[^\n<>]+>\h+<[^<>\n]+>\h+"
Match two <...> terms (the subject and the predicate), each followed by 1+ horizontal whitespace chars, and then match the opening "
|
Or
\G(?!^)
Assert the position at the end of the previous match, not at the start of a line
)
Close the non-capturing group
[^_\n]+\K_
Match 1+ chars other than an underscore or newline using a negated character class, forget what was matched so far using \K, then match the underscore
(?=[^"\n]*")
Positive lookahead to assert that a closing " follows to the right on the same line
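In the replacement, use a single space. In Notepad++, set the Search Mode in the Replace dialog to "Regular expression" for the pattern to work.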
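If the files turn out to be too large to edit comfortably in an editor, a script may be an option. The following is a minimal Python sketch, not a translation of the pattern above: Python's built-in re module does not support \G, \K or \h, so it uses a different technique, capturing the quoted object and replacing the underscores in a callback. It assumes one triple per line, subject and predicate written as <...> terms, and no escaped quotes (\") inside the object. The names TRIPLE, fix_line and fix_file are made up for this illustration.

```python
import re

# Group 1: subject, predicate and the opening quote of the object.
# Group 2: the object text. Group 3: the closing quote.
# Assumption: one triple per line, no escaped quotes inside the object.
TRIPLE = re.compile(r'^(<[^<>\n]+>[ \t]+<[^<>\n]+>[ \t]+")([^"\n]*)(")')


def fix_line(line: str) -> str:
    """Replace underscores with spaces in the quoted object only."""
    return TRIPLE.sub(
        lambda m: m.group(1) + m.group(2).replace("_", " ") + m.group(3),
        line,
        count=1,
    )


def fix_file(src_path: str, dst_path: str) -> None:
    """Stream the input line by line so it is never loaded into memory at once."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(fix_line(line))
```

Lines that do not match the expected shape are written out unchanged. One difference in behavior: because of [^_\n]+, the pattern above only replaces an underscore that is preceded by at least one non-underscore character, so a leading underscore or the second of two consecutive underscores is left alone, whereas this sketch replaces every underscore in the object.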