openrefine

OpenRefine: Exporting nested XML with templating


I have a question about the templating option for XML export in OpenRefine. Is it possible to export data from two columns as a nested XML structure if both columns contain multiple values that need to be split first? Here's an example to better illustrate what I mean. My columns look like this:

    Column1                                                           | Column2
    https://d-nb.info/gnd/119119110;https://d-nb.info/gnd/118529889   | Grützner, Eduard von;Elisabeth II., Großbritannien, Königin
    https://d-nb.info/gnd/1037554086;https://d-nb.info/gnd/1245873660 | Müller, Jakob;Meier, Anina

Each semicolon-separated value in Column1 has a corresponding value in Column2, in the same order. My desired output would look like this:

    <rootElement>
        <recordRootElement>
            ...
            <edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
                <skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
            </edm:Agent>

            <edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
                <skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
            </edm:Agent>
            ...
        </recordRootElement>
        <recordRootElement>
            ...
            <edm:Agent rdf:about="https://d-nb.info/gnd/1037554086">
                <skos:prefLabel xml:lang="zxx">Müller, Jakob</skos:prefLabel>
            </edm:Agent>

            <edm:Agent rdf:about="https://d-nb.info/gnd/1245873660">
                <skos:prefLabel xml:lang="zxx">Meier, Anina</skos:prefLabel>
            </edm:Agent>
            ...
        </recordRootElement>
    </rootElement>

(Note: in my initial post, the position of the root element was not indicated, and the desired output looked like this:

    <edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
        <skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
    </edm:Agent>

    <edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
        <skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
    </edm:Agent>

)

I managed to split the semicolon-separated values in both columns like this:

    {{forEach(cells["Column1"].value.split(";"),v,"<edm:Agent rdf:about=\""+v+"\">"+"\n"+"</edm:Agent>")}}
    {{forEach(cells["Column2"].value.split(";"),v,"<skos:prefLabel xml:lang=\"zxx\">"+v+"</skos:prefLabel>")}}

but I can't figure out how to nest each split skos:prefLabel inside its corresponding edm:Agent element. Is that even possible? If not, I would work with separate columns or another workaround, but I wanted to make sure there isn't a more direct way first.
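The core of the problem is pairing: the i-th value in Column1 belongs with the i-th value in Column2. A minimal sketch of that pairing for a single row, written in Python outside OpenRefine (the function name agents_xml is hypothetical), could look like this:

```python
from xml.sax.saxutils import escape, quoteattr


def agents_xml(uris: str, names: str, sep: str = ";") -> str:
    """Pair the i-th URI with the i-th name and nest the name inside its agent."""
    uri_list = uris.split(sep)
    name_list = names.split(sep)
    # The pairing only makes sense if both cells hold the same number of values.
    if len(uri_list) != len(name_list):
        raise ValueError("Column1 and Column2 contain different numbers of values")
    parts = []
    for uri, name in zip(uri_list, name_list):
        parts.append(
            f"<edm:Agent rdf:about={quoteattr(uri)}>\n"  # quoteattr adds the quotes
            f'  <skos:prefLabel xml:lang="zxx">{escape(name)}</skos:prefLabel>\n'
            "</edm:Agent>"
        )
    return "\n".join(parts)


print(agents_xml(
    "https://d-nb.info/gnd/1037554086;https://d-nb.info/gnd/1245873660",
    "Müller, Jakob;Meier, Anina",
))
```

The length check is a design choice: silently truncating (which is what zip alone would do) could attach names to the wrong URIs without any warning.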

Thank you! Kristina


Solution

  • I am going to expand on RolfBly's answer using OpenRefine's Templating exporter.

    I make the following assumptions:

    1. There is another column to the left of Column1 that acts as the record-identifying (key) column (see the first screenshot).
    2. The columns have proper names: Column1 is called URI and Column2 is called Name.
    3. URI and Name are the only columns with multiple values. Otherwise, the following recipe might produce empty XML elements.

    Screenshot of expected data

    We will use the record information available via GREL (row.record) to decide, for each row, whether to write an opening or closing <recordRootElement> tag.

    Recipe:

    1. First split Name, then URI, on the separator ";" via "Edit cells" => "Split multi-valued cells".
    2. Go to "Export" => "Templating..."
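As a rough model of the table after step 1: each URI/Name pair ends up on its own row, and only the first row of each record keeps the key, which is how records mode groups the rows. This Python sketch models the resulting layout only, not OpenRefine's implementation of "Split multi-valued cells"; split_records and the keys rec1/rec2 are hypothetical:

```python
from typing import Optional

# (key, URI, Name); key is None on the continuation rows of a record.
Row = tuple[Optional[str], str, str]


def split_records(rows: list[tuple[str, str, str]], sep: str = ";") -> list[Row]:
    """Expand each (key, URIs, Names) row into one row per URI/Name pair.

    Only the first row of each record keeps the key; the others get None,
    i.e. a blank key cell, so records mode groups them under that key.
    Note: zip() stops at the shorter list, so unequal value counts are truncated.
    """
    out: list[Row] = []
    for key, uris, names in rows:
        pairs = zip(uris.split(sep), names.split(sep))
        for i, (uri, name) in enumerate(pairs):
            out.append((key if i == 0 else None, uri, name))
    return out


table = split_records([
    ("rec1",
     "https://d-nb.info/gnd/119119110;https://d-nb.info/gnd/118529889",
     "Grützner, Eduard von;Elisabeth II., Großbritannien, Königin"),
    ("rec2",
     "https://d-nb.info/gnd/1037554086;https://d-nb.info/gnd/1245873660",
     "Müller, Jakob;Meier, Anina"),
])
for row in table:
    print(row)
```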

    Screenshot of the templating export dialog from OpenRefine

    In the Prefix field, use:

    <?xml version="1.0" encoding="utf-8"?>
    <rootElement>
    
    

    Please note that I skipped the namespace declarations for edm, skos and rdf; the xml prefix is predefined and does not need one.

    In the Row template field, use:

      {{if(row.index - row.record.fromRowIndex == 0, '<recordRootElement>', '')}}
        <edm:Agent rdf:about="{{escape(cells['URI'].value, 'xml')}}">
          <skos:prefLabel xml:lang="zxx">{{escape(cells['Name'].value, 'xml')}}</skos:prefLabel>
        </edm:Agent>
      {{if(row.index - row.record.fromRowIndex == row.record.rowCount - 1, '</recordRootElement>', '')}}
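The two if() calls only look at a row's position inside its record. Here is a Python sketch of that logic, assuming records are formed the way records mode does it (a non-blank key cell starts a new record). row_index, from_row_index and row_count stand in for GREL's row.index, row.record.fromRowIndex and row.record.rowCount; record_bounds and template_tags are hypothetical helpers:

```python
from collections import Counter
from typing import Optional


def record_bounds(keys: list[Optional[str]]) -> list[tuple[int, int]]:
    """Return (from_row_index, row_count) of the enclosing record for every row.

    A non-blank key starts a new record; blank keys (None or "") continue it.
    Assumes the first row has a key.
    """
    starts: list[int] = []
    current = 0
    for i, key in enumerate(keys):
        if key:
            current = i
        starts.append(current)
    sizes = Counter(starts)  # number of rows per record, keyed by its first row
    return [(start, sizes[start]) for start in starts]


def template_tags(row_index: int, from_row_index: int, row_count: int) -> tuple[str, str]:
    """Mirror the two if() expressions of the row template."""
    offset = row_index - from_row_index
    opening = "<recordRootElement>" if offset == 0 else ""
    closing = "</recordRootElement>" if offset == row_count - 1 else ""
    return opening, closing


keys = ["rec1", None, "rec2", None]
for i, (start, count) in enumerate(record_bounds(keys)):
    print(i, template_tags(i, start, count))
```

A record with a single row gets both tags from that same row, so every <recordRootElement> is still closed.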
    

    The Row separator field should contain just a line break.

    
    

    In the Suffix field, use:

    
    </rootElement>
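To check that prefix, rows, row separator and suffix add up to well-formed XML, here is a rough Python approximation of the export (whitespace may differ from OpenRefine's actual output). It assumes the rows are already grouped into records as lists of (URI, Name) pairs; render_row and export are hypothetical names. Unlike the prefix above, it declares the edm, skos and rdf namespaces (Europeana EDM, SKOS and RDF namespace URIs) so that a namespace-aware parser such as Python's ElementTree accepts the result:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumption: namespace declarations added here; the prefix in the answer omits them.
PREFIX = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<rootElement xmlns:edm="http://www.europeana.eu/schemas/edm/"'
    ' xmlns:skos="http://www.w3.org/2004/02/skos/core#"'
    ' xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
)
ROW_SEPARATOR = "\n"
SUFFIX = "\n</rootElement>"


def render_row(pos: int, row_count: int, uri: str, name: str) -> str:
    """Rough equivalent of the row template for one row.

    pos plays the role of row.index - row.record.fromRowIndex.
    """
    opening = "<recordRootElement>\n" if pos == 0 else ""
    closing = "\n</recordRootElement>" if pos == row_count - 1 else ""
    # Escape &, < and >, plus double quotes because the value sits in an attribute.
    about = escape(uri, {'"': "&quot;"})
    return (
        opening
        + f'  <edm:Agent rdf:about="{about}">\n'
        + f'    <skos:prefLabel xml:lang="zxx">{escape(name)}</skos:prefLabel>\n'
        + "  </edm:Agent>"
        + closing
    )


def export(records: list[list[tuple[str, str]]]) -> str:
    """Join prefix, rendered rows (separated by a line break) and suffix."""
    rendered = [
        render_row(pos, len(pairs), uri, name)
        for pairs in records
        for pos, (uri, name) in enumerate(pairs)
    ]
    return PREFIX + ROW_SEPARATOR.join(rendered) + SUFFIX


records = [
    [("https://d-nb.info/gnd/119119110", "Grützner, Eduard von"),
     ("https://d-nb.info/gnd/118529889", "Elisabeth II., Großbritannien, Königin")],
    [("https://d-nb.info/gnd/1037554086", "Müller, Jakob"),
     ("https://d-nb.info/gnd/1245873660", "Meier, Anina")],
]
xml_text = export(records)
root = ET.fromstring(xml_text.encode("utf-8"))  # raises ParseError if not well-formed
print(xml_text)
```

If you add the same xmlns declarations to the Prefix field in OpenRefine, the exported file should likewise be readable by namespace-aware XML tools.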