Extract elements containing tags with certain values using jq/xq


I'm trying to extract the coordinates of nodes that have a <tag k="power"> child.

I've tried several methods.

This one returns only two values (403564136 and 403564138), followed by an error: jq: error (at <stdin>:1): Cannot index array with string "@k". It probably fails on the node that contains multiple <tag> elements instead of one: the XML-to-JSON conversion turns a single <tag> into an object but several <tag> elements into an array, so the filter sees two different types of data. I'm not sure what the best way to fix it is, though:

xq '.osm.node[] | select(any(.tag; .["@k"] == "power"))' power.xml
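One way to check this diagnosis is to print the JSON type of .tag for each node. The sketch below is an illustration, not part of the original post: instead of running xq on power.xml, it feeds jq a hand-written JSON document that assumes the shape xq produces (attributes become "@"-prefixed keys), trimmed to three nodes and to the relevant keys.

```shell
# Hand-written stand-in (an assumption) for xq's output on three nodes of power.xml:
# one <tag> child -> object, two <tag> children -> array, no <tag> child -> no .tag key.
json='{"osm":{"node":[
  {"@id":"403564136","tag":{"@k":"power","@v":"tower"}},
  {"@id":"403564140","tag":[{"@k":"power","@v":"tower"},{"@k":"source","@v":"Maa amet WMS 2009; survey"}]},
  {"@id":"579469806"}
]}}'

# Print the JSON type of .tag for each node (a missing key yields null)
types=$(printf '%s' "$json" | jq -r '.osm.node[] | .tag | type')
echo "$types"
```

The output should be object, array and null, one per line, which matches the error: indexing the array case with "@k" fails.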

I could also solve the problem by simply searching for the plain text "power", but that yields 0 results:

xq '.osm.node[] | select( index("power") )' power.xml

or

xq '.osm.node[] | select( any(. == "power") )' power.xml

I'm probably missing something, but I can't figure out what I'm doing wrong.
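The plain-text attempt with any(. == "power") finds nothing because any/1 only iterates over the node's direct values (.[]), that is, its attribute strings and the .tag object or array; the string "power" sits one or two levels deeper. A recursive variant using jq's .. operator visits every nested value. This is a sketch on a hand-written stand-in for xq's JSON output (an assumption), not a command tested against power.xml:

```shell
# Hand-written stand-in (an assumption) for xq's output, trimmed to four nodes
json='{"osm":{"node":[
  {"@id":"403564136","tag":{"@k":"power","@v":"tower"}},
  {"@id":"403564140","tag":[{"@k":"power","@v":"tower"},{"@k":"source","@v":"survey"}]},
  {"@id":"319174533","tag":{"@k":"railway","@v":"level_crossing"}},
  {"@id":"579469806"}
]}}'

# any(. == "power") compares only the node's direct values, so nothing matches
shallow=$(printf '%s' "$json" | jq -r '.osm.node[] | select(any(. == "power")) | .["@id"]')

# any(..; . == "power") walks every nested value with recursive descent
deep=$(printf '%s' "$json" | jq -r '.osm.node[] | select(any(..; . == "power")) | .["@id"]')
echo "$deep"
```

Note that .. matches "power" anywhere inside the node, including a tag value or a user name, so it is looser than checking the "@k" key.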

power.xml:

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="CGImap 0.8.3 (3907222 thorn-02.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
    <node id="403564136" visible="true" version="4" changeset="27918722" timestamp="2015-01-04T19:50:21Z" user="k__" uid="156900" lat="58.3795467" lon="26.6902636">
        <tag k="power" v="tower"/>
    </node>
    <node id="403564138" visible="true" version="2" changeset="14825596" timestamp="2013-01-28T18:28:16Z" user="k__" uid="156900" lat="58.3798399" lon="26.6882638">
        <tag k="power" v="tower"/>
    </node>
    <node id="403564140" visible="true" version="3" changeset="21131355" timestamp="2014-03-16T08:53:37Z" user="k__" uid="156900" lat="58.3811404" lon="26.6822486">
        <tag k="power" v="tower"/>
        <tag k="source" v="Maa amet WMS 2009; survey"/>
    </node>
    <node id="403564141" visible="true" version="3" changeset="14825596" timestamp="2013-01-28T18:28:17Z" user="k__" uid="156900" lat="58.3805103" lon="26.6790130">
        <tag k="power" v="tower"/>
    </node>
    <node id="403564142" visible="true" version="2" changeset="1399220" timestamp="2009-06-01T22:33:48Z" user="green525" uid="64433" lat="58.3801485" lon="26.6771179">
        <tag k="power" v="tower"/>
        <tag k="ref" v="4"/>
        <tag k="source" v="Maa amet WMS 2009; extrapolation"/>
    </node>
    <node id="409079906" visible="true" version="3" changeset="47530271" timestamp="2017-04-07T07:39:53Z" user="juhanjuku" uid="152305" lat="58.0699088" lon="27.0763265">
        <tag k="power" v="pole"/>
    </node>
    <node id="409079908" visible="true" version="3" changeset="32801064" timestamp="2015-07-22T12:40:52Z" user="evaldmaa" uid="1706132" lat="58.0697186" lon="27.0755833">
        <tag k="power" v="tower"/>
    </node>
    <node id="579469806" visible="true" version="1" changeset="3279698" timestamp="2009-12-03T11:17:02Z" user="maaamet-import" uid="204356" lat="58.1991523" lon="26.8752022"/>
    <node id="319174533" visible="true" version="3" changeset="10614880" timestamp="2012-02-07T18:36:07Z" user="k__" uid="156900" lat="58.2019064" lon="26.8798802">
        <tag k="railway" v="level_crossing"/>
    </node>
</osm>

Solution

  • Appreciate you trying xq, which is bundled with yq, for XML parsing. The reason for your error is that .tag is encoded as an array of objects in a couple of instances (nodes with more than one <tag> child, such as 403564140) and as a single object everywhere else. Your filter needs to distinguish between the two cases while extracting. Also filter out nodes that don't have a .tag property at all (such as 579469806).

    One simple way to solve it is to use an explicit if statement to pick the right comparison:

    xq '
    .osm.node[] | 
    select(.tag != null) | 
    if (.tag|type == "array") then 
      select(any(.tag[]; .["@k"] == "power")) 
    else 
      select(any(.tag; .["@k"] == "power")) 
    end
    ' power.xml
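As a possible simplification (a sketch, not part of the original answer), the type check can be moved inside the generator of any/2. Arrays are expanded with .[], while a single object or null is passed through; since null | .["@k"] is just null, tagless nodes fail the test instead of raising an error, so the separate null check becomes optional. The JSON below is a hand-written stand-in (an assumption) for what xq emits for power.xml; with the real file you would run the same filter through xq.

```shell
# Hand-written stand-in (an assumption) for xq's output, trimmed to four nodes
json='{"osm":{"node":[
  {"@id":"403564136","tag":{"@k":"power","@v":"tower"}},
  {"@id":"403564140","tag":[{"@k":"power","@v":"tower"},{"@k":"source","@v":"survey"}]},
  {"@id":"319174533","tag":{"@k":"railway","@v":"level_crossing"}},
  {"@id":"579469806"}
]}}'

# Normalize .tag inside any(): expand arrays, pass objects (and null) through unchanged
ids=$(printf '%s' "$json" | jq -r '
  .osm.node[]
  | select(any(.tag | if type == "array" then .[] else . end; .["@k"] == "power"))
  | .["@id"]')
echo "$ids"
```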
    

    or move the conditional branch into a function:

    xq '
    # Expand $p when it is an array; otherwise test the single tag object directly
    def nodeSel($p):
      if ($p|type == "array")
      then select(any($p[]; .["@k"] == "power"))
      else select(any($p; .["@k"] == "power"))
      end;
    .osm.node[] |
    select(.tag != null) |
    nodeSel(.tag)
    ' power.xml
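Since the question asks for coordinates, the selected nodes can be projected onto their id, lat and lon attributes and printed as tab-separated text with @tsv. This sketch reuses the nodeSel function from the answer but runs plain jq on a hand-written stand-in (an assumption) for xq's JSON; with the real file, the same filter should work as xq -r '...' power.xml, assuming your xq passes -r through to jq.

```shell
# Hand-written stand-in (an assumption) for xq's output, with coordinates kept
json='{"osm":{"node":[
  {"@id":"403564136","@lat":"58.3795467","@lon":"26.6902636","tag":{"@k":"power","@v":"tower"}},
  {"@id":"403564140","@lat":"58.3811404","@lon":"26.6822486","tag":[{"@k":"power","@v":"tower"},{"@k":"source","@v":"Maa amet WMS 2009; survey"}]},
  {"@id":"579469806","@lat":"58.1991523","@lon":"26.8752022"},
  {"@id":"319174533","@lat":"58.2019064","@lon":"26.8798802","tag":{"@k":"railway","@v":"level_crossing"}}
]}}'

coords=$(printf '%s' "$json" | jq -r '
  def nodeSel($p): if ($p|type == "array") then select(any($p[]; .["@k"] == "power")) else select(any($p; .["@k"] == "power")) end;
  .osm.node[]
  | select(.tag != null)
  | nodeSel(.tag)
  | [.["@id"], .["@lat"], .["@lon"]]   # keep only id and coordinates
  | @tsv')
echo "$coords"
```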