xmlstarlet

Querying the current namespace using xmlstarlet


System: Debian 12

xmlstarlet --version
1.6.1
compiled against libxml2 2.9.14, linked with 20914
compiled against libxslt 1.1.35, linked with 10135

Below is a reduced version (example.xml) of the XML file, for debugging:

    <?xml version="1.0" encoding="UTF-8"?>
    <arf:asset-report-collection xmlns:arf="http://scap.nist.gov/schema/asset-reporting-format/1.1">
      <arf:report-requests>
        <arf:report-request>
          <arf:content>
            <ds:data-stream-collection xmlns:ds="http://scap.nist.gov/schema/scap/source/1.2" xmlns:xccdf-1.2="http://checklists.nist.gov/xccdf/1.2" xmlns:xlink="http://www.w3.org/1999/xlink">
              <ds:component>
                <xccdf-1.2:Benchmark id="xccdf_org.ssgproject.content_benchmark_DEBIAN-12">
                  need to get this working
                </xccdf-1.2:Benchmark>
              </ds:component>
            </ds:data-stream-collection>
          </arf:content>
        </arf:report-request>
      </arf:report-requests>
    </arf:asset-report-collection>

I can query data-stream-collection only by using the local-name() XPath function, because the ds namespace is not declared on the root element:

xmlstarlet sel -t -v "/arf:asset-report-collection/arf:report-requests/arf:report-request/arf:content/*[local-name()=\"data-stream-collection\"]" example.xml

Is this an XPath bug?

How can I query the namespace of data-stream-collection and the URI of that namespace? I have already experimented with namespace-uri() and the XPath axes, but so far without luck. I need xmlstarlet to return ds and http://scap.nist.gov/schema/scap/source/1.2.

Thank you!


Solution

  • Is this an XPath bug?

    No, because you're not using the ds namespace prefix in your command.

    To do so, you have to tell the processor which URI the ds namespace prefix is bound to. By default (with the --doc-namespace option in effect), xmlstarlet's select and edit subcommands read the namespace declarations on the outermost element of the first input file, so those prefixes need not be declared explicitly; for any other namespace, use the -N option.

  • How can I query the namespace of data-stream-collection and the URI of that namespace?

    # shellcheck shell=sh disable=SC2016
    
    xmlstarlet select --text \
      -N ds='http://scap.nist.gov/schema/scap/source/1.2' \
      -t --var P='arf:asset-report-collection/*/*/arf:content/ds:*' \
      -v 'namespace-uri($P)' -n \
      -m '$P/namespace::ds' \
        -v 'concat("{",.,"}",name())' -n \
    example.xml
    

    Output, line 2 in Clark notation:

    http://scap.nist.gov/schema/scap/source/1.2
    {http://scap.nist.gov/schema/scap/source/1.2}ds
    

    Since namespace nodes live on the namespace axis, xmlstarlet select can generate the -N options itself:

    # shellcheck shell=sh disable=SC2016
    
    xmlstarlet select --text -t \
      --var q -o '"' -b \
      -m 'set:distinct(//namespace::*[not(name()="xml")])' \
        -v 'concat(" -N ",name(),"=",$q,.,$q)' \
      -b -n \
    example.xml
    

    where the EXSLT set:distinct function eliminates duplicates and the ubiquitous xml namespace is skipped. This also includes the namespaces declared on the outermost element, but an extra -N shouldn't hurt.

    Output:

     -N arf="http://scap.nist.gov/schema/asset-reporting-format/1.1" -N xlink="http://www.w3.org/1999/xlink" -N xccdf-1.2="http://checklists.nist.gov/xccdf/1.2" -N ds="http://scap.nist.gov/schema/scap/source/1.2"