Using PowerShell to Select Value from Disjoined Inner XML Namespace


I am working with an XML document that has disjoined nested namespaces. Using a PowerShell script, I need to loop through the XML nodes and select a value from each of the inner namespaces.

The problem I am having is that the value returned is always from the first set of data within the inner namespace.

Here is a representation of the XML file named nsTest.xml:

<ns1:Root xmlns:ns1="example.com\ns1" xmlns:ns2="example.com\ns2" XsdSchemaValidatable="true">
  <ns1:DataSet>
    <ns1:TimeStamp>
      <ns2:Time>2023-06-01T08:00:00</ns2:Time>
    </ns1:TimeStamp>
  </ns1:DataSet>
  <ns1:DataSet>
    <ns1:TimeStamp>
      <ns2:Time>2024-07-02T08:00:00</ns2:Time>
    </ns1:TimeStamp>
  </ns1:DataSet>
</ns1:Root>

Here is the PowerShell script I am using:

Set-Location $PSScriptRoot
[Xml]$XMLData = Get-Content "nsTest.xml"

$nsmgr = New-Object System.Xml.XmlNamespaceManager($XMLData.NameTable)
$nsmgr.AddNamespace("ns1", "example.com\ns1")
$nsmgr.AddNamespace("ns2", "example.com\ns2")

$DataSets = $XMLData.SelectNodes("//ns1:Root/ns1:DataSet", $nsmgr)

Write-Host ("The number of items in the dataset is: " + $DataSets.Count)

Write-Host ("DataSets [0] is: " + $DataSets[0].SelectSingleNode("//ns1:TimeStamp/ns2:Time", $nsmgr).InnerText)
Write-Host ("DataSets [1] is: " + $DataSets[1].SelectSingleNode("//ns1:TimeStamp/ns2:Time", $nsmgr).InnerText)

Write-Host $DataSets[0].InnerXml
Write-Host $DataSets[1].InnerXml

Here are the results I am getting:

The number of items in the dataset is: 2
DataSets [0] is: 2023-06-01T08:00:00
DataSets [1] is: 2023-06-01T08:00:00
<ns1:TimeStamp xmlns:ns1="example.com\ns1"><ns2:Time xmlns:ns2="example.com\ns2">2023-06-01T08:00:00</ns2:Time></ns1:TimeStamp>
<ns1:TimeStamp xmlns:ns1="example.com\ns1"><ns2:Time xmlns:ns2="example.com\ns2">2024-07-02T08:00:00</ns2:Time></ns1:TimeStamp>

Here is what I would have expected to see:

The number of items in the dataset is: 2
DataSets [0] is: 2023-06-01T08:00:00
DataSets [1] is: 2024-07-02T08:00:00
<ns1:TimeStamp xmlns:ns1="example.com\ns1"><ns2:Time xmlns:ns2="example.com\ns2">2023-06-01T08:00:00</ns2:Time></ns1:TimeStamp>
<ns1:TimeStamp xmlns:ns1="example.com\ns1"><ns2:Time xmlns:ns2="example.com\ns2">2024-07-02T08:00:00</ns2:Time></ns1:TimeStamp>

I have tried using the [local-name() = 'Time'] convention instead of specifying the namespace, but that didn't make a difference.

Is there something about XML namespaces that I am not understanding?


Solution

  • Pragmatically speaking, you can use PowerShell's adaptation of the XML DOM,[1] which is namespace-agnostic, in combination with member-access enumeration:

    # -> 2
    $xmlData.Root.DataSet.Count
    
    # -> @('2023-06-01T08:00:00', '2024-07-02T08:00:00')
    $xmlData.Root.DataSet.TimeStamp.Time
    
    # -> @(
    #     '<ns2:Time xmlns:ns2="example.com\ns2">2023-06-01T08:00:00</ns2:Time>',
    #     '<ns2:Time xmlns:ns2="example.com\ns2">2024-07-02T08:00:00</ns2:Time>'
    #    )
    $xmlData.Root.DataSet.TimeStamp.InnerXml 
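
    Since you mention needing to loop over the nodes, here is a minimal, self-contained sketch of the same dot-notation approach applied per data set. It assumes the sample document from the question, inlined as a string so it runs without nsTest.xml; the variable names $xmlText, $ds and $times are only illustrative.

    ```powershell
    # Inline copy of the sample XML (assumption: same content as nsTest.xml).
    $xmlText = @(
        '<ns1:Root xmlns:ns1="example.com\ns1" xmlns:ns2="example.com\ns2">'
        '  <ns1:DataSet><ns1:TimeStamp><ns2:Time>2023-06-01T08:00:00</ns2:Time></ns1:TimeStamp></ns1:DataSet>'
        '  <ns1:DataSet><ns1:TimeStamp><ns2:Time>2024-07-02T08:00:00</ns2:Time></ns1:TimeStamp></ns1:DataSet>'
        '</ns1:Root>'
    ) -join "`n"
    $xmlData = [xml] $xmlText

    # Each $ds is one DataSet element; no namespace manager is involved,
    # because the adapted properties are addressed without prefixes.
    $times = foreach ($ds in $xmlData.Root.DataSet) {
        $ds.TimeStamp.Time
    }

    $times | ForEach-Object { Write-Host "Time: $_" }
    # Time: 2023-06-01T08:00:00
    # Time: 2024-07-02T08:00:00
    ```

    The loop form is mainly useful when you need other values from the same data set alongside the time; if you only need the list of times, the member-access enumeration shown above is shorter.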
    

    As for what you tried:

    By starting your XPath query with //, you're starting the search at the root of the entire document rather than at the node on which you call .SelectSingleNode(), so the same node (the one under the first dataset) is found in both calls. This has nothing to do with namespaces, which is why switching to [local-name() = 'Time'] made no difference.

    Simply use a relative path to avoid this problem, i.e. omit //:

    # ...
    
    # Note that "//" has been removed from the paths.
    Write-Host ("DataSets [0] is: " + $DataSets[0].SelectSingleNode("ns1:TimeStamp/ns2:Time", $nsmgr).InnerText)
    Write-Host ("DataSets [1] is: " + $DataSets[1].SelectSingleNode("ns1:TimeStamp/ns2:Time", $nsmgr).InnerText)
    
    # ...
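
    Put together as a loop, the relative-path fix might look like the sketch below. It again inlines the sample XML so that it is self-contained (an assumption; your script reads nsTest.xml), and the names $xmlText, $times and $timesViaDescendant are illustrative. It also shows .//, which searches all descendants of the context node, in case you don't want to spell out the intermediate elements.

    ```powershell
    # Inline copy of the sample XML (assumption: same content as nsTest.xml).
    $xmlText = @(
        '<ns1:Root xmlns:ns1="example.com\ns1" xmlns:ns2="example.com\ns2">'
        '  <ns1:DataSet><ns1:TimeStamp><ns2:Time>2023-06-01T08:00:00</ns2:Time></ns1:TimeStamp></ns1:DataSet>'
        '  <ns1:DataSet><ns1:TimeStamp><ns2:Time>2024-07-02T08:00:00</ns2:Time></ns1:TimeStamp></ns1:DataSet>'
        '</ns1:Root>'
    ) -join "`n"
    $xmlData = [xml] $xmlText

    $nsmgr = New-Object System.Xml.XmlNamespaceManager($xmlData.NameTable)
    $nsmgr.AddNamespace('ns1', 'example.com\ns1')
    $nsmgr.AddNamespace('ns2', 'example.com\ns2')

    $dataSets = $xmlData.SelectNodes('/ns1:Root/ns1:DataSet', $nsmgr)

    # Relative path: evaluated against the current DataSet element.
    $times = foreach ($dataSet in $dataSets) {
        $dataSet.SelectSingleNode('ns1:TimeStamp/ns2:Time', $nsmgr).InnerText
    }

    # ".//" also stays below the current DataSet, at any depth.
    $timesViaDescendant = foreach ($dataSet in $dataSets) {
        $dataSet.SelectSingleNode('.//ns2:Time', $nsmgr).InnerText
    }

    for ($i = 0; $i -lt $times.Count; $i++) {
        Write-Host ("DataSets [$i] is: " + $times[$i])
    }
    # DataSets [0] is: 2023-06-01T08:00:00
    # DataSets [1] is: 2024-07-02T08:00:00
    ```

    The only difference between .// and the // in your original query is the leading dot, which anchors the search at the node you call the method on instead of at the document root.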
    

    [1] In essence, PowerShell allows you to treat any parsed [xml] document as an object graph that you can drill into using regular dot notation, because PowerShell surfaces XML (child) elements and XML attributes as namespace-less properties on each object (XML node) in the graph.
    See the third section of this answer for details.