Tags: xml, powershell, xmldom

Extract an element from an XML file by position in the hierarchy rather than by name


I have an XML file like this:

<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

I have a PowerShell script like this:

$xmlData = New-Object -TypeName System.Xml.XmlDocument
$xmlData.Load('c:\test\data.xml')
$xmlData.note.body # I want to replace the hard-coded "note.body" with a function call

Can I get the value of what is currently element note.body without having to use the element names, i.e., can I extract values by the target element's position in the document hierarchy?

The idea is to have a script that continues to work even after element names in the input XML change (but not the document's structure).


Solution

  • If you want to locate the element of interest positionally, use the generic XML DOM properties:

    In PowerShell Core:

    # Extract the text from the *last child* of the *document element*.
    # This is the positional equivalent of your $xmlData.note.body call.
    # Of course, you can use specific indices such as [2] as well.
    $xmlData.DocumentElement.ChildNodes[-1].InnerText
    

    With your sample document, the output is Don't forget me this weekend!, as expected.
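
As a quick check that positional access does not depend on element names, here is a minimal sketch that runs the same lookup against two documents with identical structure. The second document and its element names (memo, recipient, sender, subject, text) are made up for illustration, and the XML is loaded from strings via the [xml] cast rather than from a file. An explicit index is used instead of [-1] so that the sketch behaves the same in both PowerShell editions.

```powershell
# Same structure, different element names. The names in the second
# document are hypothetical and only serve to illustrate the point.
$original = [xml] "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
$renamed  = [xml] "<memo><recipient>Tove</recipient><sender>Jani</sender><subject>Reminder</subject><text>Don't forget me this weekend!</text></memo>"

$results = foreach ($doc in $original, $renamed) {
    # Index 3 is the fourth child of the document element, whatever it is called.
    $doc.DocumentElement.ChildNodes[3].InnerText
}

# Print the text extracted from each document.
$results
```

Both lookups return the text of the fourth child, so the script keeps working as long as the position of the target element stays the same.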


    In Windows PowerShell (all workarounds work in PowerShell Core too):

    A bug prevents the use of [-1] to refer to the last element of the collection in this case.

    Workaround 1:

    $childNodes = $xmlData.DocumentElement.ChildNodes  
    $childNodes[$childNodes.Count-1].InnerText
    

    Workaround 2:

    You've proposed the following alternative, which is much simpler, albeit less efficient (which probably won't matter):

    Use member-access enumeration to extract the .InnerText values from all child nodes up front - which returns a regular PowerShell array - and apply [-1] to that:

    $xmlData.DocumentElement.ChildNodes.InnerText[-1]
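
One caveat with this workaround, sketched below under the assumption that the document element may sometimes have only a single child: member-access enumeration then returns a single string rather than an array, so [-1] indexes into that string and yields its last character. Wrapping the expression in @(...) guards against this. The one-child document is invented for the example.

```powershell
# Four children: member-access enumeration returns an array of strings.
$xmlData = [xml] "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
$texts = $xmlData.DocumentElement.ChildNodes.InnerText
$lastText = $texts[-1]

# One child (hypothetical document): the result is a single string, not an array.
$single = [xml] "<note><body>Hi there</body></note>"

# Unguarded: [-1] indexes into the string and returns its last character.
$unguarded = $single.DocumentElement.ChildNodes.InnerText[-1]

# Guarded: @(...) forces an array, so [-1] returns the whole string.
$guarded = @($single.DocumentElement.ChildNodes.InnerText)[-1]

$lastText
$unguarded
$guarded
```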
    

    Workaround 3, proposed by Tomalak:

    $xmlData.DocumentElement.ChildNodes |
      Select-Object -Last 1 -ExpandProperty InnerText
    

    Select-Object -Last 1 does succeed in extracting the last child element, and -ExpandProperty InnerText then returns the .InnerText property value.

    Note that this solution will typically perform worst among the workarounds, because it involves a cmdlet and the pipeline. Again, this is unlikely to matter in practice unless you call the code in a loop with a high iteration count.
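
Since the question asks about wrapping the lookup in a function, the following is one possible sketch rather than a definitive implementation. The function name Get-XmlTextByPosition and its -Document and -IndexPath parameters are hypothetical. It walks down from the document element one child index per level and translates negative indices by hand, along the lines of Workaround 1, so that -1 should also work in Windows PowerShell. The nested sample document is made up to show a multi-level path.

```powershell
# Hypothetical helper, not part of the original answer: follows a path of
# child indices from the document element and returns the target's text.
function Get-XmlTextByPosition {
    param(
        [Parameter(Mandatory)] [System.Xml.XmlDocument] $Document,
        [Parameter(Mandatory)] [int[]] $IndexPath
    )
    $node = $Document.DocumentElement
    foreach ($index in $IndexPath) {
        $children = $node.ChildNodes
        # Map negative indices to positions counted from the end.
        if ($index -lt 0) { $index += $children.Count }
        if ($index -lt 0 -or $index -ge $children.Count) {
            throw "Index path is out of range at a node with $($children.Count) child node(s)."
        }
        $node = $children.Item($index)
    }
    $node.InnerText
}

$xmlData = [xml] "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"

# Last child of the document element (positional equivalent of note.body).
$lastText = Get-XmlTextByPosition -Document $xmlData -IndexPath @(-1)

# Third child of the document element (positional equivalent of note.heading).
$thirdText = Get-XmlTextByPosition -Document $xmlData -IndexPath @(2)

# Multi-level path on an invented nested document: first child, then its second child.
$nested = [xml] "<root><a><b>x</b><c>y</c></a></root>"
$nestedText = Get-XmlTextByPosition -Document $nested -IndexPath @(0, 1)

$lastText
$thirdText
$nestedText
```

Passing the index path as an array keeps the call site independent of element names; only the positions need to be updated if the document structure ever changes.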