bash awk sed xmllint

Find and replace an XML tag but not its value


I have an XML file that contains the following data:

<Extrinsic name="CommodityVendor">1234567</Extrinsic>
<Extrinsic name="buyerVatID">1122334455</Extrinsic>
<Extrinsic name="supplierVatID">S9876543</Extrinsic>

I would like the output to look like this (basically replace the <Extrinsic> tag for the Buyer VAT Id with <ExtrinsicB>):

<Extrinsic name="CommodityVendor">1234567</Extrinsic>
<ExtrinsicB name="buyerVatID">1122334455</ExtrinsicB>
<Extrinsic name="supplierVatID">S9876543</Extrinsic>

How can I do this with sed, awk or xmllint on Linux using bash?

Thanks


Solution

  • With awk or sed, you can only work on the textual representation of the file. That representation can change without changing the logical content it conveys (attribute order, quoting, whitespace, line breaks), which can trip up any approach built on plaintext-only tools. xmllint can parse the logical structure of an XML document, but it is geared towards querying or validating structure and content, not towards modifying it. For that, you need an XML processor that can select a node based on your criteria and then manipulate it according to your needs. The selection criteria are still somewhat unclear from your example, though. Do you want to change the second node with that name, every other node regardless, or maybe the one whose name attribute has a specific value, ...?

    Assuming a valid input.xml, here are two examples that select the element by the value of its name attribute:

    xmlstarlet

    xmlstarlet ed -r '//Extrinsic[@name="buyerVatID"]' -v ExtrinsicB input.xml
    

    Add the -L or --inplace flag to modify the file in-place.

    xidel

    xidel --xml --xquery '
      for $node in //Extrinsic return
      if ($node/@name eq "buyerVatID")
      then element ExtrinsicB {$node/(@*, node())}
      else $node
    ' input.xml
    

    Add the --in-place flag to modify the file in-place.
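
For comparison, here is a minimal sed sketch of the plaintext-only approach the question asked about. It only illustrates the fragility described above and is not a recommendation. It assumes that each element sits on one line exactly as in the question, that name is the first attribute and is double-quoted, and that the element contains only text. The function name rename_buyer_tag is a hypothetical helper introduced for this example.

```shell
#!/usr/bin/env bash
# Sketch only: rename <Extrinsic name="buyerVatID"> to <ExtrinsicB> with sed.
# Assumes the whole element is on one line, name="..." comes first and is
# double-quoted, and the element body contains no child elements.
# rename_buyer_tag is a hypothetical helper name, not part of any tool.
rename_buyer_tag() {
  # \1 keeps the attributes, \2 keeps the text content unchanged.
  sed -E 's|<Extrinsic( name="buyerVatID"[^>]*)>([^<]*)</Extrinsic>|<ExtrinsicB\1>\2</ExtrinsicB>|' "$@"
}

# Feed the three sample lines from the question through the helper.
printf '%s\n' \
  '<Extrinsic name="CommodityVendor">1234567</Extrinsic>' \
  '<Extrinsic name="buyerVatID">1122334455</Extrinsic>' \
  '<Extrinsic name="supplierVatID">S9876543</Extrinsic>' |
  rename_buyer_tag
```

Called with a file name, as in rename_buyer_tag input.xml, it reads that file instead of standard input. If the element spans several lines, uses single quotes, or lists another attribute before name, the pattern does not match and that line passes through unchanged, which is why the XML-aware tools above are the safer choice.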