Tags: xml, powershell, character-encoding, xml-parsing, xml-declaration

Why is the XML created using powershell scripting not in the right format?


I'm running a PowerShell script that reads an XML file, updates a few tag values, and saves the result as multiple XML files. All of that works, but the messaging queue that the files are passed to does not read them properly. However, the same XML file works in the queue if I open it and click Save without changing any data. I compared the two files (1: as created by the script, 2: after opening it and clicking Save) and they are identical! I cannot for the life of me figure out what is going wrong or how to fix it.

How can I create output XML files in a format the queue can read? I'm not sure what changes when I click 'Save' on the XML files. Please help.

input CASH.XML:

<?xml version="1.0" encoding="UTF-8"?>
<ns:POSTransaction xmlns:ns="http://schema.xyz.com/Commerce/Customer/Transaction/v1">
<ns:tranHeader>
<ns:transactionId>96846836238236142669</ns:transactionId>
<ns:businessDateTime>2021-12-25T01:10:00</ns:businessDateTime>
<ns:emailId>Perftesting002@ymail.com</ns:emailId>
</ns:tranHeader>
</ns:POSTransaction>

PS:

$log="H:\logs.txt"
[xml]$loadXML = Get-Content "H:\Q_This\CASH.XML"

try
{
   $tranID = $loadXML.POSTransaction.tranHeader.transactionId.substring(17,3)
   $tranIntID = [int]$tranID   
   $tranc = $loadXML.POSTransaction.tranHeader.transactionId.substring(0,17)    
   $uname = $loadXML.POSTransaction.tranHeader.emailId.substring(0,11)
   $mailcnt = [int]$loadXML.POSTransaction.tranHeader.emailId.substring(11,3)
   $mailend = $loadXML.POSTransaction.tranHeader.emailId.Split("@")[1]

   for ($mailcnt; $mailcnt -lt 10; $mailcnt++)
   {    
        for ([int]$i =1; $i -le 5; $i++)
        {
        $mailupd = ([string]($mailcnt+1)).PadLeft(3,'0')
        $tranIntID = $tranIntID+1
        $loadXML.POSTransaction.tranHeader.transactionId = $tranc+[string]$tranIntID
        $loadXML.POSTransaction.tranHeader.emailId = $uname+$mailupd+'@'+$mailend
        $fileName = "CASH_"+$tranIntID+"_"+$mailupd+".XML"
        $loadXML.Save("H:\Q_This\"+$fileName)
        }
   }
}
catch
{
    Write-Host $_.Exception.Message
    Add-content $log -value ([string](Get-Date) + ' ' +$_.Exception.Message)    
}

The above code created 40 output XML files: 5 transaction files for each email ID from Perftesting003@ymail.com through Perftesting010@ymail.com. However, none of them was recognised by the messaging queue until I opened each one and clicked Save (with no data change).


Solution

  • XML APIs have support for character encoding built in: if a given document specifies an encoding explicitly in its XML declaration (e.g. <?xml version="1.0" encoding="utf-8"?> ), that encoding is respected both when reading from and when writing to files.

    Therefore, the robust way to read and write XML files is to use a dedicated XML API - the [xml] (System.Xml.XmlDocument) type's .Load() and .Save() methods in this case - rather than plain-text processing cmdlets such as Get-Content and Set-Content / Out-File.
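
    Applied to the script in the question, this means replacing the [xml] cast of Get-Content output with a .Load() call. The following is a minimal sketch under assumptions: the temporary folder, the CASH_copy.XML file name, and the trimmed-down sample document are placeholders for illustration and are not part of the original script.

    ```powershell
    # Hypothetical working folder; the question uses H:\Q_This instead.
    $dir = Join-Path ([IO.Path]::GetTempPath()) 'xml-load-demo'
    $null = New-Item -ItemType Directory -Force $dir
    $inFile  = Join-Path $dir 'CASH.XML'
    $outFile = Join-Path $dir 'CASH_copy.XML'

    # Trimmed-down sample modeled on the question's CASH.XML.
    $sample = '<?xml version="1.0" encoding="UTF-8"?>' +
        '<ns:POSTransaction xmlns:ns="http://schema.xyz.com/Commerce/Customer/Transaction/v1">' +
        '<ns:tranHeader><ns:emailId>Perftesting002@ymail.com</ns:emailId></ns:tranHeader>' +
        '</ns:POSTransaction>'
    [IO.File]::WriteAllText($inFile, $sample)

    # Let the XML parser read the file itself, instead of
    # casting the text returned by Get-Content to [xml].
    $loadXML = [xml]::new()
    $loadXML.Load($inFile)

    # Update a value exactly as the original script does, then save.
    $loadXML.POSTransaction.tranHeader.emailId = 'Perftesting003@ymail.com'
    $loadXML.Save($outFile)
    ```

    Only the way the document is read changes; the property access and the .Save() call stay as they are in the question.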

    Caveat:

    Your later feedback indicates that you're looking for ANSI-encoded output XML files, i.e. that your goal is to transcode the input XML from UTF-8 to ANSI.

    The following is a simplified, self-contained example of such transcoding. It assumes that your system's active ANSI code page is Windows-1252.

    # In- and output files.
    # IMPORTANT:
    #   Always use *full, file-system-native paths* when calling .NET methods.
    $inFile =   Join-Path $PWD.ProviderPath in.xml
    $outFile =  Join-Path $PWD.ProviderPath out.xml
    
    # Create a UTF-8-encoded sample input file,
    # for simplicity with plain-text processing.
    # Note the non-ASCII character in the element text ('ä')
    '<?xml version="1.0" encoding="utf-8"?><foo>bär</foo>' | Set-Content -Encoding utf8 $inFile
    
    # Read the file using the XML-processing API provided via the [xml] type.
    $xml = [xml]::new()
    $xml.Load($inFile)
    
    # Now change the character-encoding attribute to the desired new encoding.
    # An XML declaration - if present - is always the *first child node* 
    # of the [xml] instance.
    $xml.ChildNodes[0].encoding = 'windows-1252'
    
    # Save the document.
    # The .Save() method will automatically respect the specified encoding.
    $xml.Save($outFile)
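
    As a side note, the assignment above relies on the input having an XML declaration. If a document might lack one, a declaration can be created first. The following is a hedged sketch of that case; the file names and variable names are made up for illustration.

    ```powershell
    # Hypothetical temp files; the sample input deliberately has no XML declaration.
    $src = Join-Path ([IO.Path]::GetTempPath()) 'no-decl-in.xml'
    $dst = Join-Path ([IO.Path]::GetTempPath()) 'no-decl-out.xml'
    [IO.File]::WriteAllText($src, '<foo>bar</foo>')

    $doc = [xml]::new()
    $doc.Load($src)

    if ($doc.FirstChild -is [System.Xml.XmlDeclaration]) {
        # A declaration exists: just change its encoding.
        $doc.FirstChild.Encoding = 'windows-1252'
    }
    else {
        # No declaration: create one and place it before the root element.
        $decl = $doc.CreateXmlDeclaration('1.0', 'windows-1252', $null)
        $null = $doc.InsertBefore($decl, $doc.DocumentElement)
    }

    $doc.Save($dst)
    ```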
    

    To verify that the output file was correctly Windows-1252-encoded, use one of the following commands, depending on your PowerShell edition:

    # PowerShell (Core) defaults to UTF-8 in the absence of a BOM,
    # so the Windows-1252 code page must be requested explicitly.
    Get-Content -Encoding 1252 $outFile
    
    # Windows PowerShell *defaults* to the 
    # system's active ANSI code page in the absence of a BOM,
    # so no -Encoding argument is needed there.
    Get-Content $outFile
    

    You should see the following output - note the correct rendering of the non-ASCII character, ä:

    <?xml version="1.0" encoding="windows-1252"?>
    <foo>bär</foo>
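
    If you prefer a check that does not depend on how the console decodes or renders text, you can inspect the raw bytes instead. This is a self-contained sketch with made-up temp file names: in Windows-1252, ä is the single byte 0xE4, whereas in UTF-8 it is the byte pair 0xC3 0xA4.

    ```powershell
    # Hypothetical temp paths; any full, file-system-native paths work.
    $inFile  = Join-Path ([IO.Path]::GetTempPath()) 'in-bytes-demo.xml'
    $outFile = Join-Path ([IO.Path]::GetTempPath()) 'out-bytes-demo.xml'

    # Build 'bär' from a code point so this script's own encoding doesn't matter.
    $content = '<?xml version="1.0" encoding="utf-8"?><foo>b' + [char]0xE4 + 'r</foo>'
    [IO.File]::WriteAllText($inFile, $content)

    # Transcode as in the example above.
    $xml = [xml]::new()
    $xml.Load($inFile)
    $xml.ChildNodes[0].encoding = 'windows-1252'
    $xml.Save($outFile)

    # Look for the byte-level forms of the non-ASCII character.
    $bytes = [IO.File]::ReadAllBytes($outFile)
    $hasSingleByte = $bytes -contains 0xE4
    $hasUtf8Pair   = ($bytes -contains 0xC3) -and ($bytes -contains 0xA4)
    "Single byte 0xE4 present: $hasSingleByte; UTF-8 pair present: $hasUtf8Pair"
    ```

    A correctly transcoded file should contain the single byte and not the UTF-8 pair.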
    

    Note: