I found out that .Save() on an [xml] document ends the lines of multi-line comments with LF on Windows instead of CR LF. Why is that? Here is my code:
[xml]$myXml= @"
<myTag>
<!-- multi line comment
each line ends with CR LF
in my code, but with LF
after .save gets called
end of my multi line comment -->
</myTag>
"@
$myXml.save("C:\temp\my.xml")
Here is a screenshot of Notepad++ from my.ps1 to show that my code does not contain any LFs
Here is a screenshot of Notepad++ from my.xml to show the LFs
One way to fix this, which I found, is using XmlWriterSettings and XmlTextWriter:
$settings = New-Object System.Xml.XmlWriterSettings
$settings.NewLineChars = "`r`n"
$settings.Indent = $true
$writer = [System.Xml.XmlTextWriter]::Create($dependencyXmlPath, $settings)
$myXml.Save($writer)
$writer.Close()
Is this the simplest solution?
Instruct the [xml] (System.Xml.XmlDocument) instance to preserve insignificant whitespace before loading, by setting its PreserveWhitespace property to $true, which preserves the input newline format as well as the specific intra-line whitespace[1] (note how the indentation changed in your output file).
# Create an [xml] instance explicitly, so that its
# .PreserveWhitespace property can be set *before* loading content.
($myXml = [xml]::new()).PreserveWhitespace = $true
# Now load the XML text (parse it into a DOM).
$myXml.LoadXml(
@"
<myTag>
<!-- multi line comment
each line ends with CR LF
in my code, but with LF
after .save gets called
end of my multi line comment -->
</myTag>
"@
)
$myXml.Save("C:\temp\my.xml")
However, apart from preserving the indentation, the above results in Windows-format CRLF newlines throughout only if your .ps1 file itself uses them (which it does).
See the workaround further below for how to get CRLF newlines without .PreserveWhitespace or, even with the latter, to ensure consistent use of a (possibly different) format.
Generally - unless .PreserveWhitespace = $true is in effect - the .Save() method:
invariably uses the platform-native newline format, irrespective of the original input text's newline format...
... with one exception, which is the one you ran into:
Behind the scenes, the XmlWriterSettings instance that the .Save() method uses (when not passed an XmlWriter instance explicitly) has its .NewLineHandling property set to None.
This results in newlines that are part of multiline comments, multiline text nodes and other potentially multiline constructs such as CDATA sections getting serialized with LF-only newlines - always, irrespective of the original newline format in the input (presumably because in the in-memory DOM all newlines are stored in LF-only format).
This behavior is certainly surprising, and arguably a bug, given that it's therefore easy to end up with a mix of CRLF and LF newlines on Windows, as in your case.
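To see whether a given piece of text mixes the two newline formats, you can count them separately. The following is only a minimal sketch; the function name Get-NewlineCount and the sample string are made up for illustration and are not part of any API.

```powershell
# Hypothetical helper: counts CRLF and LF-only newlines in a string.
function Get-NewlineCount {
    param(
        [Parameter(Mandatory)] [string] $Text
    )
    [pscustomobject] @{
        # CR immediately followed by LF.
        CRLF   = [regex]::Matches($Text, '\r\n').Count
        # LF that is *not* preceded by CR.
        LFOnly = [regex]::Matches($Text, '(?<!\r)\n').Count
    }
}

# Sample text that mimics the mixed output: CRLF between nodes,
# LF-only inside the comment.
$sample = "<a>`r`n<!-- x`ny -->`r`n</a>"
$counts = Get-NewlineCount -Text $sample
$counts   # CRLF = 2, LFOnly = 1
```

To inspect a saved file, pass its raw content instead, e.g. Get-NewlineCount -Text (Get-Content -Raw "C:\temp\my.xml").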
A workaround without .PreserveWhitespace = $true and / or for ensuring a consistent output newline format:
Note the two use cases:
You may not need or want to preserve insignificant whitespace from the input on reading and are only concerned with ensuring consistent use of the newline format of interest on writing.
You need .PreserveWhitespace = $true but want to use a different newline format on writing.
You can control the output newline format by explicitly creating an XmlWriter instance with an XmlWriterSettings instance that has the following properties:
For pretty-printing - if desired - set .Indent = $true, which uses 2 spaces per indentation level by default, overridable via .IndentChars - this is what .Save() does when given an output file path or stream.
Set .NewLineHandling = 'Replace' to ensure consistent use of newlines (this is actually the default value, so it is curious that .Save() in effect uses 'None').
Set .NewLineChars = "`r`n" / .NewLineChars = "`n" for Windows-format CRLF / Unix-format LF-only newlines.
# Using an [xml] cast means that insignificant whitespace
# is *not* preserved.
[xml] $myXml= @"
<myTag>
<!-- multi line comment
each line ends with CR LF
in my code, but with LF
after .save gets called
end of my multi line comment -->
</myTag>
"@
# Create an XML writer explicitly, with settings
# that ensure consistent use of newlines in the output.
$writer = [System.Xml.XmlWriter]::Create(
"C:\temp\my.xml",
[System.Xml.XmlWriterSettings] @{
# Pretty-print, using the value of .IndentChars
# per indentation level; default is *two spaces*.
Indent = $true
# Replace all newlines in the DOM with the character(s)
# specified in the .NewLineChars property,
# which defaults to the platform-native format.
NewLineHandling = 'Replace'
}
)
# Save to the target file via the writer.
$myXml.Save($writer); $writer.Dispose()
# Returns $true if at least one LF-only newline is present
# in the file that was just saved.
(Get-Content -Raw "C:\temp\my.xml") -match '(?<!\r)\n'
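For the second use case (.PreserveWhitespace = $true on reading, but a different newline format on writing), the same writer settings can be wrapped in a small helper. This is a sketch under assumptions, not a definitive implementation: the function name Save-XmlWithNewline, its parameters, and the temp-file name are invented for illustration, and $Path should be a full path, because .NET does not resolve relative paths against PowerShell's current location.

```powershell
# Hypothetical helper: saves an XML document with an explicit newline format.
function Save-XmlWithNewline {
    param(
        [Parameter(Mandatory)] [xml] $Document,
        [Parameter(Mandatory)] [string] $Path,   # use a full path
        [string] $NewLineChars = "`r`n",         # CRLF unless told otherwise
        [switch] $Indent                         # pretty-print if requested
    )
    $settings = [System.Xml.XmlWriterSettings]::new()
    $settings.Indent = $Indent.IsPresent
    # Replace all newlines with the character(s) in .NewLineChars.
    $settings.NewLineHandling = [System.Xml.NewLineHandling]::Replace
    $settings.NewLineChars = $NewLineChars

    $writer = [System.Xml.XmlWriter]::Create($Path, $settings)
    try     { $Document.Save($writer) }
    finally { $writer.Dispose() }   # always release the file handle
}

# Load with whitespace preserved; the input text uses CRLF newlines.
$doc = [xml]::new()
$doc.PreserveWhitespace = $true
$doc.LoadXml("<myTag>`r`n  <!-- line 1`r`nline 2 -->`r`n</myTag>")

# Write with LF-only newlines to an illustrative temp file.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'my-lf.xml'
Save-XmlWithNewline -Document $doc -Path $outFile -NewLineChars "`n"

# Check whether any CR characters remain in the output.
(Get-Content -Raw $outFile) -match '\r'
```

-Indent is left off here because the document was loaded with its original whitespace preserved; add it when the document was created with a plain [xml] cast and you want pretty-printed output.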
[1] There is one exception: intra-tag whitespace is not preserved, i.e. the specific whitespace - including any newlines - that separates the element name from the first attribute as well as the whitespace between attributes isn't preserved - see this answer.