powershell newline line-breaks

How to split a string containing newlines


A string (extracted from an Outlook email message body.innerText) contains embedded newlines. How can I split this into an array of strings?

I would expect this example string to be split into an array of two (2) items. Instead, it becomes an array of three (3) items with a blank line in the middle.

PS C:\src\t> ("This is`r`na string.".Split([Environment]::NewLine)) | % { $_ }
This is

a string.
PS C:\src\t> "This is `r`na string.".Split([Environment]::NewLine) | Out-String | Format-Hex

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   54 68 69 73 20 69 73 20 0D 0A 0D 0A 61 20 73 74  This is ....a st
00000010   72 69 6E 67 2E 0D 0A                             ring...

Solution

  • To treat a CRLF sequence as a whole as the separator, it's simpler to use the -split operator, which is regex-based:

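    PS> "This is `r`n`r`n a string." -split '\r?\n'
    This is 
    
     a string.

    Because the input contains two consecutive CRLF sequences, the result has three elements, with an empty string in the middle; a single CRLF yields exactly two elements.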

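    Note: \r?\n matches both CRLF (Windows-style) and LF-only (Unix-style) newlines; use \r\n to match CRLF sequences only.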
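
    As a minimal sketch of how the regex-based -split operator behaves with mixed newline styles, the following splits a sample string and then filters out empty elements. The variable names and the sample text are illustrative assumptions, not part of the original question.

    ```powershell
    # Hypothetical sample text mixing Windows-style (CRLF) and Unix-style (LF) newlines.
    $text = "line 1`r`nline 2`nline 3"

    # -split interprets its right-hand operand as a regex;
    # '\r?\n' matches an optional CR followed by an LF.
    $lines = @($text -split '\r?\n')
    $lines.Count      # 3

    # Consecutive newlines produce empty elements; -ne '' filters them out.
    $nonEmpty = @(("a`r`n`r`nb" -split '\r?\n') -ne '')
    $nonEmpty.Count   # 2
    ```

    The single-quoted regex is passed to the .NET regex engine as-is, so no PowerShell escape sequences are expanded in it.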


    As for what you tried:

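    The separator argument, [Environment]::NewLine, on Windows is the string "`r`n", i.e. a CRLF sequence.

    In PowerShell 7, your command works as expected, because the string as a whole is treated as the separator. In Windows PowerShell, the string is instead converted to a [char[]] array, so CR and LF each act as a separator, which produces an extra, empty element (the empty string between the CR and the LF) in the result array.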

    This change in behavior between Windows PowerShell and PowerShell 7 happened outside of PowerShell's control: .NET (Core), which PowerShell 7 is built on and which is the cross-platform successor to the legacy, Windows-only .NET Framework, introduced a new .Split() method overload with a [string]-typed separator parameter. PowerShell's overload-resolution algorithm now selects that overload over the older one with the [char[]]-typed parameter when given a [string] instance.
    Sidestepping such inadvertent (albeit rare) behavioral changes, which PowerShell itself cannot prevent, is another good reason to prefer the PowerShell-native -split operator over the .NET [string] type's .Split() method.
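
    If you do need .Split(), one way to avoid depending on overload resolution is to cast the separator explicitly so that the same overload is selected in both editions. The following is a sketch under that assumption; the variable names are illustrative.

    ```powershell
    $text = "This is`r`na string."

    # Pass the separator as a [string[]] together with a [StringSplitOptions] value.
    # The Split(string[], StringSplitOptions) overload exists in both .NET Framework
    # and .NET (Core) and treats each array element (here: CRLF) as a whole.
    $asWhole = $text.Split([string[]] "`r`n", [StringSplitOptions]::None)
    $asWhole.Count    # 2

    # Pass a [char[]] to split on CR and LF individually instead;
    # the empty string between the CR and the LF becomes an extra element.
    $perChar = $text.Split([char[]] "`r`n")
    $perChar.Count    # 3
    ```

    The explicit casts make the intent visible at the call site, at the cost of being more verbose than -split.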