PowerShell: Set-Content, replace a word and encode as UTF-8 without BOM


I'd like to escape \ as \\ in a CSV file before uploading it to Redshift. The following simple PowerShell script replaces $TargetWord (\) with $ReplaceWord (\\) as expected, but it writes UTF-8 with a BOM, which sometimes causes a Redshift COPY error.

Any advice on improving it would be appreciated. Thank you in advance.

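Exp_Escape.ps1:

    Param(
        [string]$StrExpFile,
        [string]$TargetWord,
        [string]$ReplaceWord
    )

    # Replace every occurrence of $TargetWord, then overwrite the file in place.
    # In Windows PowerShell, -Encoding UTF8 writes a BOM.
    $(Get-Content "$StrExpFile").replace($TargetWord,$ReplaceWord) | Set-Content -Encoding UTF8 "$StrExpFile"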

Solution

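  • In Windows PowerShell, -Encoding UTF8 always writes a BOM (PowerShell (Core) 6 and later create BOM-less UTF-8 files by default). A workaround is to combine Out-String [1] with New-Item [2], which (curiously) creates BOM-less UTF-8 files by default even in Windows PowerShell: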

    Param(
        [string]$StrExpFile,
        [string]$TargetWord,
        [string]$ReplaceWord
    )
    
    $null = 
      New-Item -Force $StrExpFile -Value (
        (Get-Content $StrExpFile).Replace($TargetWord, $ReplaceWord) | Out-String
      )
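
    To confirm that a rewritten file really has no BOM before handing it to Redshift, its first three bytes can be inspected. The following is only a sketch; the function name Test-Utf8Bom is hypothetical and not part of the answer above:

    ```powershell
    # Hypothetical helper: returns $true if the file starts with the
    # UTF-8 BOM bytes EF BB BF, and $false otherwise.
    function Test-Utf8Bom {
        param([Parameter(Mandatory)][string]$Path)

        # .NET does not follow PowerShell's current location,
        # so resolve to a full native path first (the file must exist).
        $fullPath = Convert-Path -LiteralPath $Path

        # Read only the first three bytes instead of the whole file.
        $buffer = New-Object byte[] 3
        $stream = [System.IO.File]::OpenRead($fullPath)
        try {
            $read = $stream.Read($buffer, 0, 3)
        }
        finally {
            $stream.Dispose()
        }

        return ($read -eq 3 -and
                $buffer[0] -eq 0xEF -and
                $buffer[1] -eq 0xBB -and
                $buffer[2] -eq 0xBF)
    }
    ```

    Calling Test-Utf8Bom -Path $StrExpFile after the rewrite should then return $false.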
    

    For a convenience wrapper function around Out-File for use in Windows PowerShell that creates BOM-less UTF-8 files in streaming fashion, see this answer.


    Alternative, with direct use of .NET APIs:

    .NET APIs produce BOM-less UTF-8 files by default.
    However, because .NET's working directory usually differs from PowerShell's, full file paths must always be used, which requires more effort:

    # In order for .NET API calls to work as expected,
    # file paths must be expressed as *full, native* paths.
    $OutDir = Split-Path -Parent $StrExpFile
    if ($OutDir -eq '') { $OutDir = '.' }
    $strExpFileFullPath = Join-Path (Convert-Path $OutDir) (Split-Path -Leaf $StrExpFile)
    
    # Note: .NET APIs create BOM-less UTF-8 files *by default*
    [IO.File]::WriteAllLines(
      $strExpFileFullPath,
      (Get-Content $StrExpFile).Replace($TargetWord, $ReplaceWord)
    )
    

    The above uses the System.IO.File.WriteAllLines method.
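
    If relying on the default feels too implicit, the encoding can also be passed explicitly. A minimal sketch, assuming a hypothetical helper name Set-ContentUtf8NoBom; it resolves the path against PowerShell's current location rather than .NET's working directory, so it also works for files that do not exist yet:

    ```powershell
    # Hypothetical helper: writes lines as UTF-8 without a BOM.
    function Set-ContentUtf8NoBom {
        param(
            [Parameter(Mandatory)][string]$Path,
            # Allow blank lines, which a CSV export may contain.
            [Parameter(Mandatory)][AllowEmptyCollection()][AllowEmptyString()][string[]]$Lines
        )

        # Turn a relative PowerShell path into a full native path,
        # even if the target file does not exist yet.
        $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)

        # The constructor argument $false means "do not emit a BOM".
        $utf8NoBom = New-Object System.Text.UTF8Encoding $false

        [System.IO.File]::WriteAllLines($fullPath, $Lines, $utf8NoBom)
    }

    # Usage with the script's parameters:
    # Set-ContentUtf8NoBom -Path $StrExpFile -Lines ((Get-Content $StrExpFile).Replace($TargetWord, $ReplaceWord))
    ```

    As with the other variants, Get-Content is wrapped in parentheses so that the file is read completely before it is overwritten.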


    [1] Note that Out-String automatically appends a trailing newline to the string it outputs, which is actually desirable here (to ensure that the file ends with a newline, which New-Item itself doesn't do); however, in general this behavior is problematic, as discussed in GitHub issue #14444.

    [2] Note that while New-Item technically supports receiving the content to write via the pipeline, it unfortunately writes each input object to the target file separately, one after another, so that only the last object ends up in the file.