I'd like to escape \ as \\ in a CSV file before uploading it to Redshift. The following simple PowerShell script replaces $TargetWord (\) with $ReplaceWord (\\) as expected, but it exports UTF-8 with a BOM, which sometimes causes a Redshift COPY error.
Any advice on how to improve it would be appreciated. Thank you in advance.
Param(
  [string]$StrExpFile,
  [string]$TargetWord,
  [string]$ReplaceWord
)
$(Get-Content "$StrExpFile").replace($TargetWord,$ReplaceWord) | Set-Content -Encoding UTF8 "$StrExpFile"
In PowerShell (Core) 7, you get BOM-less UTF-8 files by default; -Encoding utf8 and -Encoding utf8NoBom express that default explicitly; to explicitly request a BOM, use -Encoding utf8BOM.
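For instance, in PowerShell 7 the following calls illustrate this (sample.csv is just a placeholder file name):
# PowerShell 7+ only; 'sample.csv' is a placeholder name.
'foo\bar' | Set-Content sample.csv                       # BOM-less UTF-8 by default
'foo\bar' | Set-Content -Encoding utf8NoBOM sample.csv   # same default, spelled out
'foo\bar' | Set-Content -Encoding utf8BOM sample.csv     # UTF-8 *with* a BOM, on request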
In Windows PowerShell, unfortunately, you must use a workaround to get BOM-less UTF-8, because -Encoding utf8 only produces UTF-8 files with a BOM (and no other utf8-related values are supported).
The workaround requires combining Out-String with New-Item, which (curiously) creates BOM-less UTF-8 files by default even in Windows PowerShell:
Param(
  [string]$StrExpFile,
  [string]$TargetWord,
  [string]$ReplaceWord
)
$null = New-Item -Force $StrExpFile -Value (
  (Get-Content $StrExpFile).Replace($TargetWord, $ReplaceWord) | Out-String
)
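For example, assuming you save the script as Replace-InFile.ps1 (the name is just an example), the \ -> \\ replacement from the question could be invoked like this:
# Hypothetical script name; escapes every \ as \\ in data.csv and rewrites it as BOM-less UTF-8.
.\Replace-InFile.ps1 -StrExpFile .\data.csv -TargetWord '\' -ReplaceWord '\\'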
Note:
$null = is needed to discard the output object that New-Item emits (which is a file-info object describing the newly created file).
-Force is needed in order to quietly overwrite an existing file of the same name (as Set-Content and Out-File do by default).
The -Value argument must be a single (multi-line) string to write to the file, which is what Out-String ensures.[1]
Caveats:
For non-string input objects, Out-String creates the same rich for-display representations as Out-File does and as you would see in the console by default.
New-Item itself does not append a trailing newline when it writes the string to the file, but Out-String curiously does; while this happens to be handy here, it is generally problematic, as discussed in GitHub issue #14444.
The alternative to using Out-String is to create the multi-line string manually, which is a bit more cumbersome ("`n" is used to create LF-only newlines, which PowerShell and most programs happily accept even on Windows; for platform-native newlines (CRLF) on Windows, use [Environment]::NewLine instead):
$null = New-Item -Force $StrExpFile -Value (
  ((Get-Content $StrExpFile).Replace($TargetWord, $ReplaceWord) -join "`n") + "`n"
)
Since the entire file content must be passed as an argument,[2] it must fit into memory as a whole; the convenience function discussed next avoids this problem.
For a convenience wrapper function around Out-File for use in Windows PowerShell that creates BOM-less UTF-8 files in streaming fashion, see this answer.
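If you'd rather not pull in a helper function, here is a minimal streaming sketch of that idea (not the linked function itself), using .NET APIs directly; it assumes writing to a temporary file next to the original, since you can't read from and rewrite the same file in one pass, and the $tmpPath name is purely illustrative:
# Streaming sketch: read line by line, write BOM-less UTF-8 to a temp file, then swap it in.
$fullPath = Convert-Path $StrExpFile                     # .NET needs a full, native path
$tmpPath  = $fullPath + '.tmp'                           # illustrative temp-file name
$enc      = [System.Text.UTF8Encoding]::new($false)      # $false = do not emit a BOM
$writer   = [System.IO.StreamWriter]::new($tmpPath, $false, $enc)
try {
  foreach ($line in [System.IO.File]::ReadLines($fullPath)) {
    $writer.WriteLine($line.Replace($TargetWord, $ReplaceWord))
  }
}
finally {
  $writer.Dispose()
}
Move-Item -Force -LiteralPath $tmpPath -Destination $fullPath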
Alternative, with direct use of .NET APIs:
.NET APIs produce BOM-less UTF-8 files by default.
However, because .NET's working directory usually differs from PowerShell's, full file paths must always be used, which requires more effort:
# In order for .NET API calls to work as expected,
# file paths must be expressed as *full, native* paths.
$OutDir = Split-Path -Parent $StrExpFile
if ($OutDir -eq '') { $OutDir = '.' }
$strExpFileFullPath = Join-Path (Convert-Path $OutDir) (Split-Path -Leaf $StrExpFile)
# Note: .NET APIs create BOM-less UTF-8 files *by default*
[IO.File]::WriteAllLines(
  $strExpFileFullPath,
  (Get-Content $StrExpFile).Replace($TargetWord, $ReplaceWord)
)
The above uses the System.IO.File.WriteAllLines method.
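If you prefer not to rely on the default, the same method has an overload that accepts an encoding, so you can request BOM-less UTF-8 explicitly:
# Explicitly pass a BOM-suppressing UTF-8 encoding instead of relying on the .NET default.
[IO.File]::WriteAllLines(
  $strExpFileFullPath,
  (Get-Content $StrExpFile).Replace($TargetWord, $ReplaceWord),
  [System.Text.UTF8Encoding]::new($false)   # $false = no BOM
)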
[1] Note that Out-String automatically appends a trailing newline to the string it outputs, which is actually desirable here (to ensure that the file ends with a newline, which New-Item itself doesn't do); however, in general this behavior is problematic, as discussed in GitHub issue #14444.
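For illustration, a quick way to observe the trailing newline that Out-String adds:
# Out-String appends a platform-native newline to the string it emits.
('foo' | Out-String).EndsWith([Environment]::NewLine)   # -> True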
[2] Note that while New-Item technically supports receiving the content to write to the file via the pipeline, it unfortunately writes each input object to the target file alone, successively, with only the last one ending up in the file.
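To illustrate (t.txt is a throwaway sample file): piping two strings to New-Item with -Force re-creates the file for each input, so only the second string survives:
# Each pipeline input re-creates the file, so only the last string ends up in it.
$null = 'line 1', 'line 2' | New-Item -Force t.txt
Get-Content t.txt   # -> line 2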