I want to use the redirect append >> or write > operators to write to a txt file, but when I do, I get a weird format ("\x00a\x00p..."). I can use Set-Content and Add-Content successfully; why do they function as expected, but not the >> and > redirect operators?
Here is the output, shown with PowerShell's cat as well as a simple Python print:
rocket_brain> new-item test.txt
rocket_brain> "appended using add-content" | add-content test.txt
rocket_brain> cat test.txt
appended using add-content
But then, if I use the redirect append >>:
rocket_brain> "appended using redirect" >> test.txt
rocket_brain> cat test.txt
appended using add-content
a p p e n d e d u s i n g r e d i r e c t
Simple Python script: read_test.py
with open("test.txt", "r") as file: # open test.txt in readmode
data = file.readlines() # append each line to the list data
print(data) # output list with each input line as an item
Using read_test.py, I see a difference in formatting:
rocket_brain> python read_test.py
['appended using add-content\n', 'a\x00p\x00p\x00e\x00n\x00d\x00e\x00d\x00 \x00u\x00s\x00i\x00n\x00g\x00 \x00r\x00e\x00d\x00i\x00r\x00e\x00c\x00t\x00\r\x00\n', '\x00']
NOTE: If I use only the redirect append >> (or write >) without first using Add-Content, the cat output looks normal (instead of spaced out), but I then get the \x00 format for every line when using the Python script (including for any Add-Content command issued after starting with the > operators). Opening the file in Notepad (or VS etc.), the text always looks as expected. Using >> or > in cmd (instead of PS) also stores the text in the expected ASCII format.
Related links: cmd redirection operators, PS redirection operators
Note: The problem is ultimately that in Windows PowerShell different cmdlets / operators use different default encodings. This problem has been resolved in PowerShell (Core) 7+, where BOM-less UTF-8 is consistently used.
>> blindly applies Out-File's default encoding when appending to an existing file (in effect, > behaves like Out-File and >> like Out-File -Append), which in Windows PowerShell is the encoding named Unicode, i.e., UTF-16LE, where most characters are encoded as 2-byte sequences, even those in the ASCII range; the latter have a 0x0 (NUL) as the high byte.
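If you want to verify this yourself, here is a minimal sketch (demo.txt is just a throwaway file name) that writes with > and dumps the resulting bytes via the .NET File API:
"demo" > demo.txt
([System.IO.File]::ReadAllBytes("$PWD\demo.txt") | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
# In Windows PowerShell: FF FE 64 00 65 00 6D 00 6F 00 0D 00 0A 00  (UTF-16LE BOM, then 2 bytes per character)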
While Add-Content, by contrast, does try to detect a file's existing encoding[1], you used it on an empty file, in which case Add-Content uses the same default as Set-Content, which in Windows PowerShell is the encoding named Default and refers to your system's active legacy ANSI code page.
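By contrast, the same byte dump run against a file created the way you did (a sketch; demo2.txt is again just a throwaway name) shows single-byte ANSI text with no BOM:
New-Item demo2.txt | Out-Null
"abc" | Add-Content demo2.txt
([System.IO.File]::ReadAllBytes("$PWD\demo2.txt") | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
# In Windows PowerShell: 61 62 63 0D 0A  - one byte per character, no BOM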
Therefore, to match the single-byte ANSI encoding initially created by your Add-Content call when appending further content, use Out-File -Append -Encoding Default instead of >>, or simply keep using Add-Content.
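Applied to your example (assuming test.txt still contains only the line written by Add-Content), the append step would then be:
"appended using redirect" | Out-File -Append -Encoding Default test.txt
cat test.txt   # both lines now read back normally, because the whole file is ANSI-encoded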
Alternatively, pick a different encoding with Set-Content / Add-Content -Encoding ... and match it in the Out-File -Append call; UTF-8 is generally the best choice, though note that when you create a UTF-8 file in Windows PowerShell, it will start with a BOM, a (pseudo) byte-order mark identifying the file as UTF-8, in the form of 3 bytes at the start of the file, which Unix-like platforms typically do not expect.
See this answer for workarounds that create BOM-less UTF-8 files.
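For instance, a sketch that standardizes on UTF-8, plus one commonly used BOM-less workaround that calls the .NET API directly (which writes UTF-8 without a BOM by default); the file name and content are placeholders:
"first line"  | Set-Content -Encoding UTF8 test.txt       # UTF-8 with BOM in Windows PowerShell
"second line" | Out-File -Append -Encoding UTF8 test.txt  # matches the existing UTF-8 encoding
# BOM-less alternative via the .NET API:
[System.IO.File]::WriteAllLines("$PWD\test.txt", @("first line", "second line"))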
In Windows PowerShell v5.1 (the latest and last version), you may also change the default encoding globally, including for > and >> (which isn't possible in earlier versions) - use this with caution, given that every call to a cmdlet that supports an -Encoding parameter will then implicitly use the configured encoding. To change to UTF-8, for instance, use:
$PSDefaultParameterValues['*:Encoding']='UTF8'
To limit the change to Out-File / > / >> only:
$PSDefaultParameterValues['Out-File:Encoding']='UTF8'
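As a rough sketch of the effect (the file name is arbitrary): once the Out-File-scoped assignment above is in place, redirections in the same session pick it up:
"now UTF-8" > test-utf8.txt    # > and >> now write UTF-8 (with BOM in Windows PowerShell)
"still UTF-8" >> test-utf8.txt
# undo the override later with:
$PSDefaultParameterValues.Remove('Out-File:Encoding')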
As noted, in Windows PowerShell this will create UTF-8 files with a BOM, and creating BOM-less UTF-8 files requires workarounds.
The above technique also works in PowerShell 7+, but given that BOM-less UTF-8 is the consistent default there to begin with, this is probably not needed. (In the unlikely event that you want to create UTF-8 files with a BOM, use 'utf8BOM' in the above assignment.)
Aside from the different default encodings (in Windows PowerShell), it is important to note that Set-Content / Add-Content on the one hand and > / >> / Out-File [-Append] on the other behave fundamentally differently with non-string input: in short, the former apply simple .ToString() formatting to the input objects, whereas the latter perform the same output formatting you would see in the console - see this answer for details.
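For instance (a sketch; the file names are arbitrary):
Get-Process -Id $PID | Add-Content proc.txt   # .ToString() output, e.g. "System.Diagnostics.Process (powershell)"
Get-Process -Id $PID | Out-File proc2.txt     # the Handles/NPM/PM/... table you would see in the console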
[1] Due to the initial content set by Add-Content, Windows PowerShell interprets the file as ANSI-encoded (the default in the absence of a BOM), where each byte is its own character. The UTF-16 content appended later is therefore also interpreted as if it were ANSI, so the 0x0 bytes are treated as characters in their own right, which print to the console like spaces.