I need to remove the BOM from MyFile.txt using a .cmd file. The file is located here:
| Out-File -encoding utf8 '%CD%\MyFile.txt'
I need it to be removed using only one .cmd file, that is, in the lines that follow. It has to be backwards compatible down to Windows 7. If I needed it only for myself, I would just use -encoding default, but that isn't backwards compatible even with Windows 10. It's just one file. There are many questions about the BOM in different situations; my issue is that I need to use a single .cmd file: I already have a UTF-8 file with a BOM and I need it without the BOM. Please help me.
I tried to use PowerShell, but the issue seems to be that PowerShell syntax isn't really compatible with cmd? I get an unrecognized-syntax error every time I try anything from very popular threads such as "Using PowerShell to write a file in UTF-8 without the BOM".
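PowerShell is indeed your best bet, and while you cannot directly use PowerShell commands from cmd.exe / a batch file, you can pass them to powershell.exe, the Windows PowerShell CLI. The solutions below also work on the no longer supported Windows 7, as requested, provided that Windows PowerShell has been updated there: Windows 7 ships with version 2.0, whereas Get-Content -Raw requires version 3.0 and New-TemporaryFile requires version 5.0.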
Here's a sample batch file that demonstrates a solution:
@echo off & setlocal
:: Specify the input file, assumed to be a UTF-8 with-BOM file.
set "targetFile=%CD%\MyFile.txt"
:: Call Windows PowerShell in order to
:: convert the file to a *BOM-less* UTF-8 file.
powershell -noprofile -c $null = New-Item $env:targetFile -Force -Value (Get-Content -Raw -LiteralPath $env:targetFile)
Note:
The PowerShell code takes advantage of the fact that New-Item, when given a -Value argument, creates a BOM-less UTF-8 file, even in Windows PowerShell (the legacy, ships-with-Windows, Windows-only edition of PowerShell whose latest and last version is 5.1) - see this answer for details.
Caveat: Reading the entire file into memory first, with Get-Content -Raw, and then rewriting it in full, in place, with BOM-less UTF-8 encoding, carries a hypothetical, however unlikely, risk of data loss: if rewriting the file's content gets interrupted, say due to a power outage, the content may be lost.
To eliminate this risk, you'd have to write to a temporary file first, and then, once the temporary file was successfully written, replace the original file with it.
Similarly, you'd need a temporary file if the input file is too large to fit into memory as a whole (which is not typical for text files); in that case, you can combine [IO.File]::ReadLines() with [IO.File]::WriteAllLines() to read and write line by line.
The following solution addresses these problems:
It uses a temporary file and reads and writes lines one at a time.
It is therefore a robust, memory-friendly solution, albeit at the expense of implementation complexity and speed (though the latter typically won't matter):
@echo off & setlocal
set "targetFile=%CD%\MyFile.txt"
powershell -noprofile -c $ErrorActionPreference='Stop'; $tempFile=New-TemporaryFile; $inFile=Convert-Path -LiteralPath $env:targetFile; [IO.File]::WriteAllLines($tempFile, [IO.File]::ReadLines($inFile)); $tempFile ^| Move-Item -Force -Destination $inFile
Note:
Passing multiple statements to powershell.exe gets unwieldy, as you must either pass them all on a single line or use cmd.exe's line-continuations; here's a readable reformulation that you could use if you placed the code in a PowerShell script file (*.ps1) and then called powershell -noprofile -file yourScript.ps1 (though if you took that approach, it'd be worth generalizing the code to accept the input file path as an argument):
# PowerShell code for use in a *.ps1 script file.
$ErrorActionPreference = 'Stop'
$tempFile = New-TemporaryFile
$inFile = Convert-Path -LiteralPath $env:targetFile
[IO.File]::WriteAllLines(
  $tempFile,
  [IO.File]::ReadLines($inFile)
)
$tempFile | Move-Item -Force -Destination $inFile
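To illustrate the generalization mentioned above, here is a minimal sketch that wraps the same steps in a function taking the file path as a parameter. The function name Remove-Utf8Bom and its -LiteralPath parameter are hypothetical, chosen only for illustration; like the code above, it assumes a PowerShell version that has New-TemporaryFile.

```powershell
# Hypothetical helper: the function and parameter names are illustrative only.
function Remove-Utf8Bom {
    param(
        [Parameter(Mandatory = $true)]
        [string] $LiteralPath
    )

    # Function-local: abort on the first error so the original file is never
    # replaced by an incomplete temporary file.
    $ErrorActionPreference = 'Stop'

    # Resolve to a full path, because .NET methods do not see
    # PowerShell's current location.
    $inFile = Convert-Path -LiteralPath $LiteralPath
    $tempFile = New-TemporaryFile

    # ReadLines() recognizes the BOM on reading; WriteAllLines() writes
    # BOM-less UTF-8 when no encoding is specified.
    [IO.File]::WriteAllLines($tempFile.FullName, [IO.File]::ReadLines($inFile))

    # Replace the original only after the temporary file was fully written.
    Move-Item -LiteralPath $tempFile.FullName -Destination $inFile -Force
}
```

In a script file you could follow the function with a call such as Remove-Utf8Bom -LiteralPath $args[0] and pass "%targetFile%" after the script path on the powershell -noprofile -file command line. Depending on the machine's execution policy, running script files may also require -ExecutionPolicy Bypass.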
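Note that while the use of [IO.File]::WriteAllLines() and [IO.File]::ReadLines() without specifying a character encoding may look like a no-op in terms of BOM removal, it does work as intended: .NET recognizes a UTF-8 file with BOM on reading, and, on writing, writes a UTF-8 file without BOM by default. Be aware that WriteAllLines() terminates every line, including the last one, with the platform-native newline (CRLF on Windows), so LF-only newlines are normalized and a missing trailing newline is added.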
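If you want to confirm that the BOM is really gone, one option is to inspect the first three bytes of the file, which are 0xEF 0xBB 0xBF when a UTF-8 BOM is present. The following is only a sketch; the function name Test-Utf8Bom is hypothetical.

```powershell
# Hypothetical helper: returns $true if the file starts with the UTF-8 BOM.
function Test-Utf8Bom {
    param(
        [Parameter(Mandatory = $true)]
        [string] $LiteralPath
    )

    # Resolve to a full path for use with .NET methods.
    $fullPath = Convert-Path -LiteralPath $LiteralPath

    $stream = [IO.File]::OpenRead($fullPath)
    try {
        # Read at most the first three bytes of the file.
        $buffer = New-Object byte[] 3
        $count = $stream.Read($buffer, 0, 3)
    }
    finally {
        $stream.Dispose()
    }

    # The UTF-8 BOM is the byte sequence EF BB BF.
    return ($count -eq 3 -and
            $buffer[0] -eq 0xEF -and
            $buffer[1] -eq 0xBB -and
            $buffer[2] -eq 0xBF)
}
```

Calling Test-Utf8Bom -LiteralPath $env:targetFile before and after the conversion should then report $true and $false, respectively, for a file that originally had a BOM.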