I have the following batch-file code:
@echo off
setlocal enabledelayedexpansion
for /d %%a in ("%cd%") do set "directory_name=%%~nxa"
powershell -Command "$content = [IO.File]::ReadAllText('file.txt'); $content = $content -replace '\s\u003Cmod((?:.|\n)).*%directory_name: =_%_v((?:.|\n))*?\u003C\/mod\u003E', ''; [IO.File]::WriteAllText('file.txt', $content, [System.Text.Encoding]::UTF8)"
endlocal
pause
It worked fine and did everything I wanted, until I got a folder named blabla_-_blabla's_bla. The apostrophe now breaks the PowerShell syntax, so the command can't complete. If I replace the apostrophe with the Unicode escape \u0027, it works fine.
Any idea how to convert, in batch, any special symbols to Unicode escapes (excluding Latin letters, "_" and "-") before passing the value to the PowerShell code via %directory_name: =_% ?
Preface:
The technique shown below robustly communicates a value stored in a batch-file variable to a PowerShell CLI call (using either powershell.exe (Windows PowerShell) or pwsh.exe (PowerShell (Core) 7)), whatever characters it may contain, so that no advance knowledge about it is required (but see the character-encoding caveat in the bottom section).
Situationally, if you know that a value never contains " but may contain ', as in the case at hand, switching from using embedded '...' quoting (single-quoting) to using embedded "..." quoting (double-quoting) in combination with up-front string interpolation by cmd.exe is an option, as JosefZ notes; however:
Not only do you need to be mindful of how the literal parts of such an embedded string may be affected by PowerShell's string interpolation in such expandable (interpolating) strings ("..."), the up-front-expanded value itself could become subject to undesired interpretation, such as removal of ` characters and inadvertent expansion of $-prefixed tokens.
Robustly passing an embedded "..." string inside an overall "..." string containing PowerShell code passed to -Command from a batch file (cmd.exe) is cumbersome, because it is easy to run afoul of cmd.exe's parsing rules. See this answer for an explanation and workarounds.
To avoid any quoting and escaping headaches, make PowerShell reference variables set in batch files as environment variables (all variables set in batch files are invariably also environment variables, and therefore seen by child processes).
That is, rather than using cmd.exe's up-front string interpolation via references such as %directory_name% embedded in the command passed to -Command, make the command reference the value of the batch-file variable directory_name as follows, using PowerShell's syntax for accessing environment variables:
$env:directory_name
In your case, this also makes it easier to apply [regex]::Escape() to the variable value, which is needed to ensure that the value is treated literally inside the regex that you're passing to -replace (note that $env:directory_name -replace ' ', '_' is the deferred equivalent of replacing spaces with _ via up-front string interpolation by cmd.exe, %directory_name: =_%):
[regex]::Escape(($env:directory_name -replace ' ', '_'))
In the case at hand, where the value of interest is the name of the current working directory, you could also let PowerShell determine it:
[regex]::Escape(((Split-Path -Leaf $PWD) -replace ' ', '_'))
Since you're using '...' (single-quoting) for the regex passed to the $content -replace ... operation, you must splice the result of the above expression into that string via string concatenation (+), and enclose the operation in (...) to clarify operator precedence:
$content -replace ('\s\u003Cmod((?:.|\n)).*' + [regex]::Escape(($env:directory_name -replace ' ', '_')) + '_v((?:.|\n))*?\u003C\/mod\u003E'), ''
To put it all together, using a simplified version of your batch file:
@echo off
setlocal
:: Get the name of the current directory.
for %%a in (.) do set "directory_name=%%~nxa"
:: Invoke PowerShell and make it obtain the value of %directory_name%
:: via $env:directory_name
powershell -Command "$content = [IO.File]::ReadAllText('file.txt'); $content = $content -replace ('\s\u003Cmod((?:.|\n)).*' + [regex]::Escape(($env:directory_name -replace ' ', '_')) + '_v((?:.|\n))*?\u003C\/mod\u003E'), ''; [IO.File]::WriteAllText('file.txt', $content, [System.Text.Encoding]::UTF8)"
pause
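To see the environment-variable technique in isolation, here is a minimal sketch that only prints the escaped value instead of editing a file. The sample value is hypothetical (it adds spaces and regex metacharacters to the apostrophe case from the question), and -NoProfile is an optional addition that merely skips loading profile files:

```bat
@echo off
setlocal
:: Hypothetical sample value: contains an apostrophe, spaces,
:: and regex metacharacters such as ( ) and .
set "directory_name=blabla - blabla's bla (v1.0)"
:: PowerShell reads the value via $env:directory_name, so cmd.exe never
:: splices the value into the command text passed to -Command.
powershell -NoProfile -Command "[regex]::Escape(($env:directory_name -replace ' ', '_'))"
endlocal
```

The printed string should show the spaces replaced with _, the regex metacharacters backslash-escaped, and the apostrophe left intact, without any quoting error.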
General batch-file character-encoding caveat (since the title mentions Unicode):
To also support non-ASCII characters, be sure that your batch file's character encoding matches the active console code page, as reported by chcp.com.
For full Unicode support, via UTF-8:
Save your batch files as UTF-8 without BOM, with CRLF newlines.[1]
Switch the console window's active code page to 65001 (UTF-8):
Note:
In Windows 10 and above, there is an option to switch to UTF-8 system-wide, persistently, but doing so has far-reaching consequences - see this answer.
Switching to UTF-8 ad hoc affects not just a given batch file, but other processes that may run later in the same console window too. If that is undesired, saving the original code page and restoring it later is necessary.
In cmd.exe shell sessions, either run chcp 65001 before invoking your batch file or place it right after @echo off in your batch file (as noted, this affects processes running later in the same console window too).
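Saving the original code page and restoring it later, as mentioned in the note above, could look roughly like the following sketch. The variable name original_cp is made up for illustration, and the parsing assumes that chcp prints the code-page number after a colon (e.g. "Active code page: 437", possibly with a trailing period); the wording differs by display language, so treat the parsing as an assumption to verify on your system:

```bat
@echo off
setlocal
:: Extract the number from chcp's output: take the text after the colon
:: (also splitting on "." to drop a trailing period), then let the inner
:: for /f strip the leading space by taking the first token.
for /f "tokens=2 delims=:." %%c in ('chcp') do for /f %%d in ("%%c") do set "original_cp=%%d"
:: Switch to UTF-8 for the duration of this batch file.
chcp 65001 >NUL
:: ... commands that rely on UTF-8 go here ...
:: Restore the original code page, if it was determined successfully.
if defined original_cp chcp %original_cp% >NUL
endlocal
```

If the batch file exits early (for instance via goto :eof or exit /b before the last lines), the restore step is skipped, so any early-exit path would need to run it as well.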
If you want to make all future cmd.exe sessions default to UTF-8 (without making the aforementioned system-wide switch), you can configure cmd.exe to run chcp 65001 every time a cmd.exe process is created:
reg.exe add "HKCU\Software\Microsoft\Command Processor" /v AutoRun /d "chcp 65001 >NUL"
Note the use of >NUL to silence chcp.com's output; remove it if you prefer a visual reminder that the command ran on startup.
The configuration is specific to the current user. If you have administrative rights, you may alternatively target the HKLM hive instead of HKCU, from an elevated session, to configure the behavior for all users.
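Because the reg.exe add command above replaces any AutoRun value that already exists, it may be worth inspecting the value first, and knowing how to undo the change. A minimal sketch for the current-user hive, using the same key and value name as above:

```bat
:: Show the current AutoRun value, if any (reports an error if it doesn't exist).
reg.exe query "HKCU\Software\Microsoft\Command Processor" /v AutoRun

:: Remove the AutoRun value again to undo the configuration (/f skips the prompt).
reg.exe delete "HKCU\Software\Microsoft\Command Processor" /v AutoRun /f
```

If the query shows an existing command, you would presumably want to combine it with chcp 65001 >NUL rather than overwrite it.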
In PowerShell sessions, use of chcp 65001 is not an option, because .NET caches encodings and isn't notified of the change.
Use the following magic incantation instead (this implicitly sets the code page to 65001, while also making .NET aware of the change; see this answer for details):
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
If you want to make all future PowerShell sessions default to UTF-8 (without making the aforementioned system-wide switch), you can add the above to your $PROFILE file.
File $PROFILE is specific to the current user and host program (typically, a console window). Alternatively, you can save the command in file $PROFILE.CurrentUserAllHosts to target all hosts for the current user, and - assuming you have administrative privileges - in the analogous $PROFILE.AllUsersCurrentHost and $PROFILE.AllUsersAllHosts files that target all users, from an elevated session.
[1] UTF-8 with BOM is fine, as long as code page 65001 is already in effect before calling a batch file encoded this way (as opposed to trying to switch to 65001 from inside a batch file). Either way, however, when 65001 is in effect, UTF-8-encoded batch files that contain non-ASCII-range characters are only read correctly with CRLF newlines, which appears to be a bug, given that batch files with LF-only newlines work fine otherwise.