powershell batch-file

How to convert specific symbols to Unicode from a batch file / PowerShell


I have the following batch file:

@echo off
setlocal enabledelayedexpansion
for /d %%a in ("%cd%") do set "directory_name=%%~nxa"
powershell -Command "$content = [IO.File]::ReadAllText('file.txt'); $content = $content -replace '\s\u003Cmod((?:.|\n)).*%directory_name: =_%_v((?:.|\n))*?\u003C\/mod\u003E', ''; [IO.File]::WriteAllText('file.txt', $content, [System.Text.Encoding]::UTF8)" 
endlocal
pause

It worked fine and did everything I wanted, until I got a folder named blabla_-_blabla's_bla. The apostrophe breaks the PowerShell syntax, so the command cannot complete. If I replace the apostrophe with the Unicode escape \u0027, it works fine.

Is there a way, from the batch file, to convert any special characters (everything except Latin letters, "_" and "-") to Unicode escapes before the value is passed to the PowerShell code via %directory_name: =_%?


Solution

  • Preface:


    To avoid any quoting and escaping headaches, make PowerShell reference variables set in batch files as environment variables (all variables set in batch files are invariably also environment variables, and therefore seen by child processes).
    That is, rather than using cmd.exe's up-front string interpolation via references such as %directory_name% embedded in the command passed to -Command, make the command reference the value of the batch-file variable directory_name as follows, using PowerShell's syntax for accessing environment variables:

    $env:directory_name
    

    In your case, this also makes it easier to apply [regex]::Escape() to the variable value, which is needed to ensure that the value is treated literally inside the regex that you're passing to -replace (note that $env:directory_name -replace ' ', '_' is the deferred equivalent of replacing spaces with _ via up-front string interpolation by cmd.exe, %directory_name: =_%):

    [regex]::Escape(($env:directory_name -replace ' ', '_'))
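
    As a rough illustration of why the escaping matters, the following sketch can be run directly in a PowerShell session. The folder name my.mod's dir is a made-up example (not from the question), and it is assigned in PowerShell here only to stand in for the value the batch file would set. The apostrophe needs no special handling, because the value is never pasted into the command text, while the . is escaped so that it matches only a literal dot:

    ```powershell
    # Hypothetical value for illustration; in the batch file it would come from: set "directory_name=..."
    $env:directory_name = "my.mod's dir"

    # Replace spaces with underscores, then escape regex metacharacters.
    $name    = $env:directory_name -replace ' ', '_'
    $escaped = [regex]::Escape($name)
    $escaped                                                      # my\.mod's_dir

    # Unescaped, the "." matches any character; escaped, it matches only a literal dot.
    $looseMatch  = "myXmod's_dir" -match ('^' + $name + '$')      # True
    $strictMatch = "myXmod's_dir" -match ('^' + $escaped + '$')   # False
    ```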
    

    Since you're using '...' (single-quoting) for the regex passed to the $content -replace ... operation, you must splice the result of the above expression into that string via string concatenation (+), and enclose the operation in (...) to clarify operator precedence:

    $content -replace ('\s\u003Cmod((?:.|\n)).*' + [regex]::Escape(($env:directory_name -replace ' ', '_')) + '_v((?:.|\n))*?\u003C\/mod\u003E'), ''
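
    One way to sanity-check the combined expression is to run it against a small sample string in a PowerShell session. In this sketch, both the folder name and the $content value are made-up test data, not taken from the original file.txt:

    ```powershell
    # Hypothetical test data; the real values come from the batch file and file.txt.
    $env:directory_name = "my.mod's dir"
    $content = "a <mod name=my.mod's_dir_v1>x</mod> b"

    # Build the pattern by concatenating the literal parts with the escaped name.
    $pattern = '\s\u003Cmod((?:.|\n)).*' +
               [regex]::Escape(($env:directory_name -replace ' ', '_')) +
               '_v((?:.|\n))*?\u003C\/mod\u003E'

    # Remove the matching <mod ...>...</mod> element, including the whitespace before it.
    $result = $content -replace $pattern, ''
    $result    # a b
    ```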
    

    To put it all together, using a simplified version of your batch file:

    @echo off
    setlocal
    
    :: Get the name of the current directory.
    for %%a in (.) do set "directory_name=%%~nxa"
    
    :: Invoke PowerShell and make it obtain the value of %directory_name% 
    :: via $env:directory_name
    powershell -Command "$content = [IO.File]::ReadAllText('file.txt'); $content = $content -replace ('\s\u003Cmod((?:.|\n)).*' + [regex]::Escape(($env:directory_name -replace ' ', '_')) + '_v((?:.|\n))*?\u003C\/mod\u003E'), ''; [IO.File]::WriteAllText('file.txt', $content, [System.Text.Encoding]::UTF8)" 
    
    pause
    

    General batch-file character-encoding caveat (since the title mentions Unicode):