
For loop PowerShell command within a batch file removes apostrophe


I have a problem with a PowerShell command within a batch file for loop.

The script retrieves forenames and surnames from either side of a period in the local part of email addresses.

For example joe and bloggs from joe.bloggs@etc.com.

The script will retrieve those email addresses from within the content of a text file.

It works perfectly until a variable contains an apostrophe (usually the surname, i.e. the family name).

for /f "usebackq delims=" %%m in (`powershell -command "[CultureInfo]::CurrentCulture.TextInfo.ToTitleCase('!family1!')"`) do set "family=%%m"

After hours of web searching, I realise the backticks are escaping the apostrophe inside !family1!, preventing the resulting variable, family, from being set (it should have been capitalised by the PowerShell command).

I have tried running the command without backticks, using double quotes, and various other things. I have tried escaping the apostrophe in the variable with backslashes ('\!family1!\') and with caret(s), but no luck.

I have a feeling this is straightforward and I just can't see it, or can't find the right post.

I have echoed the variable just before this line of code and know that it contains the apostrophe.

I have also set a variable containing an apostrophe in a CMD window and echoed it back fine, so I know that it is definitely being lost at the PowerShell line.

The problem, (it seems), is the backticks escaping the apostrophe and breaking the variable so the first letter of string is not capitalised and set as a new variable.

This is how it works with no apostrophe:

The variable !family1! contains e.g. smith; the PowerShell command turns it into Smith (capitalised first letter), which is then set as a new variable for later use as the surname field in a CSV file.

When there is an apostrophe, e.g. O'Donald, the result is an empty variable: the first letter is not capitalised, no new variable is set, and the CSV surname field is blank.

This script worked perfectly for weeks until I got a surname with an apostrophe.

So I just need that line of code adjusted slightly so that the backticks don't break a variable with an apostrophe in it.

Here is the full script for context:

@echo off
setlocal enabledelayedexpansion
set csvFile=Users.csv
echo givenName,familyName,email>%csvFile%
set INPUT_FILE=email.txt
set "REGEXP=[\.A-Z0-9\-_][\.A-Z0-9\-_]*@[\.A-Z\-_][\.A-Z\-_]*"
for /f "tokens=*" %%a in (%INPUT_FILE%) do for %%b in (%%a) do (
    for /f %%z in ('echo %%b ^| findstr /R /I "%REGEXP%"') do (
        set emails=%%z
        for /f "tokens=1 delims=." %%a in ("!emails!") do (set given1=%%a)
        for /f "tokens=2 delims=.;@" %%i in ("!emails!") do (set family1=%%i)
        for /f "usebackq delims=" %%l in (`powershell -command "[CultureInfo]::CurrentCulture.TextInfo.ToTitleCase('!given1!')"`) do set "given=%%l"
        for /f "usebackq delims=" %%m in (`powershell -command "[CultureInfo]::CurrentCulture.TextInfo.ToTitleCase('!family1!')"`) do set "family=%%m"
        for /f "usebackq delims=" %%n in (`powershell -command "$input='!family!'; $output=$input -replace '\d',''; $output"`) do set "outputString=%%n"
        echo !given!,!outputString!,!emails!>>%csvFile%
    )
)

Thanks in advance for any ideas.


Solution

  • Your own workaround - using embedded \"...\" quoting (i.e. escaped "..." quoting) in lieu of '...' - is effective, but at least hypothetically can have side effects: "..." strings in PowerShell are expandable strings, i.e. their content is subject to (potentially unwanted) string interpolation.

    See below for solutions that avoid this problem.


    I realise the backticks are escaping the apostrophe inside !family1!

    No, no escaping is happening; in fact, it is the absence of escaping that is your problem:

    !family1! is expanded up front by cmd.exe, so a verbatim value such as o'malley ends up as ...('o'malley')... in your PowerShell command line. That causes a syntax error, because to embed ' in a '...' string in PowerShell (a verbatim string), it must be doubled: ...('o''malley')...

    You can perform this escaping using a cmd.exe variable substitution (see help set):
    !family1:'=''!

    Therefore:

    for /f "usebackq delims=" %%m in (`powershell -command "[CultureInfo]::CurrentCulture.TextInfo.ToTitleCase('!family1:'=''!')"`) do set "family=%%m"
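
    To see what the substitution does before PowerShell is involved at all, the following minimal sketch echoes both forms of the argument. The sample value o'donald is only an assumption for illustration; in the real script the value comes from the email address.

```batch
@echo off
setlocal enabledelayedexpansion

rem Hypothetical sample value; the real script extracts it from an address.
set "family1=o'donald"

rem What cmd.exe pastes into the PowerShell command line without escaping:
echo Unescaped: ToTitleCase('!family1!')

rem The same text with every ' doubled by the variable substitution:
echo Escaped: ToTitleCase('!family1:'=''!')
```

    The first line should show the unbalanced quotes that PowerShell rejects, the second the doubled '' that it reads as a single literal apostrophe.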
    

    However, a simpler alternative is to let the PowerShell code reference the cmd.exe-side variable as an environment variable (all cmd.exe variables also become environment variables, unlike in PowerShell), namely as $env:family1 in this case:

    Therefore:

    for /f "usebackq delims=" %%m in (`powershell -command "[CultureInfo]::CurrentCulture.TextInfo.ToTitleCase($env:family1)"`) do set "family=%%m"
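
    The digit-removal line in your script embeds '!family!' in the same way, so it is exposed to the same problem. As a sketch only (not tested against your input file), the three PowerShell calls could be reduced to two that read everything through $env:, with the -replace applied directly to the title-cased result. The sample values and the -noprofile switch are additions for illustration, not part of your script.

```batch
@echo off
setlocal enabledelayedexpansion

rem Hypothetical sample values; the real script sets these inside its loop.
set "given1=joe"
set "family1=o'donald2"

rem PowerShell reads the values from the inherited environment, so cmd.exe
rem never pastes them into the command line and no quote escaping is needed.
for /f "usebackq delims=" %%l in (`powershell -noprofile -command "[CultureInfo]::CurrentCulture.TextInfo.ToTitleCase($env:given1)"`) do set "given=%%l"

rem Title-case the family name, then strip digits in the same call.
for /f "usebackq delims=" %%m in (`powershell -noprofile -command "[CultureInfo]::CurrentCulture.TextInfo.ToTitleCase($env:family1) -replace '\d'"`) do set "family=%%m"

echo !given!,!family!
```

    Because the value never appears in the command text, this also avoids the string-interpolation side effects mentioned for the \"...\" workaround.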