regex powershell replace ansi-escape

PowerShell: Replace all occurrences of different substrings starting with the same Unicode char (Regex?)


I have a string:

[33m[TEST][90m [93ma wonderful testorius line[90m ([37mbite me[90m) which ends here.

You can't see it here (Stack Overflow strips it when I post), but there is a special Unicode char before every [xxm, where xx is a variable number and the [ and the m are fixed. You can find the special char here: https://gist.githubusercontent.com/mlocati/fdabcaeb8071d5c75a2d51712db24011/raw/b710612d6320df7e146508094e84b92b34c77d48/win10colors.cmd

So, it is like this (the special char is displayed here with a $):

$[33m[TEST]$[90m $[93ma wonderful testorius line$[90m ($[37mbite me$[90m) which ends here.

Now, I want to remove all $[xxm substrings in this line as it is only for colored monitor output but should not be saved to a log file.

So the expected outcome should be:

[TEST] a wonderful testorius line (bite me) which ends here.

I tried to use RegEx but I don't understand it (perhaps it is extra confusing due to the special char and the opening bracket), and I can't use wildcards in a normal .Replace("this","with_that") operation.

How can I accomplish this?


Solution

  • In this simple case, the following -replace operation will do, but note that this is not sufficient to robustly remove all variations of ANSI / Virtual Terminal escape sequences:

    # Sample input.
    # Note: `e is used as a placeholder for ESC and replaced with actual ESC chars. 
    #       ([char] 0x1b)
    #       In PowerShell (Core) 7+, "..." strings directly understand `e as ESC.
    $formattedStr = 
     '`e[33m[TEST]`e[90m `e[93ma...`e[90m (`e[37m...`e[90m) ends here.' `
       -replace '`e', [char] 0x1b
    
    # Remove simple color ANSI escape sequences.
    # -> '[TEST] a... (...) ends here.'
    # \e is a *regex* escape sequence that expands to an ESC char.
    $formattedStr -replace '\e\[\d*m'
    

    Generally speaking, it's best to check whether the program producing these display-formatted strings offers an option to output plain text instead, so that the need to strip escape sequences after the fact doesn't arise.
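
To illustrate the caveat that `\e\[\d*m` is not fully robust: color/style sequences can carry several semicolon-separated parameters (for example `ESC[1;31m` for bold red, or `ESC[38;5;208m` for a 256-color foreground), and `\d*` alone doesn't match those. The following is only a sketch of a slightly broader pattern that also accepts semicolons. The sample string is made up for illustration, and the pattern still covers only sequences ending in `m`, not cursor movement or other VT sequences.

```powershell
# ESC character, built in a way that also works in Windows PowerShell 5.1.
$esc = [char] 0x1b

# Made-up sample using multi-parameter sequences (bold red, 256-color, reset).
$sample = "${esc}[1;31mERROR${esc}[0m: ${esc}[38;5;208mdisk almost full${esc}[m"

# The simple pattern removes only the single-parameter and empty sequences,
# so the multi-parameter ones are left in the result.
$simple = $sample -replace '\e\[\d*m'

# Broader pattern: any run of digits and semicolons between 'ESC[' and 'm'.
$broader = $sample -replace '\e\[[\d;]*m'

$broader   # -> 'ERROR: disk almost full'
```

If the input might contain other kinds of escape sequences, a dedicated facility such as the one in the next section is likely the safer choice.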


    Robust PowerShell (Core) 7.2+ solution:

    The new System.Management.Automation.Internal.StringDecorated class can store strings with ANSI/VT sequences and convert them to plain-text strings on demand by calling .ToString('PlainText') on them:

    # PowerShell 7.2+ only
    
    $formattedStr = "`e[33m[TEST]`e[90m `e[93ma wonderful testorius line`e[90m (`e[37mbite me`e[90m) which ends here."
    
    # -> '[TEST] a wonderful testorius line (bite me) which ends here.'
    ([System.Management.Automation.Internal.StringDecorated] $formattedStr).
      ToString('PlainText')
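
If the same script has to run in both Windows PowerShell and PowerShell 7.2+, the two approaches above could be combined in a small helper. This is a minimal sketch under assumptions: the function name `Remove-AnsiEscape` is hypothetical (it is not a built-in cmdlet), and the fallback branch reuses the simple color-only regex from the first solution, with the same limitations.

```powershell
# Hypothetical helper; the name Remove-AnsiEscape is an assumption for illustration.
function Remove-AnsiEscape {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string] $Text
    )
    process {
        # Look the type up by name, so that the code also runs in versions
        # where the class doesn't exist (the lookup then yields $null).
        $decoratedType = 'System.Management.Automation.Internal.StringDecorated' -as [type]
        if ($decoratedType) {
            # PowerShell 7.2+: let the built-in class do the stripping.
            $decoratedType::new($Text).ToString('PlainText')
        }
        else {
            # Older versions: fall back to the simple color-only regex.
            $Text -replace '\e\[\d*m'
        }
    }
}

# `e is a placeholder here, replaced with actual ESC chars, as in the first sample.
$formattedStr = '`e[33m[TEST]`e[90m done' -replace '`e', [char] 0x1b

# -> '[TEST] done'
$formattedStr | Remove-AnsiEscape
```

The function accepts pipeline input, so it can sit between the command producing the colored output and whatever writes the log file.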