regex powershell redaction

Redacting capture group values


I'm using a regex to find patterns in a capture group; now I need to replace/redact the values found.

I'm trying to replace values in a fixed-length field:
Regex to search: (\d{10})(.{20})(.+)

The string is:

01234567890Alice Stone          3978 Smith st...

I have to replace capture group 2 (the full name) with x's (or, better yet, only the first and last name within capture group 2).

Regex: (\d{10})(.{20})(.+)

Replacement value: $1xxxxxxxxxxxxxxxxxxxx$3

This works, but I thought there would be a more elegant solution (maybe something like $1 x{20} $3) or, even better, a way to redact only the values that contain letters.

Thanks!


Solution

  • To build a replacement string whose length matches a (potentially variable-length) substring of the input string, you need to compute the replacement string dynamically, via a script block (delegate).

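    In PowerShell (Core) 6.1 and later, you can pass a script block directly as the -replace operator's replacement operand; inside it, $_ is the [System.Text.RegularExpressions.Match] object for the current match. The lookbehind (?<=^\d{10}) requires 10 leading digits without consuming them, so only the following 20 characters are replaced. Because the sample string starts with 11 digits, its 11th digit counts as part of that 20-character field: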

    PS> '01234567890Alice Stone          3978 Smith st...' -replace 
          '(?<=^\d{10}).{20}', { 'x' * $_.Value.Length }
    
    0123456789xxxxxxxxxxxxxxxxxxxx  3978 Smith st...
    

    In Windows PowerShell, you have to call the [regex] type's Replace() method directly and pass the script block as the match evaluator:

    PS> [regex]::Replace('01234567890Alice Stone          3978 Smith st...',
          '(?<=^\d{10}).{20}', { param($m) 'x' * $m.Value.Length })
    
    0123456789xxxxxxxxxxxxxxxxxxxx  3978 Smith st...
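
    Because [regex]::Replace() is available in both editions, the call above can be wrapped in a small function when it is needed in more than one place. The following is only a sketch: the function name Hide-FixedField and its parameters are hypothetical, not part of the original answer or of PowerShell itself.

    ```powershell
    # Hypothetical helper: masks a fixed-width field that follows a
    # fixed-width run of leading digits. It uses [regex]::Replace() so that
    # it does not depend on -replace accepting a script block.
    function Hide-FixedField {
        param(
            [Parameter(Mandatory)] [string] $Text,
            [int]  $PrefixLength = 10,   # number of leading digits to keep
            [int]  $FieldLength  = 20,   # width of the field to mask
            [char] $MaskChar     = 'x'   # character used for masking
        )
        # Lookbehind: require the digit prefix, but leave it out of the match.
        $pattern = "(?<=^\d{$PrefixLength}).{$FieldLength}"
        # The script block acts as the match evaluator; $m is the Match object.
        [regex]::Replace($Text, $pattern, {
            param($m) ([string] $MaskChar) * $m.Value.Length
        })
    }

    Hide-FixedField -Text '01234567890Alice Stone          3978 Smith st...'
    # 0123456789xxxxxxxxxxxxxxxxxxxx  3978 Smith st...
    ```

    If the input does not start with the expected number of digits, the pattern does not match and the text comes back unchanged.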
    

    If the length of the substring to replace is known in advance, as in your case, a simpler solution is:

    
    PS> $len = 20; '01234567890Alice Stone          3978 Smith st...' -replace 
          "(?<=^\d{10}).{$len}", ('x' * $len)
    
    0123456789xxxxxxxxxxxxxxxxxxxx  3978 Smith st...
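
    For comparison, the same fixed-length replacement can stay close to the $1...$3 approach from the question by capturing only the prefix and building the run of x's with string multiplication. This is a sketch; the variable names $line and $redacted are made up for illustration.

    ```powershell
    $line = '01234567890Alice Stone          3978 Smith st...'
    $len  = 20

    # Group 1 captures the 10-digit prefix; .{$len} consumes the field.
    # Whatever follows is not part of the match, so it is left as-is.
    # '${1}' is single-quoted so PowerShell hands it to the regex engine
    # verbatim instead of trying to expand a variable.
    $redacted = $line -replace "^(\d{10}).{$len}", ('${1}' + 'x' * $len)
    $redacted
    # 0123456789xxxxxxxxxxxxxxxxxxxx  3978 Smith st...
    ```

    Writing ${1} rather than $1 keeps the group reference unambiguous if the mask character is ever changed to a digit.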
    

    Unconditionally redacting all letters is even simpler:

    PS> '01234567890Alice Stone          3978 Smith st...' -replace '\p{L}', 'x'
    
    01234567890xxxxx xxxxx          3978 xxxxx xx...
    

    \p{L} matches any Unicode letter.
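
    To see what that means in practice, here is a small comparison with the ASCII-only class [a-z]. The sample name is invented for illustration and is built from code points so that the result does not depend on the script file's encoding.

    ```powershell
    # 0xE9 is 'é' and 0xFC is 'ü', so $name is 'José Müller'.
    $name = 'Jos' + [char] 0xE9 + ' M' + [char] 0xFC + 'ller'

    $allLetters = $name -replace '\p{L}', 'x'   # masks every letter
    $asciiOnly  = $name -replace '[a-z]', 'x'   # skips the accented letters

    $allLetters   # xxxx xxxxxx
    $asciiOnly    # xxxé xüxxxx
    ```

    -replace is case-insensitive by default, which is why [a-z] also masks the uppercase J and M here.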


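    Redacting the letters only in the matching substring requires nesting a -replace operation inside the script block, where $_ is converted to the matched text when it is used as the left-hand operand (this form again requires PowerShell (Core)):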

    PS> '01234567890Alice Stone          3978 Smith st...' -replace 
          '(?<=^\d{10}).{20}', { $_ -replace '\p{L}', 'x' }
    
    01234567890xxxxx xxxxx          3978 Smith st...
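
    In Windows PowerShell, the same nested replacement can be sketched with [regex]::Replace(), following the pattern of the earlier [regex] example. The variable names $line and $redacted are illustrative only.

    ```powershell
    $line = '01234567890Alice Stone          3978 Smith st...'

    # The outer call selects the 20-character field after the 10-digit prefix;
    # the inner -replace masks only the letters inside that field.
    $redacted = [regex]::Replace($line, '(?<=^\d{10}).{20}', {
        param($m) $m.Value -replace '\p{L}', 'x'
    })
    $redacted
    # 01234567890xxxxx xxxxx          3978 Smith st...
    ```

    Digits and spaces inside the field are left alone, so the line keeps its original length and the fixed-width layout is preserved.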