I'm using a regex with capture groups to find patterns; now I need to replace/redact the values found.
I'm trying to replace the values in a fixed-length field:
REGEX to search: (\d{10})(.{20}) (.+)
The string is:
01234567890Alice Stone 3978 Smith st...
I have to replace capture group 2 (the full name) with x's, or better yet, redact just the first and last name inside capture group 2.
Regex: (\d{10})(.{20})(.+)
replace value $1xxxxxxxxxxxxxxxxxxxx$3
This works, but I thought there would be a more glamorous solution (maybe something like $1 x{20} $3), or even better, some way to redact only the values that contain letters.
Thanks!
To build a replacement string whose length matches a (potentially variable-length) substring of the input string, you need to compute the replacement string dynamically, via a script block (delegate).
In PowerShell Core (v6.1+) you can pass a script block directly as the -replace
operator's replacement operand:
PS> '01234567890Alice Stone 3978 Smith st...' -replace
'(?<=^\d{10}).{20}', { 'x' * $_.Value.Length }
0123456789xxxxxxxxxxxxxxxxxxxx 3978 Smith st...
(?<=^\d{10})
is a positive look-behind assertion that matches the first 10 digits without making them part of the match, and .{20}
matches the next 20 characters, which are therefore the only text that gets replaced.
The script block is called for each match with $_
containing the match at hand as a [System.Text.RegularExpressions.Match]
instance; .Value
contains the matched text.
Thus, 'x' * $_.Value.Length
returns a string of x
characters of the same length as the match.
In Windows PowerShell you have to use the [regex]
type directly:
PS> [regex]::Replace('01234567890Alice Stone 3978 Smith st...',
'(?<=^\d{10}).{20}', { param($m) 'x' * $m.Value.Length })
0123456789xxxxxxxxxxxxxxxxxxxx 3978 Smith st...
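The dynamic approach pays off when the field is not fixed-width. As a minimal sketch, assuming a hypothetical record layout (not the one from the question) in which the name simply runs up to the first digit of the address, the replacement length has to be computed per match. The sample value and the $record and $redacted variable names are made up for illustration:

```powershell
# Hypothetical record: 10-digit ID, then a name of unknown length, then the address.
$record = '0123456789Alice Stone 3978 Smith st'

# \D+ matches a run of non-digits of any length (here 'Alice Stone ' including
# the trailing space), so the number of x characters is derived from the match.
# [regex]::Replace accepts the script block as a MatchEvaluator delegate.
$redacted = [regex]::Replace(
  $record,
  '(?<=^\d{10})\D+',
  { param($m) 'x' * $m.Value.Length }
)

$redacted
```

Because [regex]::Replace is used rather than the script-block form of -replace, this sketch should behave the same in Windows PowerShell and PowerShell Core.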
If the length of the substring to replace is known in advance - as in your case - you could more simply do:
PS> $len = 20; '01234567890Alice Stone 3978 Smith st...' -replace
"(?<=^\d{10}).{$len}", ('x' * $len)
0123456789xxxxxxxxxxxxxxxxxxxx 3978 Smith st...
Unconditionally redacting all letters is even simpler:
PS> '01234567890Alice Stone 3978 Smith st...' -replace '\p{L}', 'x'
01234567890xxxxx xxxxx 3978 xxxxx xx...
\p{L}
matches any Unicode letter.
Redacting the letters only in the matching substring requires nesting a -replace
operation inside the script block (again using the PowerShell Core syntax):
PS> '01234567890Alice Stone 3978 Smith st...' -replace
'(?<=^\d{10}).{20}', { $_ -replace '\p{L}', 'x' }
01234567890xxxxx xxxxx 3978 Smith st...
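For Windows PowerShell, where -replace does not accept a script block, the same nested redaction can be sketched with [regex]::Replace. The sample record below is an assumption: it is built with explicit padding so that the 20-character name field is unambiguous, and the variable names are illustrative only:

```powershell
# Hypothetical fixed-width record: 10-digit ID, 20-character name field
# (padded with spaces), then the address.
$record = '0123456789' + 'Alice Stone'.PadRight(20) + '3978 Smith st'

# The outer call selects the 20-character field; the nested -replace redacts
# only the letters inside it, leaving spaces and the rest of the record intact.
$redacted = [regex]::Replace(
  $record,
  '(?<=^\d{10}).{20}',
  { param($m) $m.Value -replace '\p{L}', 'x' }
)

$redacted
```

Only the letters in the name field are replaced, so the field width and the address are left unchanged.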