I need to use Python to redact a variable-length key from a URL string. The key is the value of a parameter named key.
All but the last four characters of the key are to be redacted. The last four characters are intentionally left unredacted for identification purposes. The character set of the key is ASCII alphanumeric. The URL must otherwise remain unaffected. The character used for redaction (█) is unicodedata.lookup("FULL BLOCK").
Example input: https://example.com/data?bar=irish&key=dc3e966e4c57effb0cc7137dec7d39ac
Example output: https://example.com/data?bar=irish&key=████████████████████████████39ac
I am using Python 3.8. There is a different question about redacting a password at a different location in the URL, but it doesn't help me.
I tried a simple regex substitution, but it worked only with a fixed-length key, whereas I have a variable-length key.
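A flexible way to do this is a regular expression substitution with a replacement function. The regex uses zero-width positive lookbehind and lookahead assertions, so only the characters to be redacted are matched: the lookbehind anchors the match immediately after key=, and the lookahead leaves the last four characters of the key unmatched.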
import re
import unicodedata

_REGEX = re.compile(r"(?<=\Wkey=)(?P<redacted>\w+)(?=\w{4})")
_REPL_CHAR = unicodedata.lookup("FULL BLOCK")


def redact_key(url: str) -> str:
    # Ref: https://stackoverflow.com/a/59971629/
    return _REGEX.sub(lambda match: len(match.groupdict()["redacted"]) * _REPL_CHAR, url)
Test:
>>> redact_key('https://example.com/data?bar=irish&key=dc3e966e4c57effb0cc7137dec7d39ac')
'https://example.com/data?bar=irish&key=████████████████████████████39ac'
>>> redact_key('https://example.com/data?key=dc3e966e4c57effb0cc7137dec7d39ac')
'https://example.com/data?key=████████████████████████████39ac'
>>> redact_key('https://example.com/data?bar=irish&key=dc3e966e4c57effb0cc7137dec7d39ac&baz=qux')
'https://example.com/data?bar=irish&key=████████████████████████████39ac&baz=qux'
>>> redact_key('https://example.com/data?bar=irish&baz=qux')
'https://example.com/data?bar=irish&baz=qux'
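Note that \w also matches underscores and non-ASCII word characters. Since the question states the key is ASCII alphanumeric, a stricter variant may be preferable. The following is only a sketch under that assumption; the names redact_key_ascii, _KEY_CHAR and _ASCII_REGEX are made up for illustration, and it also assumes the parameter is always introduced by ? or &.

```python
import re
import unicodedata

# Assumption: the key consists only of ASCII letters and digits.
_KEY_CHAR = "[A-Za-z0-9]"

# Lookbehind: the match must start right after "?key=" or "&key=".
# Lookahead: four more key characters must follow, so they stay unredacted.
_ASCII_REGEX = re.compile(rf"(?<=[?&]key=){_KEY_CHAR}+(?={_KEY_CHAR}{{4}})")
_REPL_CHAR = unicodedata.lookup("FULL BLOCK")


def redact_key_ascii(url: str) -> str:
    # Replace each matched character with one redaction character,
    # so the length of the URL is preserved.
    return _ASCII_REGEX.sub(lambda match: len(match.group()) * _REPL_CHAR, url)
```

With this variant, a key of four or fewer characters is left unchanged, because the lookahead cannot be satisfied; whether that is acceptable depends on how short real keys can be.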