python dictionary replace

How can I replace multiple items in a string using a dictionary when the matched items require anchors?


Given this dictionary and input strings:

d = { 'one': '1',
      'two': '2',
      'three': '3'
}

s1 = 'String containing <#one|> or <#two|> numbers. <#one|>, <#two|>, <#three|>'
s2 = 'Only replace items which are anchored.  <#one|> is replaced, but not this one.'

How can I replace each occurrence of an anchored item <# |> using the dictionary d? The above strings should produce the output:

String containing 1 or 2 numbers. 1, 2, 3
Only replace items which are anchored.  1 is replaced, but not this one.

The single-pass multiple-replacement approach described here comes close to solving this, but it doesn't handle the anchors.


Solution

  • A suitable regular expression pattern is <#(\w+)\|?>. The capture group (\w+) picks up the key between the anchors, and the \|? part makes the trailing | optional, because the ? quantifier matches zero or one occurrence of the preceding element.

    I used re.sub(pattern, repl, string) with a function as the repl argument. re.sub calls that function once for each match, passing it the match object, and uses its return value as the replacement text.

    import re
    
    d = {
        'one': '1',
        'two': '2',
        'three': '3'
    }
    
    def replace_items(text):
        pattern = r"<#(\w+)\|?>"
        
        def replacement(match):  # This function will be called for each match
            key = match.group(1)
            return d.get(key, match.group(0))  # Return value if key exists, otherwise return original match
    
        return re.sub(pattern, replacement, text)
    
    
    s1 = 'String containing <#one|> or <#two|> numbers. <#one|>, <#two|>, <#three>'
    s2 = 'Only replace items which are anchored.  <#one|> is replaced, but not this one.'
    s3 = 'Number <#four> is not in the dictionary'
    
    print(replace_items(s1))
    print(replace_items(s2))
    print(replace_items(s3))
    

    Output

    String containing 1 or 2 numbers. 1, 2, 3
    Only replace items which are anchored.  1 is replaced, but not this one.
    Number <#four> is not in the dictionary
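
If the trailing | should be required, as in the anchors shown in the question, one possible variant is to build the pattern from the dictionary keys so that only known keys are matched at all. This is only a sketch under a few assumptions: the helper names build_anchored_pattern and replace_known_items are made up for illustration, the dictionary is assumed to be non-empty, and anything that is not an exact <#key|> is left untouched.

```python
import re

d = {'one': '1', 'two': '2', 'three': '3'}

def build_anchored_pattern(mapping):
    # Hypothetical helper: matches <#key|> only for keys present in mapping.
    # re.escape guards against keys containing regex metacharacters.
    # Longer keys are listed first so they are tried before shorter ones.
    keys = sorted(mapping, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    return re.compile(r"<#(" + alternation + r")\|>")

def replace_known_items(text, mapping):
    # Hypothetical helper: every match is guaranteed to be a key of mapping,
    # so a plain lookup is enough and no fallback is needed.
    pattern = build_anchored_pattern(mapping)
    return pattern.sub(lambda m: mapping[m.group(1)], text)

s1 = 'String containing <#one|> or <#two|> numbers. <#one|>, <#two|>, <#three|>'
print(replace_known_items(s1, d))
# String containing 1 or 2 numbers. 1, 2, 3
```

The trade-off compared with <#(\w+)\|?> is that the pattern has to be rebuilt whenever the dictionary changes, but unknown items such as <#four|> never reach the replacement function.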