
Replace Multiple Patterns in a String (Markdown Text)


I've read a number of answers on this topic, but I believe my case is a bit different because it involves searching for and replacing multiple patterns.

Example:

names = {'1234': 'John Doe',
         '2345': 'Jane Smith',
         '3456': 'Marry Jones'
        }
        
message = '''![:Person](1234) ![:Person](2345) \nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in ![:Person](3456) voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'''

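import re

def markdown(msg):
    # Match the digits that directly follow "](" (the lookbehind does not consume "](")
    markdown_id = re.compile(r'(?<=\]\()\d+')
    result = re.sub(markdown_id, replace_name, msg)
    return result

def replace_name(matchobj):
    # Look up the matched ID; leave it unchanged if it is not in names
    if matchobj.group(0) in names:
        return names[matchobj.group(0)]
    return matchobj.group(0)
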

>>> markdown(message)
'![:Person](John Doe) ![:Person](Jane Smith) \nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in ![:Person](Marry Jones) voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'

However, I want each whole tag replaced by the name, not just the ID inside it. That is, I would like to replace

'![:Person](1234) ![:Person](2345) \nLorem...'

with

'John Doe Jane Smith \nLorem...'
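For reference, here is a quick check of what my current pattern actually matches. This is only a minimal sketch using the shortened string from above; the placeholder `'X'` and the variable names are made up for illustration. Because the lookbehind is zero-width, the match covers only the digits, which is why the surrounding `![:Person](...)` survives the substitution.

```python
import re

# Shortened sample taken from the example above.
sample = '![:Person](1234) ![:Person](2345) \nLorem...'

# Lookbehind pattern from my attempt: only the digits are part of the match.
digits_only = re.compile(r'(?<=\]\()\d+')
matched = digits_only.findall(sample)
print(matched)

# Substituting those matches leaves the surrounding tag text untouched.
replaced = digits_only.sub('X', sample)
print(replaced)
```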

Solution

  • The solution is to use nested capture groups, replacing

    markdown_id = re.compile(r'(?<=\]\()\d+')
    

    with

    markdown_id = re.compile(r'(!\[:\w+\]\((\d+)\))')
    

    where the first (outer) group matches the whole tag, for example

    ![:Person](1234)
    

    and the second (nested) group captures just the digits of that person's ID.

    Next, I changed the replace_name function to:

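    def replace_name(matchobj):
        # Debug output: group(1) is the whole tag, group(2) is the ID
        print(matchobj.group(1), matchobj.group(2))
        if matchobj.group(2) in names:
            return names[matchobj.group(2)]
        return matchobj.group(1)  # unknown ID: leave the tag as-is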
    

    The result is what I wanted:

    >>> markdown(message)
    ![:Person](1234) 1234
    ![:Person](2345) 2345
    ![:Person](3456) 3456
    'John Doe Jane Smith \nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in Marry Jones voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'
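
The approach above can also be sketched as one self-contained function. This is a variation under a few assumptions, not the only way to do it: since `group(0)` of a match is always the entire matched text, the outer capture group is optional, so the sketch keeps a single group for the ID. The names `TAG` and `replace_tags`, the shortened sample string, and the unknown ID `9999` are hypothetical and only there for illustration. Tags whose ID is not in the lookup are left unchanged rather than removed.

```python
import re

# Sample data mirroring the names dict from the question.
names = {'1234': 'John Doe', '2345': 'Jane Smith', '3456': 'Marry Jones'}

# One capture group for the ID; the whole tag is available as group(0).
TAG = re.compile(r'!\[:\w+\]\((\d+)\)')

def replace_tags(msg, lookup):
    """Replace each ![:Label](id) tag with lookup[id]; keep unknown tags."""
    def _sub(m):
        # Fall back to the original tag text when the ID is not known.
        return lookup.get(m.group(1), m.group(0))
    return TAG.sub(_sub, msg)

# Shortened, made-up message; 9999 is deliberately missing from names.
sample = '![:Person](1234) ![:Person](2345) \nLorem ![:Person](3456) ipsum ![:Person](9999)'
result = replace_tags(sample, names)
print(result)
```

Passing the lookup dict as a parameter instead of reading a global keeps the function easy to reuse with other ID tables.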