python regex string

Removing a specific pattern from a string using regex in Python


I am trying to remove the pattern using the following code:

    import re

    x = "mr<u+092d><u+093e><u+0935><u+0941><u+0915>"
    pattern = '[<u+0-9de>]'
    re.sub(pattern, '', x)

Output

mr

This output is actually correct for the given sample string, but when I run this code on the corpus, it removes every occurrence of 'd' and 'e', as well as digits and so on. I want these characters to be removed only when they appear inside < >.
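
A minimal reproduction of the problem, using a made-up sentence rather than the real corpus (the sample text is an assumption for illustration). Because the square brackets define a character class, each listed character is removed wherever it occurs, so ordinary words lose their 'd', 'e', 'u' and digits too:

```python
import re

# Hypothetical sample text; the real corpus is not shown here.
text = "added 25 units<u+092d> under budget"

# Character class: matches any ONE of <, u, +, 0-9, d, e, >
pattern = '[<u+0-9de>]'

cleaned = re.sub(pattern, '', text)
print(cleaned)  # far more than the <u+092d> tag has been stripped
```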


Solution

  • You need to put the < > outside the character class, as the structure will always be <u+XXXX>, where XXXX is four hex digits:

    import re

    pattern = r'<u\+[0-9a-f]{4}>'  # raw string, so \+ is a literal plus sign
    re.sub(pattern, '', x)
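
As a quick check, the sketch below applies the anchored pattern to a made-up sentence (hypothetical, not taken from the asker's corpus). The helper name `strip_unicode_tags` and the `re.IGNORECASE` flag are assumptions added for illustration; the flag is there in case some tags use uppercase hex digits such as <U+092D>, and can be dropped if the corpus is always lowercase.

```python
import re

# Matches a literal "<u+", exactly four hex digits, then ">".
# re.IGNORECASE is an assumption: it also accepts forms like "<U+092D>".
TAG = re.compile(r'<u\+[0-9a-f]{4}>', re.IGNORECASE)


def strip_unicode_tags(text):
    """Remove <u+XXXX> tags and leave every other character untouched."""
    return TAG.sub('', text)


# Hypothetical sample text; the real corpus is not shown in the question.
sample = "added 25 units<u+092d><U+093E> under budget"
print(strip_unicode_tags(sample))  # added 25 units under budget

# The original sample string from the question still reduces to "mr".
print(strip_unicode_tags("mr<u+092d><u+093e><u+0935><u+0941><u+0915>"))  # mr
```

Compiling the pattern once is a small convenience when the same substitution is run over many lines of a corpus; calling `re.sub` directly with the pattern string gives the same result.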
    
