[SOLVED] Python regex replacement with str.replace. Pattern (word-hyphen-word) Replacement (word-space-hyphen-space-word)

I have the following regex pattern, which matches word-word:

r'\w+\-\w+'

I would like to replace it with

r'\w+\s\-\s\w+'

Example: I would like to change

hello-friends to hello - friends

I have tried the following, with no success:

df['mytextcolumn'].str.replace(r'(\\w+)(\\-)(\\w+)',r'(\\w+)(\\s)(\\-)(\\s)(\\w+)')

I also tried re.sub:

re.sub(r'\\w+\\-\\w+',r'\\w+\\s\\-\\s\\w+','hello-friends') 

But I still get back hello-friends, not hello - friends.

I also checked my regex with an online regex matcher for Python, and it picks up the pattern correctly, so I am confused about why I am unable to replace it within my script.

Solution

You cannot use a new pattern in the replacement: the replacement string is a template, not a regex, so \w and \s have no matching meaning there. Instead, use two capture groups in the initial pattern and refer back to them in the replacement as \1 - \2.

You could also capture the - in a group, but since it is a single character that you are matching literally, you can just write it directly in the replacement.

(\w+)-(\w+)
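
The same idea can be tried with plain re.sub, which the question also attempted. This is a minimal sketch using only the sample string from the question. The last line assumes the doubled backslashes shown in the question are what was actually run (they may just be a display artifact); if so, that pattern looks for literal backslashes and never matches, which would explain getting the input back unchanged.

```python
import re

text = "hello-friends"  # sample string from the question

# Two capture groups; \1 and \2 in the replacement refer back to them.
spaced = re.sub(r"(\w+)-(\w+)", r"\1 - \2", text)
print(spaced)  # hello - friends

# Variant that also captures the hyphen, as group 2.
spaced_three = re.sub(r"(\w+)(-)(\w+)", r"\1 \2 \3", text)
print(spaced_three)  # hello - friends

# In a raw string, \\w+ means a literal backslash followed by one or more "w",
# so the doubled-backslash pattern finds nothing in this text.
print(re.search(r"\\w+\\-\\w+", text))  # None
```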

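import pandas as pd
df = pd.DataFrame({'mytextcolumn': ['hello-friends']})  # sample data (assumed, not from the question)
df['mytextcolumn'] = df['mytextcolumn'].str.replace(r'(\w+)-(\w+)', r'\1 - \2', regex=True)
print(df)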

Output

      mytextcolumn
0  hello - friends
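
One edge case worth knowing about: with the two-group pattern, each match consumes the words on both sides of the hyphen, so a chain such as a-b-c only gets its first hyphen spaced. A possible alternative, sketched below, matches only the hyphen and uses lookarounds to check for word characters on either side. The helper name space_hyphens is hypothetical, and this variant is not part of the accepted answer.

```python
import re

def space_hyphens(text: str) -> str:
    """Put spaces around each hyphen that sits directly between word characters."""
    # (?<=\w) and (?=\w) check the neighbours without consuming them,
    # so adjacent hyphenated words can each be matched.
    return re.sub(r"(?<=\w)-(?=\w)", " - ", text)

print(space_hyphens("hello-friends"))  # hello - friends
print(space_hyphens("a-b-c"))          # a - b - c

# For comparison, the two-group pattern on the same chained input:
print(re.sub(r"(\w+)-(\w+)", r"\1 - \2", "a-b-c"))  # a - b-c
```

The same lookaround pattern should also work with df['mytextcolumn'].str.replace(..., regex=True), since pandas passes the pattern to Python's re module.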