I have the following regex pattern for word-word:
r'\w+\-\w+'
I would like to replace it with
r'\w+\s\-\s\w+'
Example: I would like to change
hello-friends to hello - friends
I have tried the following with no success
df['mytextcolumn'].str.replace(r'(\\w+)(\\-)(\\w+)',r'(\\w+)(\\s)(\\-)(\\s)(\\w+)')
also tried with re.sub
re.sub(r'\\w+\\-\\w+',r'\\w+\\s\\-\\s\\w+','hello-friends')
but I still get back hello-friends, not hello - friends
I also checked my regex with an online regex matcher for python, and it picks up the patterns correctly, so I am confused why I am unable to replace it within my script.
You cannot use a new pattern in the replacement: the replacement is a template string, not a regex. Instead, use 2 capture groups in the initial pattern and refer to them as \1 - \2 in the replacement.
You could also capture the - in a group, but since it is a single character that you match literally, you can just write it directly in the replacement.
(\w+)-(\w+)
See a regex demo
df['mytextcolumn'] = df['mytextcolumn'].str.replace(r'(\w+)-(\w+)',r'\1 - \2', regex=True)
print(df)
Output
mytextcolumn
0 hello - friends
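Since the question also tried re.sub, here is a minimal sketch of the same idea on a plain string. The helper name space_hyphens and the sample inputs are only illustrative, not part of the original code; the pattern and the replacement are the ones used above.

```python
import re

# Two capture groups around a literal hyphen.
# In the replacement, \1 and \2 insert whatever each group matched.
PATTERN = re.compile(r'(\w+)-(\w+)')


def space_hyphens(text):
    """Put a space on each side of a hyphen that joins two words.

    Hypothetical helper for illustration only.
    """
    return PATTERN.sub(r'\1 - \2', text)


print(space_hyphens('hello-friends'))  # hello - friends
print(space_hyphens('no hyphen here'))  # unchanged: no hyphen here
```

The replacement string is written as a raw string so that \1 and \2 reach the regex engine as backreferences rather than being interpreted as escape sequences by Python.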