I have a large corpus which contains sentences such as
text = ["$3.4 million but not section 4.1"]
that I want to clean as
text = ["$3,4 million but not section 4.1"]
using a simple line such as
text.replace("$\d.\d","$\d,\d")
or with re.sub, but I don't know how to map the string "$"+digit+"." to "$"+digit+","
Any ideas? Thanks.
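import re
def rep(m):
    # group(1) is "$" plus the leading digits, group(2) the digits after the dot
    return m.group(1) + "," + m.group(2)
# the dot is escaped so it only matches a literal "."; text is a list, so substitute per sentence
text = [re.sub(r"([$][0-9]+)\.([0-9]+)", rep, s) for s in text]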
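The callback isn't strictly needed here: re.sub also accepts a replacement string with backreferences (\1, \2) to the captured groups. A minimal sketch of that variant, assuming the corpus is a list of strings as in the question; the names PRICE and fix_decimal_sep are made up for illustration:

```python
import re

# "$" followed by digits, a literal dot, then more digits.
# The dot is escaped so it does not match arbitrary characters.
PRICE = re.compile(r"(\$\d+)\.(\d+)")

def fix_decimal_sep(sentences):
    # \1 and \2 refer back to the two captured groups.
    return [PRICE.sub(r"\1,\2", s) for s in sentences]

text = ["$3.4 million but not section 4.1"]
print(fix_decimal_sep(text))  # ['$3,4 million but not section 4.1']
```

Since the pattern requires a leading "$", numbers like "4.1" in "section 4.1" are left alone. Compiling the pattern once may help a little on a large corpus.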