I need to remove the leading "#" symbol from hashtags in a text. For example, the sentence "I'm feeling #blessed." should become "I'm feeling blessed."
I have written this function, but I'm sure I can achieve the same with simpler logic using a regex.
def remove_hashtags(sentence):
    clean_sentence = ""
    space = " "
    for token in sentence.split():
        if token[0] == '#':  # compare by value; `is` tests object identity
            token = token[1:]
        clean_sentence += token + space
    return clean_sentence
Need help here!!
The regex provided by @Tim, #(\S+), would also match hashtags in a non-starting position if they are preceded by another non-whitespace character (\S), as in so#blessed.
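To make the problem concrete, here is a small sketch that applies the plain pattern to a sample string. The input string and the variable name `unanchored` are made up for illustration.

```python
import re

inp = "I'm feeling #blessed so#blessed"

# The plain pattern strips every '#' that is followed by non-whitespace,
# including the one in the middle of "so#blessed".
unanchored = re.sub(r'#(\S+)', r'\1', inp)
print(unanchored)  # I'm feeling blessed soblessed
```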
We can prevent this by adding a negative lookbehind, (?<!\S), before the hash, so that it cannot be preceded by anything other than whitespace.
import re

inp = "#I'm #feeling #blessed so#blessed .#here#."
output = re.sub(r'(?<!\S)#(\S+)', r'\1', inp)
print(output)
output:
I'm feeling blessed so#blessed .#here#.
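If you want this as a drop-in replacement for the loop in the question, one possible way to wrap it is sketched below. The names `HASHTAG_PREFIX` and `strip_hashtags` are hypothetical; only the pattern itself comes from this answer.

```python
import re

# Hypothetical names; the pattern is the lookbehind regex shown above.
HASHTAG_PREFIX = re.compile(r'(?<!\S)#(\S+)')

def strip_hashtags(sentence):
    """Remove one leading '#' from each whitespace-delimited token."""
    return HASHTAG_PREFIX.sub(r'\1', sentence)

print(strip_hashtags("I'm feeling #blessed"))  # I'm feeling blessed
print(strip_hashtags("so#blessed # ##tag"))    # so#blessed # #tag
```

Note that a lone "#" is left alone because \S+ needs at least one character after the hash, and only the first "#" of "##tag" is removed. Unlike the split-and-join loop, this version also preserves the original spacing and does not append a trailing space.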