Working on a sentiment analysis problem, I am trying to exclude the usernames from the text of tweets. For example, given the following tweet:
`Hey @SOCommunity check this out!`
I'm trying to keep just this:
`Hey check this out!`
So far I've seen how to select the username with `@\S+\s+`, and I've tried to negate it using the expression `^(?!@\S+\s+)\w+`, which only captures the `Hey`, leaving out the rest of the tweet.
How should I edit the expression to also catch the rest of the tweet?
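You can use `sed` to remove the username from the text. A negative lookahead only asserts what must not follow at a given position; it does not delete anything, so it is simpler to match the mention and replace it with nothing. The command is `sed 's/@[a-zA-Z0-9_]\{1,15\} //'`, which matches an `@` followed by 1 to 15 letters, digits, or underscores (the characters and length Twitter allows in usernames) and one trailing space.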
Ex:
echo 'Hey @SOCommunity1 check this out!' | sed 's/@[a-zA-Z0-9_]\{1,15\} //'
Output:
Hey check this out!
To apply the `sed` command to a file named `tweets.txt`:
sed 's/@[a-zA-Z0-9_]\{1,15\} //' tweets.txt
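Note that the command above replaces only the first mention on each line and requires a space after the username, so a tweet with several mentions, or one that ends with a mention, is not fully cleaned. A minimal sketch of one way to handle both cases, assuming mentions are separated from the surrounding text by plain spaces (`strip_mentions` is a hypothetical helper name used only for illustration):

```shell
# strip_mentions: hypothetical helper, reads tweets on stdin, one per line.
strip_mentions() {
  # 1st expression: delete every @mention and any spaces right after it
  #                 (the g flag applies the substitution to all matches on the line).
  # 2nd expression: trim trailing spaces left behind when a mention ended the line.
  sed -e 's/@[a-zA-Z0-9_]\{1,15\} *//g' -e 's/ *$//'
}

printf '%s\n' \
  'Hey @SOCommunity check this out!' \
  '@alice @bob_42 hello' \
  'Thanks @alice' | strip_mentions
```

This should print `Hey check this out!`, `hello`, and `Thanks`. For a file, `strip_mentions < tweets.txt` works the same way. Punctuation attached to a mention (for example `@alice,`) is left in place, so adjust the pattern if your data needs that.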