
Find first word in string Python


I have to write a single function that should return the first word in the following strings:

("Hello world") -> return "Hello"
(" a word ") -> return "a"
("don't touch it") -> return "don't"
("greetings, friends") -> return "greetings"
("... and so on ...") -> return "and"
("hi") -> return "hi"

Each call has to return the first word. As you can see, some strings start with whitespace or punctuation, some words contain apostrophes, and some are followed by a comma.

I've used the following options:

return text.split()[0]
return re.split(r'\w*', text)[0]

Both fail on some of the strings. How can I handle all of these cases?


Solution

  • It is tricky to distinguish apostrophes that are part of a word from single quotes that serve as punctuation. But since your input examples do not contain single quotes used as quotation marks, I can go with this:

    re.match(r'\W*(\w[^,. !?"]*)', text).groups()[0]
    

    For all your examples, this works. It won't work for atypical input like "'tis all in vain!", though: the leading apostrophe is skipped as a non-word character, so the result is "tis". The pattern assumes that words end at commas, periods, spaces, exclamation marks, question marks, and double quotes. This list can be extended as needed (inside the brackets).
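
For reference, here is a minimal sketch that wraps the pattern above in the single function the question asks for. The function name `first_word`, the constant `FIRST_WORD`, and the choice to return an empty string when the text contains no word at all are assumptions for illustration; the one-liner above would instead raise an `AttributeError` in that case, because `re.match` returns `None`.

```python
import re

# Same pattern as in the answer: skip leading non-word characters, then
# capture one word character followed by anything that is not a terminator.
FIRST_WORD = re.compile(r'\W*(\w[^,. !?"]*)')


def first_word(text):
    """Return the first word in text, or '' if there is none (assumed fallback)."""
    match = FIRST_WORD.match(text)
    return match.group(1) if match else ""


if __name__ == "__main__":
    examples = [
        "Hello world",
        " a word ",
        "don't touch it",
        "greetings, friends",
        "... and so on ...",
        "hi",
    ]
    for text in examples:
        print(repr(text), "->", repr(first_word(text)))
```

By comparison, `text.split()[0]` only splits on whitespace, so it keeps attached punctuation: `"greetings, friends".split()[0]` gives `"greetings,"`, and `"... and so on ...".split()[0]` gives `"..."`.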