I'm running Python 3.3.2 and I have a string (a sentence or paragraphs):
mystring=["walk walked walking talk talking talks talked fly flying"]
And I have another list with the words I need to search for in that string:
list_of_words=["walk","talk","fly"]
And my question is, is there a way to get as result:
Bottom line: is it possible to get a count of all possible variations of a word?
from difflib import get_close_matches

mystring = "walk walked walking talk talking talks talked fly flying"
list_of_words = ["walk", "talk", "fly"]
sp = mystring.split()  # split the sentence into individual words
for x in list_of_words:
    # keep only the close matches that actually contain the search word
    li = [y for y in get_close_matches(x, sp, cutoff=0.5) if x in y]
    print('%-7s %d in %-10s' % (x, len(li), li))
result
walk    2 in ['walk', 'walked']
talk    3 in ['talk', 'talks', 'talked']
fly     2 in ['fly', 'flying']
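Note that 'walking' and 'talking' are missing from the results above. That is because get_close_matches returns at most n=3 matches by default, and for "walk" one of those three slots goes to "talk", which the `x in y` filter then throws away. A minimal sketch of one way around this, assuming you want every variation in the sentence, is to raise n to the number of words; the `counts` dict is only an illustrative name added here:

```python
from difflib import get_close_matches

mystring = "walk walked walking talk talking talks talked fly flying"
list_of_words = ["walk", "talk", "fly"]
sp = mystring.split()

counts = {}  # hypothetical helper dict: search word -> number of variations found
for x in list_of_words:
    # n=len(sp) lifts the default cap of 3 matches per search word
    li = [y for y in get_close_matches(x, sp, n=len(sp), cutoff=0.5) if x in y]
    counts[x] = len(li)
    print('%-7s %d in %s' % (x, len(li), li))
```

With the sample sentence this should also pick up 'walking' and 'talking'. The cutoff of 0.5 still applies, so a very long variation of a short word could fall below it and be missed.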
The cutoff refers to the same ratio as computed by SequenceMatcher:
from difflib import SequenceMatcher

sq = SequenceMatcher(None)
for x in list_of_words:
    for w in sp:
        # compare the search word against each word of the sentence
        sq.set_seqs(x, w)
        print('%-7s %-10s %f' % (x, w, sq.ratio()))
result
walk    walk       1.000000
walk    walked     0.800000
walk    walking    0.727273
walk    talk       0.750000
walk    talking    0.545455
walk    talks      0.666667
walk    talked     0.600000
walk    fly        0.285714
walk    flying     0.200000
talk    walk       0.750000
talk    walked     0.600000
talk    walking    0.545455
talk    talk       1.000000
talk    talking    0.727273
talk    talks      0.888889
talk    talked     0.800000
talk    fly        0.285714
talk    flying     0.200000
fly     walk       0.285714
fly     walked     0.222222
fly     walking    0.200000
fly     talk       0.285714
fly     talking    0.200000
fly     talks      0.250000
fly     talked     0.222222
fly     fly        1.000000
fly     flying     0.666667
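To see where these numbers come from: the difflib documentation defines the ratio as 2.0*M/T, where M is the number of matched characters and T is the total length of both strings. The sketch below recomputes it by hand from the matching blocks. `ratio_by_hand` is a name made up for this illustration, not part of difflib:

```python
from difflib import SequenceMatcher

def ratio_by_hand(a, b):
    """Recompute SequenceMatcher.ratio() as 2.0 * M / T (illustrative helper)."""
    sm = SequenceMatcher(None, a, b)
    # M: total size of all matching blocks (the final dummy block has size 0)
    matched = sum(block.size for block in sm.get_matching_blocks())
    total = len(a) + len(b)  # T: combined length of both strings
    # difflib reports 1.0 when both sequences are empty
    return 2.0 * matched / total if total else 1.0

# "walk" vs "walked": M = 4 ("walk"), T = 4 + 6 = 10, so 2*4/10 = 0.8
print(ratio_by_hand("walk", "walked"))
# "fly" vs "flying": M = 3 ("fly"), T = 3 + 6 = 9, so 2*3/9 = 0.666...
print(ratio_by_hand("fly", "flying"))
```

This also shows why a longer suffix lowers the score: M stays the same while T grows, so a cutoff of 0.5 effectively limits how much longer than the search word a variation can be.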