
Find variations of a word in a string in Python


I'm running Python 3.3.2, and I have a string (a sentence or one or more paragraphs):

mystring=["walk walked walking talk talking talks talked fly flying"]

I also have a list of words I need to search for in that string:

list_of_words=["walk","talk","fly"]

My question is: is there a way to get a result like this?

  1. The word walk or a variation is present 3 times
  2. The word talk or a variation is present 4 times
  3. The word fly or a variation is present 2 times

Bottom line: is it possible to get a count of all possible variations of a word?


Solution

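  • from difflib import get_close_matches

    mystring = "walk walked walking talk talking talks talked fly flying"
    list_of_words = ["walk", "talk", "fly"]

    sp = mystring.split()
    for x in list_of_words:
        # keep only the close matches that actually contain the search word;
        # note that get_close_matches returns at most n=3 matches by default
        li = [y for y in get_close_matches(x, sp, cutoff=0.5) if x in y]
        print('%-7s %d in %-10s' % (x, len(li), li))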
    

    result

    walk    2  in ['walk', 'walked']
    talk    3  in ['talk', 'talks', 'talked']
    fly     2  in ['fly', 'flying']
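
The counts above stop at 2 for walk and 3 for talk because get_close_matches returns only the best n=3 matches by default, so "walking" and "talking" never reach the filter. A minimal sketch that lifts this cap by passing n explicitly follows; the helper name count_variations is hypothetical and only used for illustration, and the `x in y` filter is kept from the snippet above.

```python
from difflib import get_close_matches

mystring = "walk walked walking talk talking talks talked fly flying"
list_of_words = ["walk", "talk", "fly"]

sp = mystring.split()


def count_variations(word, candidates, cutoff=0.5):
    """Hypothetical helper: close matches of `word` that also contain it."""
    # n=len(candidates) lets every candidate above the cutoff come back,
    # instead of only the default top 3
    matches = get_close_matches(word, candidates,
                                n=len(candidates), cutoff=cutoff)
    return [m for m in matches if word in m]


for x in list_of_words:
    li = count_variations(x, sp)
    print('%-7s %d in %s' % (x, len(li), li))
```

With the sample string this should report 3 matches for walk, 4 for talk and 2 for fly, which are the counts asked for in the question. The substring filter still matters: without it, "talk" would be counted as a variation of "walk" because its ratio (0.75) is above the cutoff.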
    

    The cutoff is compared against the same similarity ratio that SequenceMatcher.ratio() computes:

    from difflib import SequenceMatcher

    # reuses list_of_words and sp from the first snippet
    sq = SequenceMatcher(None)
    for x in list_of_words:
        for w in sp:
            sq.set_seqs(x, w)
            print('%-7s %-10s %f' % (x, w, sq.ratio()))
    

    result

    walk    walk       1.000000
    walk    walked     0.800000
    walk    walking    0.727273
    walk    talk       0.750000
    walk    talking    0.545455
    walk    talks      0.666667
    walk    talked     0.600000
    walk    fly        0.285714
    walk    flying     0.200000
    talk    walk       0.750000
    talk    walked     0.600000
    talk    walking    0.545455
    talk    talk       1.000000
    talk    talking    0.727273
    talk    talks      0.888889
    talk    talked     0.800000
    talk    fly        0.285714
    talk    flying     0.200000
    fly     walk       0.285714
    fly     walked     0.222222
    fly     walking    0.200000
    fly     talk       0.285714
    fly     talking    0.200000
    fly     talks      0.250000
    fly     talked     0.222222
    fly     fly        1.000000
    fly     flying     0.666667
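
To see where these numbers come from: the difflib documentation defines ratio() as 2.0*M/T, where M is the number of matched characters and T is the combined length of both strings. The sketch below recomputes that by hand from the matching blocks; ratio_by_hand is a hypothetical helper name, not part of difflib.

```python
from difflib import SequenceMatcher


def ratio_by_hand(a, b):
    """Hypothetical helper: recompute ratio() as 2*M/T."""
    sm = SequenceMatcher(None, a, b)
    # M: characters covered by the matching blocks
    # (the final dummy block has size 0, so it does not affect the sum)
    matched = sum(block.size for block in sm.get_matching_blocks())
    # T: total number of characters in both strings
    total = len(a) + len(b)
    return 2.0 * matched / total if total else 1.0


# "walk" vs "walking": 4 matched characters out of 4 + 7 = 11 in total
print('%f' % ratio_by_hand("walk", "walking"))
print('%f' % SequenceMatcher(None, "walk", "walking").ratio())
```

Both lines should print 0.727273, the value shown for walk/walking in the table. This also explains why "talk" scores 0.75 against "walk": the shared block "alk" gives 2*3/8, which is why a ratio cutoff alone cannot separate variations from merely similar words.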