python list loops twitter lexicon

Python: Giving a Score to Each Word Stored in a Variable after Looping


I have a case that I need to solve, but I've been stuck with no solution for almost a week. Here's the case. I have three variables:

candidates = ["you", "the", "best", "love", "fun", "feeling", "emotionally"]
seeds = ["happy", "love", "enjoy", "fun", "grace", "sad", "guilty"]
tweets = ["you look so happy", "I am in love with you", "hey you do the best at having fun okay", "i am emotionally sad right now", "feeling guilty"]

What I want to do is pair each word from the variable "candidates" with every word from "seeds", loop the pairs over the tweets, and after the loop give a score to each word from "candidates".

So, for example, in the first loop I pair:

you + happy
you + love
you + enjoy
you + fun
you + grace
you + sad
you + guilty

and loop them over the strings in the variable "tweets", giving a score based on how many times these pairs appear together in the sentences from "tweets". So in this case the score I'll get for the word "you", paired with all the words from "seeds", is 3.

Then I continue with the second loop, whose pairs are:

the + happy
the + love
the + fun
the + enjoy
the + grace
the + sad
the + guilty

and loop them again over the strings in the variable "tweets", giving a score based on how many times these pairs appear together in the sentences from "tweets". So in this case the score I'll get for the word "the", paired with all the words from "seeds", is 1.

I want my program to automatically return the score for each word from "candidates", paired with the words from "seeds" and looped over the strings in "tweets".

#looping all pair candidates and seed to the tweets
for tweet in tweets:
    for candidate in candidates:
        for seed in seeds:
            if candidate in tweet and seed in tweet:
                if "happy" in tweet or "love" in tweet or "fun" in tweet:
                    print(candidate, seed, tweet)
                    count_happy += 1
                elif "sad" in tweet or "guilty" in tweet:
                    print(candidate, seed, tweet)
                    count_sad += 1
        count_a += 1

Above is the script I wrote to do this, but it doesn't work the way I want it to. Does anyone know how to solve this problem? It's been a week already and I haven't found a solution yet.


Solution

  • Here is a sample script for what you want to do. What I did is add each candidate to a dictionary, using the candidate name plus the string "+ All Seeds" as the key. While looping through the candidate/seed pairs, I increment that key's value by 1 whenever both words appear in a tweet.

    import codecs
    import itertools
    import threading
    
    
    def read_lines(path):
        """Return the non-empty lines of a UTF-8 text file, one item per line."""
        with codecs.open(path, encoding='utf8') as f:
            return [line.rstrip() for line in f if line.rstrip() != ""]
    
    
    # Each file holds one item per line.
    candidates = read_lines("d:\\untitled\\candidates.csv")
    seeds = read_lines("d:\\untitled\\seeds.csv")
    tweets = read_lines("d:\\untitled\\tweets.csv")
    
    counts = {}
    lock = threading.Lock()  # guards updates to the shared counts dict
    
    
    def findMatch(listToWorkWith, tweet):
        # Count every (candidate, seed) pair whose two strings both occur
        # in the tweet. Note that "in" is a substring test here.
        for candidate, seed in listToWorkWith:
            if (candidate in tweet) and (seed in tweet):
                key = candidate + "+ All Seeds"
                with lock:
                    counts[key] = counts.get(key, 0) + 1
    
    
    listOFAugrs = list(itertools.product(candidates, seeds))
    # One thread per tweet; only the first 10 tweets are processed here.
    threads = [threading.Thread(target=findMatch, args=(listOFAugrs, x)) for x in tweets[0:10]]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    
    print(counts)
    

    This is the best we can do here. The numbers are still huge: for each tweet there are len(candidates) × len(seeds) combinations to loop over, so it is going to take some time.

    Output for the first 10 tweets and all the combinations, i.e. tweets[0:10]:

    {'jypetwic+ All Seeds': 2, 'twiceth+ All Seeds': 2, '5th+ All Seeds': 2, 'mini+ All Seeds': 2, 'albumltwhat+ All Seeds': 2, 'lovegtreleas+ All Seeds': 2, 'onlinemelon+ All Seeds': 2, 'twice+ All Seeds': 2, '트와이스+ All Seeds': 2, 'whatislov+ All Seeds': 2, 'u+ All Seeds': 16, 'voidtopaz+ All Seeds': 4, 'fandom+ All Seeds': 4, 'lolchoni+ All Seeds': 4, 'm+ All Seeds': 26, 'sinhuigang+ All Seeds': 2, 'rocket+ All Seeds': 2, 'confid+ All Seeds': 2, 'gayfal+ All Seeds': 2, 'distinguish+ All Seeds': 2, 'gaymi+ All Seeds': 2, 'gay+ All Seeds': 2, 'and+ All Seeds': 11, 're+ All Seeds': 7, 'morwennajh+ All Seeds': 3, 'imagin+ All Seeds': 3, 'painthi+ All Seeds': 3, 'darl+ All Seeds': 3, 'boy+ All Seeds': 6, 'inbut+ All Seeds': 3, 'need+ All Seeds': 3, 'chanc+ All Seeds': 3, 'liveco+ All Seeds': 3, 'singet+ All Seeds': 3, 'get+ All Seeds': 7, 'de+ All Seeds': 9, 'ed+ All Seeds': 3, 'akoposimarcelo+ All Seeds': 1, 'dontdont+ All Seeds': 1, 'deservem+ All Seeds': 1, 'im+ All Seeds': 7, 'live+ All Seeds': 3, 'ke+ All Seeds': 7, 'but+ All Seeds': 3, 'ur+ All Seeds': 10, 'htapmenami+ All Seeds': 5, 'alway+ All Seeds': 5, 'saturday+ All Seeds': 4, 'invit+ All Seeds': 4, 'friend+ All Seeds': 4, 'teach+ All Seeds': 4, 'magicyou+ All Seeds': 4, 'llanowar+ All Seeds': 4, 'elv+ All Seeds': 4, 'promo+ All Seeds': 4, 'll+ All Seeds': 4, 'or+ All Seeds': 5, 'y+ All Seeds': 21, 'p+ All Seeds': 24, 'imag+ All Seeds': 3, 'badrepseokjin+ All Seeds': 2, 'loop+ All Seeds': 2, 'jin+ All Seeds': 2, 'say+ All Seeds': 2, 'korean+ All Seeds': 2, 'english+ All Seeds': 2, 'vid+ All Seeds': 2, 'cr+ All Seeds': 2, 'eundaromi+ All Seeds': 2, 'r+ All Seeds': 26, 'abl+ All Seeds': 5, 'harder+ All Seeds': 5, 'end+ All Seeds': 4, 'sat+ All Seeds': 4, 'h+ All Seeds': 22, 'id+ All Seeds': 8, 'ab+ All Seeds': 5, 'yo+ All Seeds': 7, 'dr+ All Seeds': 2, 'gu+ All Seeds': 5, 'oc+ All Seeds': 2, 'con+ All Seeds': 5, 'f+ All Seeds': 13, 'v+ All Seeds': 25, 'ad+ All Seeds': 3, 'sh+ All Seeds': 4, 'ou+ All Seeds': 
7, 'ive+ All Seeds': 3, 'don+ All Seeds': 1, 'n+ All Seeds': 27, 'rock+ All Seeds': 2, 'bu+ All Seeds': 5, 'neadawn808+ All Seeds': 1, 'soo+ All Seeds': 1, 'paint+ All Seeds': 3, 'sin+ All Seeds': 5, 'releas+ All Seeds': 2, 'twic+ All Seeds': 2, 'deserv+ All Seeds': 1, 'sa+ All Seeds': 6, 'l+ All Seeds': 27, 'way+ All Seeds': 5, 'album+ All Seeds': 2, 'melon+ All Seeds': 2, 'k+ All Seeds': 13, 'j+ All Seeds': 11, 've+ All Seeds': 21, 'gucciboytaeba+ All Seeds': 3, 'pass+ All Seeds': 3, 'second+ All Seeds': 3, 'bc+ All Seeds': 3, 'your+ All Seeds': 3, 'th+ All Seeds': 5, 'd+ All Seeds': 25, 'ig+ All Seeds': 2, 'life+ All Seeds': 3, 'go+ All Seeds': 3, 'stress+ All Seeds': 3, 'time+ All Seeds': 3, 'rn+ All Seeds': 3, 'nb+ All Seeds': 3, 'pa+ All Seeds': 10, 'eu+ All Seeds': 2, 'jh+ All Seeds': 3, 'vi+ All Seeds': 6, 'en+ All Seeds': 14, 'ok+ All Seeds': 7, 'b+ All Seeds': 15, 'c+ All Seeds': 21, 'magic+ All Seeds': 4, 'dese+ All Seeds': 1, 'hu+ All Seeds': 2, 'g+ All Seeds': 16, 'op+ All Seeds': 7, 'w+ All Seeds': 15, 'co+ All Seeds': 12, 'ar+ All Seeds': 15, 'fr+ All Seeds': 4, 'ma+ All Seeds': 8, 'e+ All Seeds': 27, 'o+ All Seeds': 27, 'fa+ All Seeds': 6, 'int+ All Seeds': 3, 'ea+ All Seeds': 9, 'bo+ All Seeds': 6, 'ang+ All Seeds': 2, 'di+ All Seeds': 2, 'bro+ All Seeds': 5, 'kor+ All Seeds': 2, 'la+ All Seeds': 4, 'bum+ All Seeds': 2, 'ti+ All Seeds': 7, 'lov+ All Seeds': 18, 'se+ All Seeds': 6, 'dis+ All Seeds': 2, 'ng+ All Seeds': 7, 'lo+ All Seeds': 18, 'und+ All Seeds': 2, 'aint+ All Seeds': 3, 'kore+ All Seeds': 2, 'rom+ All Seeds': 6, 'li+ All Seeds': 10, 'aro+ All Seeds': 2, 'korea+ All Seeds': 2, 'el+ All Seeds': 7, 'ko+ All Seeds': 3, 'al+ All Seeds': 13, 'pro+ All Seeds': 4, 'na+ All Seeds': 8, 'un+ All Seeds': 2, 'lan+ All Seeds': 4, '8+ All Seeds': 1, 'rea+ All Seeds': 2, 'serv+ All Seeds': 1, 'mor+ All Seeds': 3, 'ain+ All Seeds': 3, 'prom+ All Seeds': 4, 'men+ All Seeds': 5, 'tu+ All Seeds': 4, 'ov+ All Seeds': 18, 'mo+ All Seeds': 7, 'pm+ All 
Seeds': 5, 'mag+ All Seeds': 7, 'sing+ All Seeds': 3, 'gang+ All Seeds': 2, 'ht+ All Seeds': 5, 'ch+ All Seeds': 13, 'te+ All Seeds': 4, 'ting+ All Seeds': 2, 'jyp+ All Seeds': 2, 'thi+ All Seeds': 3, 'ep+ All Seeds': 2, 'anc+ All Seeds': 3, 'st+ All Seeds': 5, 'rd+ All Seeds': 12, 'wha+ All Seeds': 2, 'han+ All Seeds': 3, 'br+ All Seeds': 5, 'wa+ All Seeds': 9, 'in+ All Seeds': 13, 'seokjin+ All Seeds': 2, 'gi+ All Seeds': 7, 'fi+ All Seeds': 2, 'ay+ All Seeds': 13, 'mi+ All Seeds': 11, 'tin+ All Seeds': 2, 'mu+ All Seeds': 3, 'da+ All Seeds': 10, 'cha+ All Seeds': 5, 'ri+ All Seeds': 4, 'oy+ All Seeds': 10, 'mar+ All Seeds': 1, 'nda+ All Seeds': 2, 'ngl+ All Seeds': 2, 'rep+ All Seeds': 2, 'ho+ All Seeds': 4, 'ima+ All Seeds': 4, 'seo+ All Seeds': 2, 'ne+ All Seeds': 6, 'res+ All Seeds': 3, 'lea+ All Seeds': 2, 'turd+ All Seeds': 4, 'et+ All Seeds': 11, 'fe+ All Seeds': 3, 'tim+ All Seeds': 3, '5+ All Seeds': 2, 'fri+ All Seeds': 4, 'line+ All Seeds': 2, 'gt+ All Seeds': 2, 'ro+ All Seeds': 13, 'liv+ All Seeds': 3, 'si+ All Seeds': 6, 'ba+ All Seeds': 5, 'wh+ All Seeds': 2, 'jype+ All Seeds': 2, 'dom+ All Seeds': 4, 'chao+ All Seeds': 2, 'nd+ All Seeds': 13, 'ji+ All Seeds': 2, 'whati+ All Seeds': 2, 'ac+ All Seeds': 4, 'eth+ All Seeds': 2, 'aj+ All Seeds': 3, 'deser+ All Seeds': 1, 'ai+ All Seeds': 3, 'wi+ All Seeds': 2, 'em+ All Seeds': 3, 'cc+ All Seeds': 3, 'ag+ All Seeds': 7, 'what+ All Seeds': 2, 'om+ All Seeds': 10, 'onlin+ All Seeds': 2, 'eco+ All Seeds': 6, 'ove+ All Seeds': 18, 'lon+ All Seeds': 2, 'ot+ All Seeds': 2, 'ol+ All Seeds': 8, 'le+ All Seeds': 2, 'az+ All Seeds': 4, 'pet+ All Seeds': 2, 'ie+ All Seeds': 4, 'ae+ All Seeds': 3, 'dre+ All Seeds': 2, 'chan+ All Seeds': 3, 'po+ All Seeds': 1, 'min+ All Seeds': 2, 'gin+ All Seeds': 3, 'eng+ All Seeds': 2, 'ine+ All Seeds': 2, 'mur+ All Seeds': 3, 'ak+ All Seeds': 1, 'aw+ All Seeds': 1, 'tap+ All Seeds': 5, 'ci+ All Seeds': 3, 'nee+ All Seeds': 3, 'onl+ All Seeds': 2, 'lt+ All Seeds': 2, 'ce+ All 
Seeds': 3, 'um+ All Seeds': 2, 'ako+ All Seeds': 1, 'ap+ All Seeds': 5, '0+ All Seeds': 1, 'arc+ All Seeds': 1, 'tr+ All Seeds': 5, 'bl+ All Seeds': 5, 'har+ All Seeds': 5, 'rel+ All Seeds': 2, 'ice+ All Seeds': 2, 'tae+ All Seeds': 3, 'gucci+ All Seeds': 3, 'ni+ All Seeds': 6, 'ta+ All Seeds': 8, 'twi+ All Seeds': 2, 'fu+ All Seeds': 3, 'pe+ All Seeds': 2, 'hati+ All Seeds': 2, 'tw+ All Seeds': 2, 'pos+ All Seeds': 1, 'ami+ All Seeds': 5, 'vo+ All Seeds': 4, 'lif+ All Seeds': 3, 'sim+ All Seeds': 1, 'cond+ All Seeds': 3, 'yt+ All Seeds': 3, 'void+ All Seeds': 4, 'relea+ All Seeds': 2, 'whatislo+ All Seeds': 2, 'dawn+ All Seeds': 1, 'sec+ All Seeds': 3, 'ml+ All Seeds': 2, 'eas+ All Seeds': 2, '80+ All Seeds': 1, 'nam+ All Seeds': 5, 'econ+ All Seeds': 3, 'ada+ All Seeds': 1, 'ont+ All Seeds': 1, 'posi+ All Seeds': 1, 'ib+ All Seeds': 3, 'lb+ All Seeds': 2, 'seco+ All Seeds': 3, 'veg+ All Seeds': 2, 'isl+ All Seeds': 2, 'hat+ All Seeds': 2}
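
If whole-word matches are acceptable, a cheaper variant is possible. This is only a rough sketch, not the script above. It splits each tweet into a set of words once, counts how many seeds that set contains, and adds that number to every candidate also present in the tweet. It assumes whitespace tokenization and lowercase comparison, neither of which the script above does. The script uses substring tests, which is presumably why short fragments such as 'u' or 'n' score so highly in its output. The function name score_candidates is made up for illustration, and the data is the sample from the question.

```python
candidates = ["you", "the", "best", "love", "fun", "feeling", "emotionally"]
seeds = ["happy", "love", "enjoy", "fun", "grace", "sad", "guilty"]
tweets = [
    "you look so happy",
    "I am in love with you",
    "hey you do the best at having fun okay",
    "i am emotionally sad right now",
    "feeling guilty",
]


def score_candidates(candidates, seeds, tweets):
    """Score each candidate by the number of (candidate, seed) pairs
    that occur together, as whole words, in a tweet."""
    seed_set = set(seeds)
    scores = {candidate: 0 for candidate in candidates}
    for tweet in tweets:
        # Assumption: words are separated by whitespace and case is ignored.
        words = set(tweet.lower().split())
        seeds_here = len(words & seed_set)
        if seeds_here == 0:
            continue  # no seed in this tweet, so no pair can match
        for candidate in scores:
            if candidate in words:
                # One point per seed that shares the tweet with the candidate.
                scores[candidate] += seeds_here
    return scores


scores = score_candidates(candidates, seeds, tweets)
print(scores)
# {'you': 3, 'the': 1, 'best': 1, 'love': 1, 'fun': 1, 'feeling': 1, 'emotionally': 1}
```

For the question's data this gives 3 for "you" and 1 for "the", matching the scores expected there. The work per tweet is roughly len(candidates) set lookups instead of len(candidates) × len(seeds) substring searches.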