python nlp nltk list-comprehension trigram

Finding a list of words in a list of sentences and returning the matching sentences


Given a list of sentences and a list of word groups (trigrams), how can I return only those sentences that contain all three words of at least one group?

Please suggest. Below are example lists.

listwords = [['people','suffering','acute'], ['Covid-19','Corona','like'], ['people','must','collectively']]

listsent = ['The number of people suffering acute hunger could almost double.',
            'Lockdowns and global economic recession have',
            'one more shock – like Covid-19 – to push them over the edge',
            'people must collectively act now to mitigate the impact']

The output list should contain the first and last sentences, as each of them contains all three words of one group in listwords.

Expected output is:

['The number of people suffering acute hunger could almost double.',
 'people must collectively act now to mitigate the impact']

Solution

  • Welcome to Stack Overflow

    Try this solution out:

    listwords = [['people','suffering','acute'], ['Covid-19','Corona','like'], ['people','must','collectively']]
    
    listsent = ['The number of people suffering acute hunger could almost double.',
                'Lockdowns and global economic recession have',
                'one more shock – like Covid-19 – to push them over the edge',
                'people must collectively act now to mitigate the impact']
    
    # iterate through each sentence
    for sentence in listsent:
        # iterate through each group of words
        for words in listwords:
            # check whether every word in the group appears in the current sentence
            if all(word in sentence for word in words):
                print(sentence)
    

    I commented the lines to give you an idea of what's going on.

    The first part of the code iterates through each sentence in your list

    for sentence in listsent:
    

    Then we need to iterate over the groups of words you have in your words list

    for words in listwords:
    

    This is where things get fun. Since you have nested lists, we need to check that all three words of a group are found in the sentence

    if all(word in sentence for word in words):
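
One caveat worth knowing: `word in sentence` on a string is a substring test, so 'like' would also match inside 'unlikely'. If whole-word matching is wanted, a minimal sketch could look like the following. It assumes simple whitespace splitting with surrounding punctuation stripped; the helper name `tokenize` is hypothetical and is not an NLTK function.

```python
import string


def tokenize(sentence):
    """Split on whitespace and strip surrounding punctuation from each token."""
    # string.punctuation covers ASCII punctuation; the en dash is added by hand
    # because it appears in the example sentences (an assumption about the data).
    strip_chars = string.punctuation + "–"
    tokens = (tok.strip(strip_chars) for tok in sentence.split())
    # Drop tokens that were punctuation only and are now empty.
    return {tok for tok in tokens if tok}


# Substring test: also matches inside a longer word.
print("like" in "an unlikely event")
# Token test: only whole words count.
print("like" in tokenize("an unlikely event"))

words = ['people', 'suffering', 'acute']
sentence = 'The number of people suffering acute hunger could almost double.'
print(all(word in tokenize(sentence) for word in words))
```

Only leading and trailing punctuation is stripped, so a token such as 'Covid-19' keeps its inner hyphen. This comparison is still case-sensitive.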
    

    Finally you can print out each sentence that contains all the words

    print(sentence)
    

    You could also put this in a function and return the matching sentences as a new list:

    listwords = [['people','suffering','acute'], ['Covid-19','Corona','like'], ['people','must','collectively']]
    
    listsent = ['The number of people suffering acute hunger could almost double.',
                'Lockdowns and global economic recession have',
                'one more shock – like Covid-19 – to push them over the edge',
                'people must collectively act now to mitigate the impact']
    
    
    def check_words(listwords, listsent):
        listsent_new = []
        # iterate through each sentence
        for sentence in listsent:
            # iterate through each group of words
            for words in listwords:
                # check whether every word in the group appears in the current sentence
                if all(word in sentence for word in words):
                    listsent_new.append(sentence)
                    # stop after the first matching group so the sentence is added only once
                    break
        return listsent_new
    
    
    if __name__ == '__main__':
        print(check_words(listwords, listsent))
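
Since the question is tagged list-comprehension, the same check can also be sketched as a single comprehension. This is only one possible variant; the function name `matching_sentences` is made up for illustration. Using `any()` over the word groups means a sentence is kept once even if several groups match it.

```python
listwords = [['people', 'suffering', 'acute'],
             ['Covid-19', 'Corona', 'like'],
             ['people', 'must', 'collectively']]

listsent = ['The number of people suffering acute hunger could almost double.',
            'Lockdowns and global economic recession have',
            'one more shock – like Covid-19 – to push them over the edge',
            'people must collectively act now to mitigate the impact']


def matching_sentences(listwords, listsent):
    """Return each sentence that contains every word of at least one group."""
    return [
        sentence
        for sentence in listsent
        # any(): at least one group matches; all(): every word of that group is present
        if any(all(word in sentence for word in words) for words in listwords)
    ]


print(matching_sentences(listwords, listsent))
```

With the example lists this returns the first and last sentences, matching the expected output in the question. Like the loop version, it uses substring matching and is case-sensitive.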