
Filtering web articles by keywords inside of a loop


I wrote a function to scrape web articles, but I want to adapt it so that it checks whether an article is relevant to me (based on a list of keywords) and ignores it if it isn't. I've found several ways to check whether one string is inside another, but somehow I can't get them to work inside a for-loop. Here's a simplified example of the function:

combos = ['apple and pear', 'pear and banana', 'apple and peach', 'banana and kiwi', 'peach and orange']
my_favorites = ['apple', 'peach']
caps = []

for i in combos:
    
    for j in my_favorites:
        if j not in i:
            continue
    
    caps.append(i.upper())
    
print(caps)

I want to skip to the next iteration of the loop if at least one of my favorite fruits is not included. But all the strings in the list are getting through the filter:

['APPLE AND PEAR', 'PEAR AND BANANA', 'APPLE AND PEACH', 'BANANA AND KIWI', 'PEACH AND ORANGE']

Can someone please explain my failure in understanding here?


Solution

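  • I find regular expressions a convenient way to filter text, especially when the input is large. Below, I use Python's built-in re module to compile one pattern from the keywords, then use the compiled pattern's search method to keep only the strings that contain at least one of them. I use search rather than match because match only tests the beginning of each string, so it would miss a keyword that appears later in the text.

    import re
    
    combos = ['apple and pear', 'pear and banana', 'apple and peach', 'banana and kiwi', 'peach and orange']
    
    my_favorites = ['apple', 'peach']
    
    # Build an alternation such as "apple|peach"; re.escape protects
    # keywords that contain regex metacharacters.
    regex_pattern = "|".join(re.escape(word) for word in my_favorites)
    
    r = re.compile(regex_pattern)
    
    # r.search returns a match object (truthy) if a keyword occurs anywhere in the string.
    filtered_list = filter(r.search, combos)
    
    caps = [item.upper() for item in filtered_list]
    
    print(caps)  # ['APPLE AND PEAR', 'APPLE AND PEACH', 'PEACH AND ORANGE']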
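
As for why the loop in the question lets everything through: the `continue` sits inside the inner `for j` loop, so it only skips to the next favorite. It never affects the outer loop, and `caps.append` runs for every string. Here is a minimal sketch without regular expressions, assuming plain substring matching is enough. The helper name `contains_any` and the variable `caps_all` are illustrative and not from the original post.

```python
combos = ['apple and pear', 'pear and banana', 'apple and peach',
          'banana and kiwi', 'peach and orange']
my_favorites = ['apple', 'peach']


def contains_any(text, keywords):
    """Return True if at least one keyword occurs somewhere in text."""
    return any(keyword in text for keyword in keywords)


caps = []
for combo in combos:
    if not contains_any(combo, my_favorites):
        # This continue belongs to the outer loop, so the append below is skipped.
        continue
    caps.append(combo.upper())

print(caps)  # ['APPLE AND PEAR', 'APPLE AND PEACH', 'PEACH AND ORANGE']

# Stricter variant: keep a string only if it contains every favorite.
caps_all = [combo.upper() for combo in combos
            if all(keyword in combo for keyword in my_favorites)]

print(caps_all)  # ['APPLE AND PEACH']
```

Which variant fits depends on whether an article should be kept when it mentions any keyword (`any`) or only when it mentions all of them (`all`).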