
How do you exclude sub-lists from a list based on whether or not a specific element of each sub-list meets some criteria?


(Python novice here:)

I have been trying to filter a list of sub-lists (all of the same length) based on the presence of certain strings within the elements of the sub-lists. To create criteria for inclusion, I have done the following, which has worked fine:

lines = [['Bob','Risk Manager','Company1'],
         ['Bill','Senior Quality Control Manager','Company1'],
         ['Jill','Accreditation Specialist','Company2'],
         ['Jane','Administrator','Company3'],
         ['Joe','IT Specialist','Company4']]

filtered_lines = []

inclusion_criteria = [['Risk',1],['Quality',1],['Accred',1]]

for line in lines:
    for criterion in inclusion_criteria:
        if criterion[0] in line[criterion[1]]:
            filtered_lines.append(line)

The above code filled the filtered_lines list with sub-lists whose second element contained 'Risk', 'Quality' or 'Accred', i.e. 'Jane' and 'Joe' were filtered out - this worked as planned.

However, if I instead want to define criteria for exclusion from the filtered_lines list, then the following does not work:

exclusion_criteria = [['Company1',2],['Company2',2]]

for line in lines:
    for criterion in exclusion_criteria:
        if criterion[0] not in line[criterion[1]]:
            filtered_lines.append(line)

When I run the above code, I want every sub-list whose third element does not contain 'Company1' or 'Company2' to be added to filtered_lines, i.e. filtered_lines should contain only 'Jane' and 'Joe', but this does not happen. Instead, no filtering occurs, and filtered_lines still contains every sub-list from the original lines list.

How would you go about excluding items from a list based on a set of exclusion criteria? Furthermore, is there a better way of approaching inclusion criteria?

P.S.: The lines list and criteria given here are just examples; the real lines is around 25,000 sub-lists long, and there are over a dozen inclusion and - if I can get it working - exclusion criteria. I'm not sure if/how the size of these objects affects any possible solutions.


Solution

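  • You're looping over every criterion in exclusion_criteria, and each time one of them doesn't match you append the line to filtered_lines. No line contains both 'Company1' and 'Company2', so at least one criterion always fails to match, which means every line ends up in filtered_lines (and lines that match neither criterion are appended twice).

    Use all() so that a line is kept only when none of the exclusion criteria match (any() is the counterpart for inclusion):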

    lines = [
        ["Bob", "Risk Manager", "Company1"],
        ["Bill", "Senior Quality Control Manager", "Company1"],
        ["Jill", "Accreditation Specialist", "Company2"],
        ["Jane", "Administrator", "Company3"],
        ["Joe", "IT Specialist", "Company4"],
    ]
    
    exclusion_criteria = [["Company1", 2], ["Company2", 2]]
    filtered_lines = []
    
    for line in lines:
        if all(
            criterion[0] not in line[criterion[1]]
            for criterion in exclusion_criteria
        ):
            filtered_lines.append(line)
    
    print(filtered_lines)
    

    Prints:

    [['Jane', 'Administrator', 'Company3'], ['Joe', 'IT Specialist', 'Company4']]
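
On the second part of the question (a better way to handle inclusion): a minimal sketch of the same idea using any() and list comprehensions. The helper name matches_any and the result names included and kept are just illustrative choices, and I've written the criteria as tuples purely so they unpack neatly; lists would work the same way.

```python
lines = [
    ["Bob", "Risk Manager", "Company1"],
    ["Bill", "Senior Quality Control Manager", "Company1"],
    ["Jill", "Accreditation Specialist", "Company2"],
    ["Jane", "Administrator", "Company3"],
    ["Joe", "IT Specialist", "Company4"],
]

# Each criterion is a (substring, column index) pair.
inclusion_criteria = [("Risk", 1), ("Quality", 1), ("Accred", 1)]
exclusion_criteria = [("Company1", 2), ("Company2", 2)]


def matches_any(line, criteria):
    """Return True if at least one criterion's substring is found in its column."""
    return any(text in line[index] for text, index in criteria)


# Inclusion: keep a line if it matches at least one criterion.
included = [line for line in lines if matches_any(line, inclusion_criteria)]

# Exclusion: keep a line only if it matches none of the criteria.
kept = [line for line in lines if not matches_any(line, exclusion_criteria)]

print([line[0] for line in included])
print([line[0] for line in kept])
```

Unlike the nested loop with append(), this adds each line at most once even if it matches several inclusion criteria. any() also stops at the first match, so with roughly 25,000 lines and a dozen or so criteria this should not be a performance concern, though I haven't measured it on your data.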