pythonpandas

Is there a way to filter columns of a pandas dataframe which include elements of a list?


In the below dataframe I would like to filter the columns based on a list called 'animals' to select all the columns that include the list elements.

animal_data = {
    "date": ["2023-01-22","2023-11-16","2024-06-30","2024-08-16","2025-01-22"],
    "cats_fostered": [1,2,3,4,5],
    "cats_adopted":[1,2,3,4,5],
    "dogs_fostered":[1,2,3,4,5],
    "dogs_adopted":[1,2,3,4,5],
    "rabbits_fostered":[1,2,3,4,5],
    "rabbits_adopted":[1,2,3,4,5]
}

animals = ["date","cat","rabbit"]

animal_data = {
    "date": ["2023-01-22","2023-11-16","2024-06-30","2024-08-16","2025-01-22"],
    "cats_fostered": [1,2,3,4,5],
    "cats_adopted":[1,2,3,4,5],
    "rabbits_fostered":[1,2,3,4,5],
    "rabbits_adopted":[1,2,3,4,5]
}

I have tried some approaches below but they either don't work with lists or return no columns as it is looking for an exact match with 'cats' or 'rabbits' and not just columns that contain the strings

animal_data[animal_data.columns.intersection(animals)]  # returns an empty df
animal_data.filter(regex=animals)  # returns an error: not able to use regex with a list

Solution

  • The issue with both attempts is that you are looking for a substring of the columns name. Except for the date column there is not a full match between the strings in the animals list and the actual column names.

    One possibility is to use filter with .join to construct the regex if using .filter, or a "manual" list comprehension with strings operations (for example in or .startswith).

    You can also "hardcode" "date" so the animals list only contains animals.

    >>> animals = ["cat"]
    >>> df.filter(regex="date|" + "|".join(animals))
             date  cats_fostered  cats_adopted
    0  2023-01-22              1             1
    1  2023-11-16              2             2
    2  2024-06-30              3             3
    3  2024-08-16              4             4
    4  2025-01-22              5             5
    >>> animals = ["cat", "rabbit"]
    >>> df.filter(regex="date|" + "|".join(animals))
             date  cats_fostered  cats_adopted  rabbits_fostered  rabbits_adopted
    0  2023-01-22              1             1                 1                1
    1  2023-11-16              2             2                 2                2
    2  2024-06-30              3             3                 3                3
    3  2024-08-16              4             4                 4                4
    4  2025-01-22              5             5                 5                5