In the below dataframe I would like to filter the columns based on a list called 'animals' to select all the columns that include the list elements.
animal_data = {
"date": ["2023-01-22","2023-11-16","2024-06-30","2024-08-16","2025-01-22"],
"cats_fostered": [1,2,3,4,5],
"cats_adopted":[1,2,3,4,5],
"dogs_fostered":[1,2,3,4,5],
"dogs_adopted":[1,2,3,4,5],
"rabbits_fostered":[1,2,3,4,5],
"rabbits_adopted":[1,2,3,4,5]
}
animals = ["date","cat","rabbit"]
animal_data = {
"date": ["2023-01-22","2023-11-16","2024-06-30","2024-08-16","2025-01-22"],
"cats_fostered": [1,2,3,4,5],
"cats_adopted":[1,2,3,4,5],
"rabbits_fostered":[1,2,3,4,5],
"rabbits_adopted":[1,2,3,4,5]
}
I have tried some approaches below but they either don't work with lists or return no columns as it is looking for an exact match with 'cats' or 'rabbits' and not just columns that contain the strings
animal_data[animal_data.columns.intersection(animals)] # returns an empty df
animal_data.filter(regex=animals) # returns an error: not able to use regex with a list
The issue with both attempts is that you are looking for a substring of the columns name. Except for the date
column there is not a full match between the strings in the animals
list and the actual column names.
One possibility is to use filter
with .join
to construct the regex if using .filter
, or a "manual" list comprehension with strings operations (for example in
or .startswith
).
You can also "hardcode" "date"
so the animals
list only contains animals.
>>> animals = ["cat"]
>>> df.filter(regex="date|" + "|".join(animals))
date cats_fostered cats_adopted
0 2023-01-22 1 1
1 2023-11-16 2 2
2 2024-06-30 3 3
3 2024-08-16 4 4
4 2025-01-22 5 5
>>> animals = ["cat", "rabbit"]
>>> df.filter(regex="date|" + "|".join(animals))
date cats_fostered cats_adopted rabbits_fostered rabbits_adopted
0 2023-01-22 1 1 1 1
1 2023-11-16 2 2 2 2
2 2024-06-30 3 3 3 3
3 2024-08-16 4 4 4 4
4 2025-01-22 5 5 5 5