python filter pandas

How do you filter pandas dataframes by multiple columns?


To filter a DataFrame (df) by a single column, for example in data that contains both males and females, we might write:

males = df[df['Gender'] == 'Male']

Question 1: But what if the data spanned multiple years and I wanted to only see males for 2014?

In other languages I might do something like:

if A = "Male" and if B = "2014" then 

(except that I want the result to be a subset of the original DataFrame, stored in a new DataFrame object)

Question 2: How do I do this in a loop, and create a DataFrame object for each unique combination of year and gender (i.e. a df for each of: 2013-Male, 2013-Female, 2014-Male, and 2014-Female)?

for y in year:
    for g in gender:
        df = .....

Solution

  • Use the & operator, and don't forget to wrap each sub-condition in parentheses, because & binds more tightly than ==:

    males = df[(df['Gender'] == 'Male') & (df['Year'] == 2014)]
    

    To store your DataFrames in a dict using a for loop:

    dic = {}
    for g in ['Male', 'Female']:
        dic[g] = {}
        for y in [2013, 2014]:
            # store each filtered DataFrame in a dict of dicts: dic[gender][year]
            dic[g][y] = df[(df['Gender'] == g) & (df['Year'] == y)]
    

    A small helper, getDF, to look up a stored DataFrame:

    def getDF(dic, gender, year):
        return dic[gender][year]
    
    print(getDF(dic, 'Male', 2014))
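
For reference, here is a minimal end-to-end sketch of both ideas on a small made-up DataFrame. The sample rows and the Score column are invented for illustration, and the column names 'Gender' and 'Year' are assumed from the question. The second half swaps the nested loops for groupby, which is a different technique from the loop above but yields one DataFrame per (year, gender) pair in a single pass:

```python
import pandas as pd

# Hypothetical sample data; the rows and the 'Score' column are made up.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Year": [2013, 2013, 2014, 2014, 2014],
    "Score": [1, 2, 3, 4, 5],
})

# Boolean-mask filter on two columns. Each comparison is parenthesized
# because & binds more tightly than ==.
males_2014 = df[(df["Gender"] == "Male") & (df["Year"] == 2014)]

# Alternative to the nested loops: groupby yields one sub-DataFrame per
# (Year, Gender) combination that actually occurs in the data.
frames = {key: group for key, group in df.groupby(["Year", "Gender"])}

print(males_2014)
print(frames[(2014, "Female")])
```

Unlike the nested loops, the groupby version only creates entries for combinations that are present in the data, so looking up a missing pair raises a KeyError instead of returning an empty DataFrame.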