I am working on this dataset called titanic.csv
Let's simplify the problem and include some data here:
I need to calculate all missing values for child
, as you see it is a value under who
column. This should be done using a pivot table.
I have tried this solution:
pd.pivot_table(df[df['who'] == 'child'],
index='sex',
aggfunc=lambda x: x.isnull().sum(),
margins=True) # to sum all missing values based on gender
But I get this output: in which as you also notice, ALL row doesn't sum all missing values per gender.
Where is the problem in my code? Should I use another way to create the pivot table?
EDIT:
If you prefer to use a pivot table, just add the parameter dropna=False to get the result you want.
First answer:
If you want the number of missing values per features for only the child you can use isna/isnull directly after filtering:
data = {'survived': [0, 1, 1, 1, 0],
'pclass': [3, 1, None, 1, 3],
'sex': ['male', 'female', 'female', 'female', 'male'],
'age': [22, 38, None, None, 35],
'class_': ['Third', 'First', None, 'First', 'Third'],
'who': ['man', 'woman', 'child', 'child', 'man'],
'deck': [None, 'C', None, 'C', None],
'alive': ['no', 'yes', 'yes', 'yes', 'no'],
'alone': [False, False, True, False, True] }
df = pd.DataFrame(data)
display(df[df["who"] == "child"].isna().sum())
survived 0
pclass 1
sex 0
age 2
class_ 1
who 0
deck 1
alive 0
alone 0