New to Python. I come from a SQL world where I'm used to running queries and applying them. It's handy to take a list of things, get their count and then use a subset of that count (like the top 5) and apply it to other data. With Python/Pandas, I still have not quite grokked the process. By way of example:
A simple dataset:
`import pandas as pd
dataset = (
[1,2,3,4,5,6],
[1,None,3,4,5,6],
[1,None,3,4,5,6],
[1,2,None,4,5,6],
[1,None,3,None,5,6],
[1,2,None,4,5,6],
[1,None,3,None,5,6],
[1,2,3,4,5,None],
[1,2,3,4,5,None]
)
df = pd.DataFrame(dataset, columns=['A','B','C','D','E','F'])`
Then make a dataframe to find the NaNs:
nan_df = df.isna()
Then count the instances of each row:
grouped_nan = nan_df.groupby(['A','B', 'C', 'D','E', 'F'], sort=True).value_counts()
The original set I was working was had ~200 rows. This simplified example yields this:
A B C D E F
False False False False False False 1
True 2
True False False False 2
True False False False False 2
True False False 2
Name: count, dtype: int64
This is where I run into trouble. The things I want to do are best done in a Dataframe (the above is a Series). The following makes it a DataFrame:
grouped_nan_df = grouped_nan.to_frame()
But it doesn't bring along the last column (the count) in a way I can manage. I can see it, but I can't do anything with it.
If I try to reference the column with the counts, it does not recognize it.
If I try to rename that last column it doesn't work:
`grouped_nan_df.rename(columns={grouped_nan_df.columns[5]:"new_count"}, inplace=True)`
gives the error "index 5 is out of bounds for axis 0 with size 1".
What I want at the end is a DataFrame that includes the counts. Is there a way to get there?
Any help appreciated!
Andy
IIUC, you can just use groupby
with as_index=False
and then take the group size
:
out = nan_df.groupby(['A','B', 'C', 'D','E', 'F'], as_index=False).size()
Output:
A B C D E F size
0 False False False False False False 1
1 False False False False False True 2
2 False False True False False False 2
3 False True False False False False 2
4 False True False True False False 2