I have a large dataframe that I need to split on empty rows.
Here's a simplified example of the DataFrame:
A B C
0 1 0 International
1 1 1 International
2 NaN 2 International
3 1 3 International
4 1 4 International
5 8 0 North American
6 8 1 North American
7 8 2 North American
8 8 3 North American
9 NaN NaN NaN
10 1 0 Internal
11 1 1 Internal
12 6 0 East
13 6 1 East
14 6 2 East
...
As you can see, row 9 is blank. What I need to do is take rows 0 through 8 and put them in a different dataframe, as well as rows 10 to the next blank so that I have several dataframes in the end. Notice, when looking for blank rows I need the whole row to be blank.
Here is the code I'm using to find blanks:
    def find_breaks(df):
        df_breaks = df[(df.loc[:,['A','B','C']].isnull()).any(axis=1)]
        print(df_breaks.index)
This code works when I test it on the simplified DF but, of course, my real DataFrame has many more columns than ['A','B','C'].
How can I find the next blank row (or as I am doing above, all the blank rows at once) without having to specify my column names?
Thanks
IIUC, use pd.isnull + np.split:
    import numpy as np

    # Rows where every column is null mark the split points
    df_list = np.split(df, df[df.isnull().all(axis=1)].index)
    for chunk in df_list:
        print(chunk, '\n')
         A    B               C
    0  1.0  0.0   International
    1  1.0  1.0   International
    2  NaN  2.0   International
    3  1.0  3.0   International
    4  1.0  4.0   International
    5  8.0  0.0  North American
    6  8.0  1.0  North American
    7  8.0  2.0  North American
    8  8.0  3.0  North American

          A    B         C
    9   NaN  NaN       NaN
    10  1.0  0.0  Internal
    11  1.0  1.0  Internal
    12  6.0  0.0      East
    13  6.0  1.0      East
    14  6.0  2.0      East
First, obtain the indices where the entire row is null, and then use that to split your dataframe into chunks. np.split handles dataframes quite well.
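Two caveats about the snippet above: np.split expects integer positions, so passing index labels only works with a default 0..n-1 index, and the blank row itself stays as the first row of the following chunk (row 9 in the output). Calling np.split on a DataFrame may also emit a deprecation warning in recent pandas versions. As an alternative, here is a minimal sketch that stays within pandas by grouping on a running count of blank rows. The helper name split_on_blank_rows and the sample frame are made up for illustration; they are not part of the original question.

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the question; row 9 is entirely blank.
df = pd.DataFrame({
    "A": [1, 1, np.nan, 1, 1, 8, 8, 8, 8, np.nan, 1, 1, 6, 6, 6],
    "B": [0, 1, 2, 3, 4, 0, 1, 2, 3, np.nan, 0, 1, 0, 1, 2],
    "C": ["International"] * 5 + ["North American"] * 4 + [np.nan]
         + ["Internal"] * 2 + ["East"] * 3,
})


def split_on_blank_rows(frame):
    """Hypothetical helper: split frame wherever every column is null."""
    blank = frame.isnull().all(axis=1)   # True only for fully blank rows
    block_id = blank.cumsum()            # increases by one at each blank row
    kept = frame[~blank]                 # drop the separator rows themselves
    # Group by the block id of the surviving rows; a plain array avoids
    # any dependence on the index labels.
    keys = block_id[~blank].to_numpy()
    return [chunk for _, chunk in kept.groupby(keys)]


df_list = split_on_blank_rows(df)
for chunk in df_list:
    print(chunk, '\n')
```

Because the mask is built with isnull().all(axis=1), no column names are needed, and a partially empty row such as row 2 is not treated as a break. Consecutive blank rows simply produce no chunk between them.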