
Split Pandas DataFrame on Blank rows


I have a large dataframe that I need to split on empty rows.

Here's a simplified example of the DataFrame:

    A   B   C
0   1   0   International
1   1   1   International
2   NaN 2   International
3   1   3   International
4   1   4   International
5   8   0   North American
6   8   1   North American
7   8   2   North American
8   8   3   North American
9   NaN NaN NaN
10  1   0   Internal
11  1   1   Internal
12  6   0   East
13  6   1   East
14  6   2   East
...

As you can see, row 9 is blank. I need to put rows 0 through 8 into one dataframe, rows 10 up to the next blank row into another, and so on, so that I end up with several dataframes. Note that a row only counts as blank if the whole row is blank.

Here is the code I'm using to find blanks:

def find_breaks(df):
    df_breaks = df[(df.loc[:,['A','B','C']].isnull()).any(axis=1)]
    print(df_breaks.index)

This code works when I test it on the simplified DF, but of course my real DataFrame has many more columns than ['A','B','C'].

How can I find the next blank row (or as I am doing above, all the blank rows at once) without having to specify my column names?

Thanks


Solution

  • IIUC, use DataFrame.isnull + np.split:

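    import numpy as np

    # Split before every row in which all columns are NaN; np.split treats
    # these index labels as positions, so this assumes a default RangeIndex
    df_list = np.split(df, df[df.isnull().all(axis=1)].index)

    for chunk in df_list:  # don't shadow the original df
        print(chunk, '\n')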
    
         A    B               C
    0  1.0  0.0   International
    1  1.0  1.0   International
    2  NaN  2.0   International
    3  1.0  3.0   International
    4  1.0  4.0   International
    5  8.0  0.0  North American
    6  8.0  1.0  North American
    7  8.0  2.0  North American
    8  8.0  3.0  North American 
    
          A    B         C
    9   NaN  NaN       NaN
    10  1.0  0.0  Internal
    11  1.0  1.0  Internal
    12  6.0  0.0      East
    13  6.0  1.0      East
    14  6.0  2.0      East 
    

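    First, obtain the indices of the rows in which every column is null (df.isnull().all(1) needs no column names), then pass them to np.split to cut the dataframe into chunks. np.split accepts a DataFrame and returns a list of DataFrames, but it treats the indices as positions, so this relies on the default RangeIndex. Each blank row also stays as the first row of the following chunk (row 9 above); drop it with chunk.dropna(how='all') if you don't want it.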
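If you'd rather stay within pandas, you can group on a running count of blank rows. This also helps when your index is not a default RangeIndex, and np.split on a DataFrame may emit a FutureWarning about DataFrame.swapaxes on newer pandas. The sketch below is an alternative to the np.split answer, not part of it. split_on_blank_rows is a hypothetical helper name, and the sample frame is rebuilt from the question's example. Unlike the np.split version, it also drops the blank separator rows.

```python
import numpy as np
import pandas as pd


def split_on_blank_rows(df):
    """Hypothetical helper: split df into chunks separated by all-NaN rows."""
    # True for rows where every column is null; no column names needed
    blank = df.isnull().all(axis=1)
    # Each blank row bumps the counter, so rows between blanks share an id
    group_id = blank.cumsum()
    # Drop the separator rows, then group by id; passing a plain array makes
    # the grouping positional, so it works with any index
    kept = df[~blank]
    return [chunk for _, chunk in kept.groupby(group_id[~blank].to_numpy())]


# Sample data reconstructed from the question's example (rows 0-14)
df = pd.DataFrame({
    "A": [1, 1, np.nan, 1, 1, 8, 8, 8, 8, np.nan, 1, 1, 6, 6, 6],
    "B": [0, 1, 2, 3, 4, 0, 1, 2, 3, np.nan, 0, 1, 0, 1, 2],
    "C": ["International"] * 5 + ["North American"] * 4 + [np.nan]
         + ["Internal"] * 2 + ["East"] * 3,
})

for chunk in split_on_blank_rows(df):
    print(chunk, "\n")
```

Row 2, which has only A missing, stays in the first chunk because the mask requires every column to be null. Each chunk keeps its original index labels.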