pandasdataframedroppandas-explode

How to drop the last row of an exploded dataframe


I have a large dataframe, but to make this easy it looks something like this one below

             A  B  C
0    [a, b, c]  1 22
1       [d, e]  2 45
2       [f, g]  3 32
3       [h, i]  4 64
4 [j, k, l, m]  5 76

Now I have used:

df.explode('A')

To explode the dataframe like this:

    A  B  C
0   a  1 22
0   b  1 22
0   c  1 22
1   d  2 45
1   e  2 45
2   f  3 32
2   g  3 32
3   h  4 64
3   i  4 64
4   j  5 76
4   k  5 76
4   l  5 76
4   m  5 76

And now I want to drop every last row from the exploded column 'A'. This means the code will remove the elements: c, e, g, i, m. The output should look something like this:

    A  B  C
0   a  1 22
0   b  1 22
1   d  2 45
2   f  3 32
3   h  4 64
4   j  5 76
4   k  5 76
4   l  5 76

Any idea how I can do this? (note: it is a very large dataframe so I can't just select the rows manually)


Solution

  • Use boolean indexing with Index.duplicated:

    df1 = df.explode('A')
    df1 = df1[df1.index.duplicated(keep='last')]
    print (df1)
        A  B  C
    0   a  1 22
    0   b  1 22
    1   d  2 45
    2   f  3 32
    3   h  4 64
    4   j  5 76
    4   k  5 76
    4   l  5 76
    

    Or remove last value of lists first by indexing:

    df1 = df.assign(A = df.A.str[:-1]).explode('A')
    print (df1)
       A  B   C
    0  a  1  22
    0  b  1  22
    1  d  2  45
    2  f  3  32
    3  h  4  64
    4  j  5  76
    4  k  5  76
    4  l  5  76
    

    Difference is if one element list(s):

    print (df)
               A  B   C
    0    [a,b,c]  1  22
    1      [d,e]  2  45
    2      [f,g]  3  32
    3        [h]  4  64
    4  [j,k,l,m]  5  76
    
    df1 = df.explode('A')
    df1 = df1[df1.index.duplicated(keep='last')]
    print (df1)
       A  B   C
    0  a  1  22
    0  b  1  22
    1  d  2  45
    2  f  3  32
    4  j  5  76
    4  k  5  76
    4  l  5  76
    
    df1 = df.assign(A = df.A.str[:-1]).explode('A')
    print (df1)
         A  B   C
    0    a  1  22
    0    b  1  22
    1    d  2  45
    2    f  3  32
    3  NaN  4  64
    4    j  5  76
    4    k  5  76
    4    l  5  76