I have a large dataframe, but to make this easy it looks something like this one below
A B C
0 [a, b, c] 1 22
1 [d, e] 2 45
2 [f, g] 3 32
3 [h, i] 4 64
4 [j, k, l, m] 5 76
Now I have used:
df.explode('A')
To explode the dataframe like this:
A B C
0 a 1 22
0 b 1 22
0 c 1 22
1 d 2 45
1 e 2 45
2 f 3 32
2 g 3 32
3 h 4 64
3 i 4 64
4 j 5 76
4 k 5 76
4 l 5 76
4 m 5 76
And now I want to drop every last row from the exploded column 'A'. This means the code will remove the elements: c, e, g, i, m. The output should look something like this:
A B C
0 a 1 22
0 b 1 22
1 d 2 45
2 f 3 32
3 h 4 64
4 j 5 76
4 k 5 76
4 l 5 76
Any idea how I can do this? (note: it is a very large dataframe so I can't just select the rows manually)
Use boolean indexing
with Index.duplicated
:
df1 = df.explode('A')
df1 = df1[df1.index.duplicated(keep='last')]
print (df1)
A B C
0 a 1 22
0 b 1 22
1 d 2 45
2 f 3 32
3 h 4 64
4 j 5 76
4 k 5 76
4 l 5 76
Or remove last value of lists first by indexing:
df1 = df.assign(A = df.A.str[:-1]).explode('A')
print (df1)
A B C
0 a 1 22
0 b 1 22
1 d 2 45
2 f 3 32
3 h 4 64
4 j 5 76
4 k 5 76
4 l 5 76
Difference is if one element list(s):
print (df)
A B C
0 [a,b,c] 1 22
1 [d,e] 2 45
2 [f,g] 3 32
3 [h] 4 64
4 [j,k,l,m] 5 76
df1 = df.explode('A')
df1 = df1[df1.index.duplicated(keep='last')]
print (df1)
A B C
0 a 1 22
0 b 1 22
1 d 2 45
2 f 3 32
4 j 5 76
4 k 5 76
4 l 5 76
df1 = df.assign(A = df.A.str[:-1]).explode('A')
print (df1)
A B C
0 a 1 22
0 b 1 22
1 d 2 45
2 f 3 32
3 NaN 4 64
4 j 5 76
4 k 5 76
4 l 5 76