I would like to convert nested data pairs into multiple columns.
Say below example, the IN
and OUT
in action
columns is data pairs but it's nested.
import pandas as pd
data = [
[1,'IN'],
[2,'IN'],
[3,'OUT'],
[4,'OUT'],
[5,'IN'],
[6,'IN'],
[7,'OUT'],
[8,'OUT']
]
df = pd.DataFrame(data, columns=['ts','action'])
print(df)
expected output is:
ts_IN ts_OUT
0 1 4
1 2 3
4 5 8
5 6 7
Any pandas method can handle this case?
You could use a custom python function, keeping track of the seen IN and yielding the last one every time you find an OUT:
out = {}
ins = []
for i, (s, t) in enumerate(zip(df['action'], df['ts'])):
if s == 'IN':
ins.append(i)
out[i] = [t, None]
else:
i = ins.pop()
out[i][1] = t
out = pd.DataFrame.from_dict(out, orient='index',
columns=['ts_IN', 'ts_OUT'])
NB. this requires that a any point you don't have more OUT than IN.
Output:
ts_IN ts_OUT
0 1 4
1 2 3
4 5 8
5 6 7
Other example:
# input
df = pd.DataFrame({'ts': range(10),
'action': ['IN', 'IN', 'OUT', 'IN', 'OUT', 'OUT',
'IN', 'IN', 'OUT', 'OUT']})
# output
ts_IN ts_OUT
0 0 5
1 1 2
3 3 4
6 6 9
7 7 8