pandas

split next pairs into columns


I would like to convert nested data pairs into multiple columns.

Say below example, the IN and OUT in action columns is data pairs but it's nested.

import pandas as pd

data = [
 [1,'IN'],
 [2,'IN'],
 [3,'OUT'],
 [4,'OUT'],
 [5,'IN'],
 [6,'IN'],
 [7,'OUT'],
 [8,'OUT']
]
df = pd.DataFrame(data, columns=['ts','action'])
print(df)

expected output is:

   ts_IN  ts_OUT
0   1      4
1   2      3
4   5      8
5   6      7

Any pandas method can handle this case?


Solution

  • You could use a custom python function, keeping track of the seen IN and yielding the last one every time you find an OUT:

    out = {}
    ins = []
    for i, (s, t) in enumerate(zip(df['action'], df['ts'])):
        if s == 'IN':
            ins.append(i)
            out[i] = [t, None]
        else:
            i = ins.pop()
            out[i][1] = t
    
    out = pd.DataFrame.from_dict(out, orient='index',
                                 columns=['ts_IN', 'ts_OUT'])
    

    NB. this requires that a any point you don't have more OUT than IN.

    Output:

       ts_IN  ts_OUT
    0      1       4
    1      2       3
    4      5       8
    5      6       7
    

    Other example:

    # input
    df = pd.DataFrame({'ts': range(10),
                       'action': ['IN', 'IN', 'OUT', 'IN', 'OUT', 'OUT', 
                                  'IN', 'IN', 'OUT', 'OUT']})
    
    # output
       ts_IN  ts_OUT
    0      0       5
    1      1       2
    3      3       4
    6      6       9
    7      7       8