python pandas pcap network-flow

Pandas indexing by column pairs (5-tuple)


I'm trying to assign a flow id to each network 5-tuple. The original dataframe looks like this:

import pandas as pd

tup = [['192.168.0.1', '1032', '192.168.0.2', '443'],
   ['192.168.0.1', '1032', '192.168.0.2', '443'],
   ['192.168.0.1', '1034', '192.168.0.2', '443'],
   ['192.168.0.2', '443', '192.168.0.1', '1034'],
   ['192.168.0.1', '1034', '192.168.0.2', '443'],
   ['192.168.0.1', '1034', '192.168.0.2', '443'],
   ['192.168.0.2', '443', '192.168.0.1', '1034'],
   ['192.168.0.2', '443', '192.168.0.1', '1034'],
   ['192.168.0.1', '1032', '192.168.0.2', '443'],
   ['192.168.0.2', '443', '192.168.0.1', '1032']]

df = pd.DataFrame(tup, columns=['src', 'src_port', 'dst', 'dst_port'])

Packets that belong to the same flow, in either direction (inbound or outbound), should get the same flow id, like this:

           src src_port          dst dst_port  flow_id
0  192.168.0.1     1032  192.168.0.2      443        1
1  192.168.0.1     1032  192.168.0.2      443        1
2  192.168.0.1     1034  192.168.0.2      443        2
3  192.168.0.2      443  192.168.0.1     1034        2
4  192.168.0.1     1034  192.168.0.2      443        2
5  192.168.0.1     1034  192.168.0.2      443        2
6  192.168.0.2      443  192.168.0.1     1034        2
7  192.168.0.2      443  192.168.0.1     1034        2
8  192.168.0.1     1032  192.168.0.2      443        1
9  192.168.0.2      443  192.168.0.1     1032        1

I converted the dataframe to its values and sorted them together, but I'm stuck on assigning the correct flow index.

Is there a faster or more elegant way?


Solution

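  • One idea is to sort the (src, src_port) and (dst, dst_port) pairs of each row into a nested tuple, so that both directions of a flow produce the same key, and then call factorize: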

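    # Take the four endpoint columns as a NumPy array of rows
    a = df[['src','src_port','dst','dst_port']].to_numpy()
    # Sort the two (address, port) pairs so both directions share one key
    s = [tuple(sorted(((x[0], x[1]), (x[2], x[3])))) for x in a]
    # factorize numbers unique keys from 0 in order of first appearance; add 1 to start at 1
    df['flow_id'] = pd.factorize(s)[0] + 1

    print(df)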
               src src_port          dst dst_port  flow_id
    0  192.168.0.1     1032  192.168.0.2      443        1
    1  192.168.0.1     1032  192.168.0.2      443        1
    2  192.168.0.1     1034  192.168.0.2      443        2
    3  192.168.0.2      443  192.168.0.1     1034        2
    4  192.168.0.1     1034  192.168.0.2      443        2
    5  192.168.0.1     1034  192.168.0.2      443        2
    6  192.168.0.2      443  192.168.0.1     1034        2
    7  192.168.0.2      443  192.168.0.1     1034        2
    8  192.168.0.1     1032  192.168.0.2      443        1
    9  192.168.0.2      443  192.168.0.1     1032        1
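
To see why the sorted key works, here is a minimal sketch on a few hand-picked packets. The helper name `flow_key` is hypothetical (it is not part of pandas or of the answer above); it only wraps the same `tuple(sorted(...))` expression. The keys are wrapped in a `pd.Series` before factorizing, since recent pandas versions appear to discourage passing a plain list to `pd.factorize`; that is an assumption about newer releases, not something the answer above depends on.

```python
import pandas as pd


def flow_key(src, src_port, dst, dst_port):
    """Return a direction-independent key for one packet (hypothetical helper)."""
    # Sorting the two endpoints puts them in the same order
    # no matter which one was the sender.
    return tuple(sorted([(src, src_port), (dst, dst_port)]))


# The same conversation seen in both directions yields the same key.
outbound = flow_key('192.168.0.1', '1034', '192.168.0.2', '443')
inbound = flow_key('192.168.0.2', '443', '192.168.0.1', '1034')
print(outbound == inbound)  # True

# factorize assigns codes in order of first appearance, starting at 0.
keys = [
    flow_key('192.168.0.1', '1032', '192.168.0.2', '443'),
    flow_key('192.168.0.1', '1034', '192.168.0.2', '443'),
    flow_key('192.168.0.2', '443', '192.168.0.1', '1032'),
]
codes, uniques = pd.factorize(pd.Series(keys))
flow_ids = (codes + 1).tolist()
print(flow_ids)  # [1, 2, 1]
```

Note that the ports here are strings, so the pairs are compared lexicographically. That does not matter for grouping: any consistent ordering gives both directions of a flow the same key.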