I have the following dataframe where I show how many times I have seen a move from Item1 to Item 2. For example there is one transition from A to B, 2 from A to C , 1 from C to A
Item1 Item2 Moves
1 A B 1
2 A C 2
3 B D 3
4 C A 1
5 C B 5
6 D B 4
7 D C 1
I would like to calculate the difference between two items, so a newly constructed Dataframe would be the following
Item1 Item2 Moves
1 A B 1
2 A C 1
3 B D -1
4 C B 5
5 D C 1
Does anyone have any idea how to do that using Pandas? I guess i need to index on the first two columns but I quite new in Pandas and i face a lot of difficulties. Thanks
EDIT There can't be any duplicate pairs.For example you cant see twice a->b (but you can of course see b->a)
I'm sure someone could simplify this down to fewer lines, but I've left it long to help clarify what is going on. In a nutshell, split the dataframe into two pieces based on whether 'Item1' is earlier in the alphabet than 'Item2'. Then flip 'Item1' and 'Item2' and negate 'Moves' for one piece. Glue them back together and use the groupby
function to aggregate the rows.
>>> df
Item1 Item2 Moves
0 A B 1
1 A C 2
2 B D 3
3 C A 1
4 C B 5
5 D B 4
6 D C 1
>>> swapidx = df['Item1'] < df['Item2']
>>> df1 = df[swapidx]
>>> df2 = df[swapidx^True]
>>> df1
Item1 Item2 Moves
0 A B 1
1 A C 2
2 B D 3
>>> df2
Item1 Item2 Moves
3 C A 1
4 C B 5
5 D B 4
6 D C 1
>>> df2[['Item1', 'Item2']] = df2[['Item2', 'Item1']]
>>> df2['Moves'] = df2['Moves']*-1
>>> df2
Item1 Item2 Moves
3 A C -1
4 B C -5
5 B D -4
6 C D -1
>>> df3 = df1.append(df2)
>>> df3.groupby(['Item1', 'Item2'], as_index=False).sum()
Item1 Item2 Moves
0 A B 1
1 A C 1
2 B C -5
3 B D -1
4 C D -1