pythonindexingdataframepandas

Calculate pairwise difference from specific columns in a dataframe


I have the following dataframe where I show how many times I have seen a move from Item1 to Item 2. For example there is one transition from A to B, 2 from A to C , 1 from C to A


    Item1   Item2   Moves
  1  A       B       1
  2  A       C       2
  3  B       D       3
  4  C       A       1
  5  C       B       5
  6  D       B       4
  7  D       C       1

I would like to calculate the difference between two items, so a newly constructed Dataframe would be the following

    Item1   Item2   Moves
  1  A       B       1
  2  A       C       1
  3  B       D      -1
  4  C       B       5
  5  D       C       1

Does anyone have any idea how to do that using Pandas? I guess i need to index on the first two columns but I quite new in Pandas and i face a lot of difficulties. Thanks

EDIT There can't be any duplicate pairs.For example you cant see twice a->b (but you can of course see b->a)


Solution

  • I'm sure someone could simplify this down to fewer lines, but I've left it long to help clarify what is going on. In a nutshell, split the dataframe into two pieces based on whether 'Item1' is earlier in the alphabet than 'Item2'. Then flip 'Item1' and 'Item2' and negate 'Moves' for one piece. Glue them back together and use the groupby function to aggregate the rows.

    >>> df
      Item1 Item2  Moves
    0     A     B      1
    1     A     C      2
    2     B     D      3
    3     C     A      1
    4     C     B      5
    5     D     B      4
    6     D     C      1
    >>> swapidx = df['Item1'] < df['Item2']
    >>> df1 = df[swapidx]
    >>> df2 = df[swapidx^True]
    >>> df1
      Item1 Item2  Moves
    0     A     B      1
    1     A     C      2
    2     B     D      3
    >>> df2
      Item1 Item2  Moves
    3     C     A      1
    4     C     B      5
    5     D     B      4
    6     D     C      1
    >>> df2[['Item1', 'Item2']] = df2[['Item2', 'Item1']]
    >>> df2['Moves'] = df2['Moves']*-1
    >>> df2
      Item1 Item2  Moves
    3     A     C     -1
    4     B     C     -5
    5     B     D     -4
    6     C     D     -1
    >>> df3 = df1.append(df2)
    >>> df3.groupby(['Item1', 'Item2'], as_index=False).sum()
      Item1 Item2  Moves
    0     A     B      1
    1     A     C      1
    2     B     C     -5
    3     B     D     -1
    4     C     D     -1