pythonpandasdataframedata-munging

Perform some operation if 2 pandas dataframe have same entries in python


I have 2 dataframes (purchase and sales) as follows:

PURCHASE:

Name item voucher Amt Qty
A Item1 Purchase 10000 100
B Item2 Purchase 500 50
B Item1 Purchase 2000 20
C Item3 Purchase 1000 100
D Item4 Purchase 500 100
A Item3 Purchase 5000 50

SALES:

Name item voucher Amt Qty
A Item1 Sales 5300 50
B Item2 Sales 450 40
B Item1 Sales 1675 15
C Item3 Sales 1800 100

I want an output dataframe where if the person (Name) sells an item, the Amt and Qty should be deducted from the purchase dataframe and a new dataframe should be created with the remaining Amt and Qty as shown below:

OUTPUT DATAFRAME:

Name item voucher Amt Qty
A Item1 Remaining 4700 50
A Item3 Remaining 5000 50
B Item2 Remaining 50 10
B Item1 Remaining 325 5
C Item3 Remaining -800 0
D Item4 Remaining 500 100

Notice that whatever items have been sold by a person (Name) has been deducted from the purchase dataframe and the remaining items (Amt and Qty) are stored in a new output dataframe. Also person D never sold any items even then it should be included in the output dataframe.

Thanks in advance!

Dataframe

import pandas as pd

Purchases = {
    "Name": ["A", "B", "B", "C", "D", "A"],
    "item": ["Item1", "Item2", "Item1", "Item3", "Item4", "Item3"],
    "voucher": ["Purchase", "Purchase", "Purchase", "Purchase", "Purchase", "Purchase"],
    "Amt": [10000, 500, 2000, 1000, 500, 5000],
    "Qty": [100, 50, 20, 100, 100, 50],
}

Purchases = pd.DataFrame(Purchases)

Sales = {
    "Name": ["A", "B", "B", "C"],
    "item": ["Item1", "Item2", "Item1", "Item3"],
    "voucher": ["Sales", "Sales", "Sales", "Sales"],
    "Amt": [5300, 450, 1675, 1800],
    "Qty": [50, 40, 15, 100],
}

Sales = pd.DataFrame(Sales)

Solution

  • Using good old index alignment:

    tmp = Purchases.set_index(['Name', 'item'])
    out = (tmp
           .sub(Sales.set_index(['Name', 'item'])[['Amt', 'Qty']])
           .combine_first(tmp).assign(voucher='Remaining')
           .reset_index()[Purchases.columns]
          )
    

    Output:

      Name   item    voucher     Amt    Qty
    0    A  Item1  Remaining  4700.0   50.0
    1    A  Item3  Remaining  5000.0   50.0
    2    B  Item1  Remaining   325.0    5.0
    3    B  Item2  Remaining    50.0   10.0
    4    C  Item3  Remaining  -800.0    0.0
    5    D  Item4  Remaining   500.0  100.0