pandas

Add a column to one dataframe by looking up values in another


I am struggling to find a good solution to this problem.

I have a dataframe 'orders' with a time and a price column, and another dataframe 'prices' that gives me the market price at any time.

import pandas as pd
prices = pd.DataFrame({'time': range(10), 'price': [20, 21, 22, 21, 22, 23, 24, 25, 26, 27]})
orders = pd.DataFrame({'time': [3, 6, 8], 'orderPrice' : [20, 24, 18]})

I want to add a column to 'orders', let's call it 'marketPriceTime', which does:

So the result in this case would be:

ordersResult = pd.DataFrame({'time': [3, 6, 8], 'orderPrice' : [20, 24, 18], 'marketPriceTime': [0, 6, None]})

Solution

  • What you want is a merge_asof on time, with the price as the grouping key (the by arguments):

    out = pd.merge_asof(orders, prices.rename(columns={'time': 'marketPriceTime'}),
                        left_on='time', right_on='marketPriceTime',
                        left_by='orderPrice', right_by='price')
    
    

    One subtlety: merge_asof requires both dataframes to be sorted by the time key. If they are not, sort them first and restore the original row order of orders afterwards:

    out = pd.merge_asof(orders.reset_index().sort_values(by='time'),
                        prices.sort_values(by='time').rename(columns={'time': 'marketPriceTime'}),
                        left_by='orderPrice', right_by='price',
                        left_on='time', right_on='marketPriceTime'
                       ).set_index('index').reindex(orders.index)
    

    Output:

       time  orderPrice  marketPriceTime  price
    0     3          20              0.0   20.0
    1     6          24              6.0   24.0
    2     8          18              NaN    NaN
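
To get exactly the three columns asked for in the question, the approach above could be wrapped in a small helper. This is only a sketch: the function name add_market_price_time is made up for illustration, it assumes orders has an unnamed index (so that reset_index() creates a column called 'index'), and it spells out direction='backward', which is merge_asof's default (take the latest matching row at or before the order time).

```python
import pandas as pd

prices = pd.DataFrame({'time': range(10),
                       'price': [20, 21, 22, 21, 22, 23, 24, 25, 26, 27]})
orders = pd.DataFrame({'time': [3, 6, 8], 'orderPrice': [20, 24, 18]})


def add_market_price_time(orders, prices):
    """Hypothetical helper: add 'marketPriceTime' to a copy of orders.

    Assumes orders has an unnamed index, so reset_index() yields
    a column named 'index'.
    """
    # Right side: rename the time column and make sure it is sorted.
    right = (prices.rename(columns={'time': 'marketPriceTime'})
                   .sort_values('marketPriceTime'))
    # Left side: keep the original index as a column, then sort by time.
    left = orders.reset_index().sort_values('time')
    merged = pd.merge_asof(left, right,
                           left_on='time', right_on='marketPriceTime',
                           left_by='orderPrice', right_by='price',
                           direction='backward')
    # Restore the original row order and drop the helper 'price' column.
    return (merged.set_index('index')
                  .reindex(orders.index)
                  .drop(columns='price'))


result = add_market_price_time(orders, prices)
print(result)

# The same helper on orders given in a different row order.
shuffled = orders.iloc[[2, 0, 1]]
result_shuffled = add_market_price_time(shuffled, prices)
print(result_shuffled)
```

The 'marketPriceTime' column comes back as float because the unmatched order is filled with NaN.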