pythonpython-3.xpandasdataframepandas-merge

Update values in a DataFrame by values in another by date


I tried to make a code to inputting version of items. There are two dataframes, one with the items, one with the version info.

Here's a dummy data and a code I made:

import pandas as pd
from datetime import datetime

Item = pd.DataFrame({"ID":["A1","A1","A2","A2","A3","B1"],"DATE":["2021-07-05","2021-08-01","2021-02-02","2021-02-03","2021-01-01","2021-10-12"]})
Ver = pd.DataFrame({"ver_date" : ["2021-01-01","2021-07-07","2021-09-09"],"version":["1.1","1.2","1.3"]})

for id,dat in zip(Item["ID"],Item["DATE"]):
    dat2 = datetime.strptime(dat,'%Y-%m-%d')
    for dtc,ver in zip(Ver["ver_date"],Ver["version"]):
        dtc = datetime.strptime(dtc,'%Y-%m-%d')
        if dat2 >= dtc:
            Item.loc[(Item.ID == id) & (Item.DATE == dat),"VER"] = ver
            continue
        else:
            pass

So what I tried is this code. Actually this code works, but I think it is not efficient since it gives me a result in about 1~2 min. (for 200 items).


Solution

  • You could use merge_asof. It merges by key distance. By default, it uses a "backward" search, which selects the last row in the Ver whose "ver_date" is less than or equal to the "Date" of Item (which is what your if-else condition is checking for).

    Also it expects the keys to be merged on to be sorted; so you could first convert the date columns to datetime objects and sort the DataFrames by them:

    Item['DATE'] = pd.to_datetime(Item['DATE'])
    Ver['ver_date'] = pd.to_datetime(Ver['ver_date'])
    out = (pd.merge_asof(Item.sort_values(by='DATE'), 
                         Ver.sort_values(by='ver_date'), 
                         left_on='DATE', right_on='ver_date')
           .drop(columns='ver_date')
           .sort_values(by='ID')
           .rename(columns={'version':'VER'}))
    

    Output:

       ID       DATE  VER
    3  A1 2021-07-05  1.1
    4  A1 2021-08-01  1.2
    1  A2 2021-02-02  1.1
    2  A2 2021-02-03  1.1
    0  A3 2021-01-01  1.1
    5  B1 2021-10-12  1.3