pythonpandasdataframeindexing

complicated pandas operations involving 2 dataframes


I have a dataframe df_a that has a column called dr7objid and has 194478 rows. There's a 2nd dataframe df_indices with 245609 rows that also has an objid column as well as an asset_idcolumn. Both are pictured below.

df_a

enter image description here

I want to add a new column to df_a that finds the matching asset_id in df_indices for the specific dr7objid in df_a. For example, the first value in the column in df_a should be:

df_indices[df_indices['objid'] == 587732484359913515].index[0]+1

which is55934 (note that asset_id is equal to the index+1 in general, though one could also use the at function with 'asset_id' as the label).

How do I generalize this to all examples without using for loops? I've tried thinking of boolean indexing and stuff but I can't figure out how to search for the corresponding objid in df_indices efficiently.

The situation is also more complicated because confusingly, not all dr7objid's are in df_indices, even though df_indices is bigger, so if that's the case, the ID should be nan or 0.


Solution

  • One of the other way to get the desired operation is through merge as shown in example below:

    df_a = df_a.merge(df_indices[['objid','asset_id']],left_on='dr7objid', right_on='objid', how='left', copy=False).drop(columns={'objid'})