pythonpandasdataframenumpypopulate

I want to populate the column of a dataframe with values from the column of another dataframe when the values of two columns match


I want to populate values of df2['VALUE'] in a new column in df1 df1['New'] when category and location match for both dataframes.

When

df1['category'] = df2['CATEGORY'] AND df1['location'] = df2['LOCATION']

populate values of df2['VALUE'] into a new row in df1 so that using df1['v1'] I can calculate df1['calculatedfield'] = df1['v1']/df1['new']

df1

category location type v1
A loc1 1 2
A loc1 2 4
A loc2 1 6
A loc2 2 8
B loc1 1 10
B loc1 2 12
B loc2 1 14
B loc2 2 16

df2

CATEGORY LOCATION VALUE
A loc1 50
A loc2 30
B loc1 70
B loc2 90

output

category location type v1 new(df2['VALUE']) calculatedfield(v1/new)
A loc1 1 2 50 0.04
A loc1 2 4 50 0.08
A loc2 1 6 30 0.2
A loc2 2 8 30 0.27
B loc1 1 10 70 0.14
B loc1 2 12 70 0.17
B loc2 1 14 90 0.16
B loc2 2 16 90 0.18

df1 has more rows than df2, that is why I didn't go with joining the two dataframes. I need to populate df2['new'] with the values from df2['VALUE'] wherever the combination of category and location come up and regardless of the value of df2['type']. I can't drop rows.

I tried

df1['New'] = np.where((df1['category'] == df2['CATEGORY']) & (df1['location'] == df2['location']), df2['VALUE'], None)

and this came up:

ValueError: Can only compare identically-labeled Series objects


Solution

  • You can use pandas.set_index() function to create a multiIndex dataframe for obtaining values from df2 and use the pandas.reset.index() to reset the dataframe. Please see below my sample code. I have replicated your dataframes df1 and df2.

    code:

    import pandas as pd
    
    data1 = {'category': ['A', 'A', 'A', 'A','B', 'B', 'B', 'B'], 
            'location': ['loc1', 'loc1', 'loc2', 'loc2','loc1', 'loc1', 'loc2', 'loc2'],
            'type': [1,2,1,2,1,2,1,2,],
             'v1': [2,4,6,8,10,12,14,16]}
    data2 = {'category': ['A', 'A','B', 'B'], 
            'location': ['loc1', 'loc2', 'loc1','loc2'],
            'VALUE': [50,30,70,90]}    
    df1 = pd.DataFrame(data1)
    df2 = pd.DataFrame(data2)
    
    df_temp = df1.set_index(['category','location'])
    df_temp['New'] = df2.set_index(['category','location'])['VALUE']
    df1 = df_temp.reset_index()
    
    df1['Calculated_Field'] = df1['v1']/df1['New']
    df1
    

    Output:

    enter image description here

    References:

    https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html

    https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html#pandas.DataFrame.reset_index