pandasdataframe

trying to iterate over 1 dataframe and get values from another using .loc


I am new to Python and need to lookup data from df1 and add a column to df2 based on a column value and loop through all records in df2 until column "codekey1" is populated with the related data from df1.

I have seen this post and trying to replicate it: Iterating over dataframe rows with for loop - problem with .loc

df1 = pd.DataFrame({
    "invoice":["Z1234","Z1234","Z10444","Z10444","Z56789","Z56789","Z34555","Z34555"],
    "amount":["10.10","10.10","11000.23","11000.23","567.34","567.34","4567.55","4567.55"],
    "codekey":["100","200","100","200","100","200","100","200"],
    "codevalue":["CO123","PO123","CO345","PO345","CO567","PO567","CO789","PO789"]
})

output df1:

df1
  invoice    amount codekey codevalue
0   Z1234     10.10     100     CO123
1   Z1234     10.10     200     PO123
2  Z10444  11000.23     100     CO345
3  Z10444  11000.23     200     PO345
4  Z56789    567.34     100     CO567
5  Z56789    567.34     200     PO567
6  Z34555   4567.55     100     CO789
7  Z34555   4567.55     200     PO789`
df2 = pd.DataFrame({
    "invoice":["Z1234","Z10444","Z56789","Z34555"],
    "amount":["10.10","11000.23","567.34","4567.55"],
    "codekey1":["","","",""],
    "codekey2":["","","",""]
})

required output:

`df2`
  invoice    amount codekey1 codekey2
0   Z1234     10.10    C0123
1  Z10444  11000.23    CO345
2  Z56789    567.34    CO567
3  Z34555   4567.55    CO789

I run this code:

for i in range(len(df2)):
    codekey_lookup = (df1.loc[(df1["invoice"] == df2.loc[i, "invoice"] & df2["codekey"] == "100"), "codevalue"]).to_list()[0]
    df2.loc[i, "codekey1"] = codekey_lookup

and get this error:

KeyError: 'codekey'

Hope someone can help..


Solution

  • The error says it, df2 doesn't have a codekey column. so in your code df2["codekey"] == "100" throws that error. Also, you need to wrap the equal op in parenthesis like this

    for i in range(len(df2)):
        codekey_lookup = ((df1.loc[(df1["invoice"] == df2.loc[i, "invoice"]) 
                          & (df1["codekey"] == "100"), "codevalue"])
                         ).to_list()[0]
        df2.loc[i, "codekey1"] = codekey_lookup
    

    On another note, you might want to take a look at merge or map operation rather than looping with loc. For example, the following does the same thing, just a lot faster:

    code_by_invoice = df1.loc[df1['codekey']=='100', ['invoice','codevalue']].set_index('invoice')
    
    df2['codekey1'] = df2['invoice'].map(code_by_invoice['codevalue'])
    

    Output:

      invoice    amount codekey1 codekey2
    0   Z1234     10.10    CO123         
    1  Z10444  11000.23    CO345         
    2  Z56789    567.34    CO567         
    3  Z34555   4567.55    CO789