I am new to Python and need to lookup data from df1 and add a column to df2 based on a column value and loop through all records in df2 until column "codekey1" is populated with the related data from df1.
I have seen this post and trying to replicate it: Iterating over dataframe rows with for loop - problem with .loc
df1 = pd.DataFrame({
"invoice":["Z1234","Z1234","Z10444","Z10444","Z56789","Z56789","Z34555","Z34555"],
"amount":["10.10","10.10","11000.23","11000.23","567.34","567.34","4567.55","4567.55"],
"codekey":["100","200","100","200","100","200","100","200"],
"codevalue":["CO123","PO123","CO345","PO345","CO567","PO567","CO789","PO789"]
})
output df1:
df1
invoice amount codekey codevalue
0 Z1234 10.10 100 CO123
1 Z1234 10.10 200 PO123
2 Z10444 11000.23 100 CO345
3 Z10444 11000.23 200 PO345
4 Z56789 567.34 100 CO567
5 Z56789 567.34 200 PO567
6 Z34555 4567.55 100 CO789
7 Z34555 4567.55 200 PO789`
df2 = pd.DataFrame({
"invoice":["Z1234","Z10444","Z56789","Z34555"],
"amount":["10.10","11000.23","567.34","4567.55"],
"codekey1":["","","",""],
"codekey2":["","","",""]
})
required output:
`df2`
invoice amount codekey1 codekey2
0 Z1234 10.10 C0123
1 Z10444 11000.23 CO345
2 Z56789 567.34 CO567
3 Z34555 4567.55 CO789
I run this code:
for i in range(len(df2)):
codekey_lookup = (df1.loc[(df1["invoice"] == df2.loc[i, "invoice"] & df2["codekey"] == "100"), "codevalue"]).to_list()[0]
df2.loc[i, "codekey1"] = codekey_lookup
and get this error:
KeyError: 'codekey'
Hope someone can help..
The error says it, df2
doesn't have a codekey
column. so in your code df2["codekey"] == "100"
throws that error. Also, you need to wrap the equal op in parenthesis like this
for i in range(len(df2)):
codekey_lookup = ((df1.loc[(df1["invoice"] == df2.loc[i, "invoice"])
& (df1["codekey"] == "100"), "codevalue"])
).to_list()[0]
df2.loc[i, "codekey1"] = codekey_lookup
On another note, you might want to take a look at merge
or map
operation rather than looping with loc
. For example, the following does the same thing, just a lot faster:
code_by_invoice = df1.loc[df1['codekey']=='100', ['invoice','codevalue']].set_index('invoice')
df2['codekey1'] = df2['invoice'].map(code_by_invoice['codevalue'])
Output:
invoice amount codekey1 codekey2
0 Z1234 10.10 CO123
1 Z10444 11000.23 CO345
2 Z56789 567.34 CO567
3 Z34555 4567.55 CO789