I need to use a lambda function to do a row-by-row computation. For example, create a DataFrame:
import pandas as pd
import numpy as np
def myfunc(x, y):
    return x + y
colNames = ['A', 'B']
data = np.array([np.arange(10)]*2).T
df = pd.DataFrame(data, index=range(0, 10), columns=colNames)
Using 'myfunc' with attribute access works:
df['D'] = (df.apply(lambda x: myfunc(x.A, x.B), axis=1))
but this second case does not work!
df['D'] = (df.apply(lambda x: myfunc(x.colNames[0], x.colNames[1]), axis=1))
giving the error
AttributeError: ("'Series' object has no attribute 'colNames'", u'occurred at index 0')
I really need the second form, where the columns are accessed through the colNames list, but that is the one that raises the error. Any clues on how to do this?
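When you use df.apply() with axis=1, each row of your DataFrame is passed to your lambda function as a pandas Series. The frame's columns then become the index of that Series, and you can access values using series[label]. Writing x.colNames fails because attribute access looks for a label literally named 'colNames' on the row; it does not evaluate your colNames list.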
So this should work:
df['D'] = (df.apply(lambda x: myfunc(x[colNames[0]], x[colNames[1]]), axis=1))
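For completeness, here is a self-contained sketch of the fix using the setup from the question. The variable first_row and the column name D_vectorized are only illustrative names, and the last line assumes myfunc uses nothing but operations that also work on whole columns (true for x + y), which may not hold for your real function.

```python
import numpy as np
import pandas as pd


def myfunc(x, y):
    return x + y


colNames = ['A', 'B']
data = np.array([np.arange(10)] * 2).T
df = pd.DataFrame(data, index=range(0, 10), columns=colNames)

# Each row handed to the lambda is a Series indexed by the column labels.
first_row = df.iloc[0]
print(list(first_row.index))

# Attribute access (row.A) needs the label written literally in the code;
# bracket access (row[label]) accepts a label stored in a variable or list.
df['D'] = df.apply(lambda row: myfunc(row[colNames[0]], row[colNames[1]]), axis=1)

# Assumption: myfunc works element-wise on whole columns, so apply() can be
# skipped and the columns passed in directly.
df['D_vectorized'] = myfunc(df[colNames[0]], df[colNames[1]])
```

If the vectorized call is possible for your function, it is usually much faster than a row-wise apply, since it avoids building a Series for every row.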