I would like to apply a function to a column of a pandas DataFrame. The function takes a string and a value from one column of the DataFrame.
As follows:
import pandas as pd
def check_it(language, text):
    print(language)  # debug: show which value arrives in 'language'
    if language == 'EN':
        result = 'DNA' in text
    else:
        result = 'NO'
    return result
df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': ['DNA', 'sdgasdf', 'sdfsdf'], 'col_2': ['sdfsf sdf s', 'DNA', 'sdgasdf']})
df['col_3'] = df['col_2'].apply(check_it, args=('EN',))
df
This does not produce the expected result: even though 'EN' is meant to be the first argument, printing 'language' inside the function shows the element of the column instead of 'EN'.
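A small sketch of why this happens, using a hypothetical helper show_args: Series.apply calls the function once per element, passing that element as the first positional argument and appending the contents of args after it. So 'EN' ends up in the second parameter, not the first.

```python
import pandas as pd

def show_args(first, second):
    # Report which value ended up in each parameter
    return f"first={first!r}, second={second!r}"

s = pd.Series(['DNA', 'abc'])

# Series.apply calls show_args(element, *args): the element is always the
# first positional argument and the values in args follow it
out = s.apply(show_args, args=('EN',))
print(out.tolist())
# ["first='DNA', second='EN'", "first='abc', second='EN'"]
```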
The example in the pandas documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.apply.html) is not entirely clear to me:
def subtract_custom_value(x, custom_value):
    return x - custom_value
s.apply(subtract_custom_value, args=(5,))
It looks like the first parameter of the function has to be the one that receives the values of the Series. If the functions are already given and I cannot change the order of their parameters, how should I proceed? And what if a function takes six parameters and the column value has to go into the third one?
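For the six-parameter case, a sketch with a hypothetical function score(a, b, text, d, e, g) whose third parameter should receive the column value. A lambda can place the element in any position. functools.partial also works here, assuming the parameters after text can be passed by keyword (not positional-only): it binds a and b positionally and d, e, g by keyword, so the element supplied by apply fills the next positional slot, which is text.

```python
import functools

import pandas as pd

# Hypothetical six-parameter function; the column value belongs in the third slot
def score(a, b, text, d, e, g):
    return f"{a}{b}[{text}]{d}{e}{g}"

s = pd.Series(['x', 'y'])

# Option 1: a lambda places each element exactly where the function expects it
via_lambda = s.apply(lambda v: score(1, 2, v, 4, 5, 6))

# Option 2: partial binds a and b positionally and d, e, g by keyword;
# apply then passes the element as the next positional argument, i.e. text
via_partial = s.apply(functools.partial(score, 1, 2, d=4, e=5, g=6))

print(via_lambda.tolist())   # ['12[x]456', '12[y]456']
print(via_partial.tolist())  # ['12[x]456', '12[y]456']
```

The lambda is the more general choice, since it does not depend on which parameters accept keywords.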
The following would work, but it is not a valid option:
def check_it(text, language):
    ...
df['col_3'] = df['col_2'].apply(check_it, args=('EN',))
because I cannot change the order of the parameters in the function.
You can always create a lambda and, in its body, call your function with the arguments in whatever order it requires:
df['col_3']=df['col_2'].apply(lambda text: check_it('EN', text))
df
  ID    col_1        col_2  col_3
0  1      DNA  sdfsf sdf s  False
1  2  sdgasdf          DNA   True
2  3   sdfsdf      sdgasdf  False
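An equivalent alternative to the lambda, sketched here with functools.partial from the standard library (the debug print is left out of check_it): binding 'EN' as the first positional argument leaves text as the only open parameter, and that is exactly the one apply fills with each element.

```python
import functools

import pandas as pd

def check_it(language, text):
    if language == 'EN':
        result = 'DNA' in text
    else:
        result = 'NO'
    return result

df = pd.DataFrame({'ID': ['1', '2', '3'],
                   'col_1': ['DNA', 'sdgasdf', 'sdfsdf'],
                   'col_2': ['sdfsf sdf s', 'DNA', 'sdgasdf']})

# partial(check_it, 'EN') fixes language='EN'; apply then supplies each
# element of col_2 as the remaining positional argument, text
df['col_3'] = df['col_2'].apply(functools.partial(check_it, 'EN'))
print(df['col_3'].tolist())  # [False, True, False]
```

This only works when the column value belongs in the first parameter that partial leaves unbound; otherwise the lambda form is simpler.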