python pandas apply

How to apply a function with several variables to a column of a pandas dataframe (when it is not possible to change the order of vars in func)


I would like to apply a function to a column of a pandas DataFrame. The function takes two arguments: a string and a value from the column.

As follows:

import pandas as pd

def check_it(language, text):
    print(language)  # expected 'EN', but this prints the column element
    if language == 'EN':
        result = 'DNA' in text
    else:
        result = 'NO'
    return result

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': ['DNA', 'sdgasdf', 'sdfsdf'], 'col_2': ['sdfsf sdf s', 'DNA', 'sdgasdf']})

df['col_3'] = df['col_2'].apply(check_it, args=('EN',))
df

This does not produce the required result: even though 'EN' is passed through args, printing 'language' inside the function shows the element of the column, not 'EN'. The column element is passed as the first argument, and 'EN' ends up in 'text'.

The example in the pandas documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.apply.html) is not entirely clear on this point:

def subtract_custom_value(x, custom_value):
    return x - custom_value
s.apply(subtract_custom_value, args=(5,))

It looks like the first parameter of the function has to be the one that receives the Series element. If the functions are already given and changing the order of their parameters is not possible, how should I proceed? What if the function takes multiple parameters and only the third of six is the column of the DataFrame?

Note

The following would work but it is not a valid option:

def check_it(text,language):
...
df['col_3']=df['col_2'].apply(check_it, args=('EN',))

since I cannot change the order of the parameters in the function.


Solution

  • You can always create a lambda, and in the body, invoke your function as needed:

    df['col_3']=df['col_2'].apply(lambda text: check_it('EN', text))
    df
    
      ID    col_1        col_2  col_3
    0  1      DNA  sdfsf sdf s  False
    1  2  sdgasdf          DNA   True
    2  3   sdfsdf      sdgasdf  False
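
As a possible alternative when the fixed arguments all come before the column value, functools.partial can pre-fill them. When the column value sits in the middle of the parameter list (the "third of six" case from the question), a lambda is the simpler route, because apply always passes the element as the first positional argument. The sketch below assumes a made-up function six_args purely for illustration; it is not part of the original question.

```python
from functools import partial

import pandas as pd


def check_it(language, text):
    # Same logic as the function in the question, without the print.
    if language == 'EN':
        return 'DNA' in text
    return 'NO'


def six_args(a, b, text, d, e, f):
    # Hypothetical function: the column value is the third of six parameters.
    return f"{a}{b}{text}{d}{e}{f}"


df = pd.DataFrame({'col_2': ['sdfsf sdf s', 'DNA', 'sdgasdf']})

# partial fixes the leading argument; apply then supplies the element as 'text'.
check_en = partial(check_it, 'EN')
df['col_3'] = df['col_2'].apply(check_en)

# For a value in the middle of the signature, place it explicitly with a lambda.
df['col_4'] = df['col_2'].apply(
    lambda text: six_args('a', 'b', text, 'd', 'e', 'f')
)
```

Note that passing the fixed values as keyword arguments (for example language='EN') would not help here, since the element would still be bound to the first parameter and Python would raise a TypeError for the duplicate argument.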