python python-3.x pandas dataframe

Pandas DataFrame Cannot use assign function - Why?


I am encountering some odd behavior in pandas, and I am hoping someone can shed some light on the specifics of the df.assign(...) function of a pandas DataFrame. I am getting a ValueError when trying to assign a column, even though the function itself is valid.

def is_toc_row(row):
    m_sig = m_df.loc[m_df.signature == row.signature]
    pct = (~pd.isnull(m_sig.line_type)).sum() / m_sig.shape[0]
    return (not pd.isnull(row.line_type)) or (pct < .5)


m_df = m_df.assign(is_toc_row=is_toc_row)

Gives:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

But this works totally fine:

for ind, row in m_df.iterrows():
    m_df.at[ind, 'is_toc_row'] = is_toc_row(row)

Is there some issue with referencing the rest of the DataFrame in the function? All I see in the docs is that the subject df cannot change, which it does not.

Of course I am capable of building a workaround, I just want to understand why this does not work for future use.

EDIT:

Not totally sure why there are so many downvotes, but I am adding a few rows of data here anyway, as requested:

index  signature                                    line_type
0      WYcxXTjq27YAP4uJOcLeRLelyUixNJaOwFwf2qqfpM4  NaN
1      WYcxXTjq27YAP4uJOcLeRLelyUixNJaOwFwf2qqfpM4  NaN
2      WYcxXTjq27YAP4uJOcLeRLelyUixNJaOwFwf2qqfpM4  1
3      WYcxXTjq27YAP4uJOcLeRLelyUixNJaOwFwf2qqfpM4  2
4      WYcxXTjq27YAP4uJOcLeRLelyUixNJaOwFwf2qqfpM4  2.4

Solution

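  • When assign is given a custom function, the function does not receive the DataFrame row by row (as it would with apply and axis=1); it is called once with the full DataFrame. Inside is_toc_row, row.line_type is therefore an entire column, so not pd.isnull(row.line_type) asks for the truth value of a Series, which raises the ValueError. Let's take a toy example: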

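    import pandas as pd

    m_df = pd.DataFrame({'temp_b': [7.0, 5.0], 'temp_c': [17.0, 25.0]},
                        index=['Portland', 'Berkeley'])

    def myfunc(x):
        # x is the whole DataFrame, so this print runs only once
        print(x, "*end*")
        # Column arithmetic returns a Series aligned on the index
        return x.temp_c + x.temp_b

    m_df = m_df.assign(is_toc_row=myfunc)
    print(m_df)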
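Applied to the function from the question, one way to keep the row-wise logic is to let the callable passed to assign delegate to apply with axis=1. The following is only a sketch under stated assumptions: the signature values "sigA" and "other" are hypothetical stand-ins for the real data, and is_toc_row takes the frame as an explicit second argument (a change from the question's version, which reads the global m_df). A vectorized variant built on groupby is shown alongside it, since it avoids filtering the frame once per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows modeled on the question's data.
m_df = pd.DataFrame({
    "signature": ["sigA", "sigA", "sigA", "sigA", "other", "other", "other"],
    "line_type": [np.nan, np.nan, 1.0, 2.0, np.nan, np.nan, 2.4],
})

def is_toc_row(row, frame):
    # Same logic as in the question, but the frame is passed in explicitly.
    m_sig = frame.loc[frame.signature == row.signature]
    pct = (~pd.isnull(m_sig.line_type)).sum() / m_sig.shape[0]
    return (not pd.isnull(row.line_type)) or (pct < .5)

# Option 1: assign calls the lambda once with the whole frame; the lambda
# then uses apply(axis=1) to call is_toc_row once per row.
row_wise = m_df.assign(
    is_toc_row=lambda df: df.apply(is_toc_row, axis=1, args=(df,))
)

# Option 2: vectorized. Compute the share of non-null line_type values per
# signature, broadcast it back to every row, and combine the two conditions.
has_type = m_df["line_type"].notna()
pct = has_type.groupby(m_df["signature"]).transform("mean")
vectorized = m_df.assign(is_toc_row=has_type | (pct < 0.5))

print(row_wise)
print(vectorized)
```

Both variants should produce the same boolean column; the second is likely to scale better on large frames because it does not rescan the frame for every row.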
    

    res