python pandas data-wrangling

Pandas idxmax - top n values


I have this code:

import pandas as pd
df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48], 'co2_emissions': [37.2, 19.66, 1712]}, index=['Pork', 'Wheat Products', 'Beef'])
df['Max'] = df.idxmax(axis=1, skipna=True, numeric_only=True)
df

I need to find the n largest values in each row. I found a technique that uses apply with a lambda, but it raises an error.

df.apply(lambda s: s.abs().nlargest(2).index.tolist(), axis=1,skipna=True, numeric_only=True)

TypeError: <lambda>() got an unexpected keyword argument 'numeric_only'

Is there a way to get the top N results using idxmax? And how can I avoid the error raised by the apply/lambda approach?


Solution

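  • The error occurs because you pass skipna and numeric_only to apply. apply does not use these parameters itself; it forwards any extra keyword arguments to your function, and the lambda does not accept them.

    You can fix it by selecting the numeric columns first and dropping NaNs inside the lambda: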

    (df.select_dtypes('number')
       .apply(lambda s: s.dropna().abs().nlargest(2)
                         .index.tolist(), axis=1)
     )
    

    Output:

    Pork              [co2_emissions, consumption]
    Wheat Products    [consumption, co2_emissions]
    Beef              [co2_emissions, consumption]
    dtype: object
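
To illustrate the keyword forwarding described above, here is a minimal sketch. The helper name `top_labels` and its parameter `n` are my own, chosen for illustration; they are not part of pandas. It reproduces the error first, then uses the same forwarding mechanism on purpose to pass `n` through apply.

```python
import pandas as pd

df = pd.DataFrame(
    {'consumption': [10.51, 103.11, 55.48], 'co2_emissions': [37.2, 19.66, 1712]},
    index=['Pork', 'Wheat Products', 'Beef'],
)

# apply does not consume numeric_only itself, so it hands it to the lambda,
# which only accepts a single positional argument.
try:
    df.apply(lambda s: s.nlargest(2).index.tolist(), axis=1, numeric_only=True)
except TypeError as exc:
    error_message = str(exc)

def top_labels(row, n):
    # Hypothetical helper: `n` arrives here through apply's keyword forwarding.
    return row.dropna().abs().nlargest(n).index.tolist()

# Extra keyword arguments (here n=2) are passed on to top_labels for each row.
result = df.select_dtypes('number').apply(top_labels, axis=1, n=2)
```

Writing the function so that it accepts the extra argument is what makes the forwarding useful rather than an error.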
    

    A more efficient approach using NumPy (np.argpartition):

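    import numpy as np

    N = 2

    # unlike the apply version, this does not take abs() or drop NaNs
    tmp = df.select_dtypes('number')

    out = pd.Series(
        np.take_along_axis(
            tmp.columns.to_numpy()[:, None],
            # positions of the N largest values in each row
            np.argpartition(tmp.to_numpy(), -N)[:, -N:],
            axis=0
        )[:, ::-1].tolist(),  # reversed; order within the top N is only guaranteed for N <= 2
        index=df.index,
    )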