Pandas - Case when & default in pandas


I have the following case-style assignment in Python:

pd_df['difficulty'] = 'Unknown'
pd_df['difficulty'][(pd_df['Time']<30) & (pd_df['Time']>0)] = 'Easy'
pd_df['difficulty'][(pd_df['Time']>=30) & (pd_df['Time']<=60)] = 'Medium'
pd_df['difficulty'][pd_df['Time']>60] = 'Hard'

But when I run the code, pandas raises the following warning:

A value is trying to be set on a copy of a slice from a DataFrame

Solution

  • Option 1
    For performance, use nested np.where calls. Each condition can be expressed with pd.Series.between, and the innermost np.where supplies the default value for rows that match none of the conditions.

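    import numpy as np

    # String values for inclusive ('neither', 'both') require pandas >= 1.3
    pd_df['difficulty'] = np.where(
        pd_df['Time'].between(0, 30, inclusive='neither'),    # 0 < Time < 30
        'Easy',
        np.where(
            pd_df['Time'].between(30, 60, inclusive='both'),  # 30 <= Time <= 60
            'Medium',
            np.where(pd_df['Time'] > 60, 'Hard', 'Unknown')
        )
    )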
    

  • Option 2
    Similarly, you can use np.select, which stays readable as more conditions are added:

    pd_df['difficulty'] = np.select(
        [
            pd_df['Time'].between(0, 30, inclusive='neither'),  # 0 < Time < 30
            pd_df['Time'].between(30, 60, inclusive='both'),    # 30 <= Time <= 60
            pd_df['Time'] > 60
        ],
        [
            'Easy',
            'Medium',
            'Hard'
        ],
        default='Unknown'
    )
    

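  • Option 3
    Another performant solution uses loc, which selects rows and column in a single indexing step and therefore avoids the chained assignment that triggers the warning: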

    pd_df['difficulty'] = 'Unknown'
    pd_df.loc[pd_df['Time'].between(0, 30, inclusive='neither'), 'difficulty'] = 'Easy'
    pd_df.loc[pd_df['Time'].between(30, 60, inclusive='both'), 'difficulty'] = 'Medium'
    pd_df.loc[pd_df['Time'] > 60, 'difficulty'] = 'Hard'
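
To check the boundary handling, here is a minimal end-to-end sketch of the np.select and loc approaches. It assumes pandas 1.3 or later (where between accepts string values for inclusive). The sample Time values and the names time, conditions, choices and via_select are made up for illustration. It also covers the 'Hard' case (Time > 60) from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the boundary values 0, 30 and 60 are included on purpose.
pd_df = pd.DataFrame({"Time": [-5, 0, 10, 30, 45, 60, 90]})
time = pd_df["Time"]

# One boolean mask per label, in the same order as the labels below.
conditions = [
    time.between(0, 30, inclusive="neither"),  # 0 < Time < 30
    time.between(30, 60, inclusive="both"),    # 30 <= Time <= 60
    time > 60,
]
choices = ["Easy", "Medium", "Hard"]

# np.select: the first matching condition wins, otherwise the default is used.
via_select = pd.Series(
    np.select(conditions, choices, default="Unknown"),
    index=pd_df.index,
)

# loc: start from the default, then overwrite matching rows in one indexing step each.
pd_df["difficulty"] = "Unknown"
for condition, label in zip(conditions, choices):
    pd_df.loc[condition, "difficulty"] = label

print(pd_df.assign(via_select=via_select))
```

Both columns should agree row for row. Time values of -5 and 0 stay 'Unknown' because the 'Easy' range excludes its endpoints, while 30 and 60 both fall into 'Medium'.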