
case_when function from R to Python


How can I implement R's case_when function in Python?

Here is the documentation for R's case_when function:

https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/case_when

As a minimal working example, suppose we have the following dataframe (Python code follows):

import pandas as pd
import numpy as np

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'age': [42, 52, 36, 24, 73], 
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, columns = ['name', 'age', 'preTestScore', 'postTestScore'])
df

Suppose that we want to create a new column called 'elderly' that looks at the 'age' column and does the following:

if age < 10 then baby
if age >= 10 and age < 20 then kid
if age >= 20 and age < 30 then young
if age >= 30 and age < 50 then mature
if age >= 50 then grandpa

Can someone help with this?


Solution

  • You want to use np.select:

    conditions = [
        (df["age"].lt(10)),
        (df["age"].ge(10) & df["age"].lt(20)),
        (df["age"].ge(20) & df["age"].lt(30)),
        (df["age"].ge(30) & df["age"].lt(50)),
        (df["age"].ge(50)),
    ]
    choices = ["baby", "kid", "young", "mature", "grandpa"]
    
    df["elderly"] = np.select(conditions, choices)
    
    # Results in:
    #      name  age  preTestScore  postTestScore  elderly
    #  0  Jason   42             4             25   mature
    #  1  Molly   52            24             94  grandpa
    #  2   Tina   36            31             57   mature
    #  3   Jake   24             2             62    young
    #  4    Amy   73             3             70  grandpa
    

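    The conditions and choices lists must be the same length. For each row, np.select returns the choice that belongs to the first condition that is True.
    There is also a default parameter (0 unless you set it) that is used when all conditions evaluate to False. With string choices it is safer to pass a string default explicitly, e.g. default="unknown", because newer NumPy versions may raise a TypeError when the numeric 0 has to be combined with strings.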
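
To get closer to how case_when reads in R, np.select can be wrapped in a small helper that takes (condition, label) pairs. This is only a sketch: the helper name case_when and the "unknown" label are made up for illustration and are not part of NumPy or pandas. Because np.select uses the first condition that matches, the lower bounds (age >= 10 and so on) can be dropped as long as the pairs are listed in ascending order.

```python
import numpy as np
import pandas as pd


def case_when(cases, default):
    """Return, per row, the label of the first condition that is True.

    `cases` is a list of (boolean Series, label) pairs. This helper is a
    hypothetical convenience wrapper, not a NumPy or pandas function.
    """
    conditions = [cond for cond, _ in cases]
    choices = [label for _, label in cases]
    # np.select picks the first matching condition for each row;
    # rows that match no condition get `default`.
    return np.select(conditions, choices, default=default)


df = pd.DataFrame({
    "name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "age": [42, 52, 36, 24, 73],
})

# Order matters: each pair is only reached if the earlier ones were False.
df["elderly"] = case_when(
    [
        (df["age"] < 10, "baby"),
        (df["age"] < 20, "kid"),
        (df["age"] < 30, "young"),
        (df["age"] < 50, "mature"),
        (df["age"] >= 50, "grandpa"),
    ],
    default="unknown",  # assumed label, e.g. for missing ages
)
print(df)
```

A missing age (NaN) fails every comparison, so such a row would fall through to the default instead of being placed in one of the age groups.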