Tags: python, pandas, dataframe, numpy

How do I create a new column where the values are selected based on an existing column?


How do I add a color column to the following dataframe so that color='green' if Set == 'Z', and color='red' otherwise?

   Type  Set
1     A    Z
2     B    Z           
3     B    X
4     C    Y

Solution

  • If you only have two choices to select from then use np.where:

    df['color'] = np.where(df['Set']=='Z', 'green', 'red')
    

    For example,

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
    df['color'] = np.where(df['Set']=='Z', 'green', 'red')
    print(df)
    

    yields

      Type Set  color
    0    A   Z  green
    1    B   Z  green
    2    B   X    red
    3    C   Y    red
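
To make the selection rule explicit: np.where(condition, a, b) takes a where the condition is True and b elsewhere, one element at a time. The sketch below compares it with a plain list comprehension on the same example frame. The names vectorized and looped are made up for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

# Vectorized form: a single call evaluated over the whole column.
vectorized = np.where(df['Set'] == 'Z', 'green', 'red')

# Element-by-element equivalent, shown only to spell out the rule.
looped = ['green' if s == 'Z' else 'red' for s in df['Set']]

# Both approaches produce the same values in the same order.
print(vectorized.tolist() == looped)
```

Either result can be assigned to df['color']. The np.where form avoids a Python-level loop, which usually matters on large frames.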
    

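  • If you have more than two conditions then use np.select. For example, if you want color to be

      • yellow when (df['Set'] == 'Z') & (df['Type'] == 'A')
      • otherwise blue when (df['Set'] == 'Z') & (df['Type'] == 'B')
      • otherwise purple when (df['Type'] == 'B')
      • otherwise black,

    then use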

    df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
    conditions = [
        (df['Set'] == 'Z') & (df['Type'] == 'A'),
        (df['Set'] == 'Z') & (df['Type'] == 'B'),
        (df['Type'] == 'B')]
    choices = ['yellow', 'blue', 'purple']
    df['color'] = np.select(conditions, choices, default='black')
    print(df)
    

    which yields

      Type Set   color
    0    A   Z  yellow
    1    B   Z    blue
    2    B   X  purple
    3    C   Y   black
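
Note that the order of the conditions matters: when a row satisfies more than one condition, np.select uses the first match in the list. Row 1 above (Set 'Z', Type 'B') matches both the 'blue' and the 'purple' condition and gets 'blue' only because that condition is listed first. A minimal sketch of the effect, using two of the conditions from the example (is_z_and_b, is_b, specific_first and general_first are illustrative names, not from the answer above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

is_z_and_b = (df['Set'] == 'Z') & (df['Type'] == 'B')
is_b = df['Type'] == 'B'

# Specific condition first: it claims row 1 before the general one is consulted.
specific_first = np.select([is_z_and_b, is_b], ['blue', 'purple'], default='black')

# General condition first: it claims every Type 'B' row, so 'blue' is never used.
general_first = np.select([is_b, is_z_and_b], ['purple', 'blue'], default='black')

print(specific_first.tolist())  # ['black', 'blue', 'purple', 'black']
print(general_first.tolist())   # ['black', 'purple', 'purple', 'black']
```

So list the most specific conditions first and let the broader ones act as fallbacks.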