pythonpandasdataframemutatejanitor

Using Janitor based on other columns in the same row to output conditional results


What I'm trying to do is have if the row from ColX is in the row ColZ I want a new column to be ColZ if not its colA.

I kept looking around but couldn't find a solution.

My data is a bit more in depth but this example should do it.

Perhaps there is a way out of janitor to do it which I am open to.

Edit:

I put in the wrong example code. Totally my fault. Updating it now.

df  = pd.DataFrame(
    {
'colZ' :["zang", "zang", "zang", "z", "zang"],
'colX' :["A", "B", "B", "A", "Z"],
'colA' :["1", "1", "1", "1", "1"],
    }
)


# Desired Output:

output_df = pd.DataFrame(
    {
'colZ' :["zang", "zang", "zang", "z", "zang"],
'colX' :["A", "B", "B", "A", "Z"],
'colA' :["1", "1", "1", "1", "1"],
'result' :["zang", "1", "1", "1", "zang"]
    }
)

Here is what I have tried.

output_df = jn.case_when(df,

                  df['colZ'].str.contains(df['colX']),  df['colZ'],
                  df['colA'],

                  column_name='result')

# Also tried this and many others

output_df = jn.case_when(df,

                  df['colZ'].isin(df['colX']),  df['colZ'],
                  df['colA'],

                  column_name='result')



Solution

  • We can break it into 2 parts:

    1. Create variables for each columns:
    col_a = df['colA']
    col_x = df['colX']
    col_z = df['colZ']
    
    1. Iterate over the rows and check if the word in the ColX is included in the colZ:
    df['result'] = [col_z[col_index] if col_x[col_index].upper() in col_z[col_index].upper() else col_a[col_index] for col_index in range(df.shape[0])]
    

    Or you can do a bigger one-liner by not initializing new variables on step 1, but it's getting too heavy...
    Hope it helps!

    Another way would be to zip:

    df['result'] = [colz 
                    if colx.lower() in colz 
                    else cola 
                    for colz, colx, cola 
                    in zip(df.colZ, df.colX, df.colA)]