python pandas

How to create a derived column based on the sequence of data in pandas


I have a dataframe:

df =

No.  Scenario  Exe Seq  Action
1    A         1        a
2    A         2        b
3    A         3        c
4    A         1        a
5    A         2        b
6    A         1        a

All rows belong to the same scenario, but some runs reach step 3, while others stop at step 2 or 1. I want to distinguish these runs.

The "Scenario" column may also contain values other than "A".

So the expected output is:

No.  Scenario  Exe Seq  Action  New_Scenario
1    A         1        a       A_1
2    A         2        b       A_1
3    A         3        c       A_1
4    A         1        a       A_2
5    A         2        b       A_2
6    A         1        a       A_3
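For anyone who wants to reproduce this, the sample frame can be built roughly as follows. This is only a sketch: the values are copied from the tables above, and the name `expected` is a helper introduced here for checking results, not part of the original data.

```python
import pandas as pd

# Sample data copied from the input table in the question
df = pd.DataFrame({
    "No.": [1, 2, 3, 4, 5, 6],
    "Scenario": ["A"] * 6,
    "Exe Seq": [1, 2, 3, 1, 2, 1],
    "Action": ["a", "b", "c", "a", "b", "a"],
})

# Desired New_Scenario values, copied from the expected-output table
# ("expected" is a hypothetical helper name used only for comparison)
expected = ["A_1", "A_1", "A_1", "A_2", "A_2", "A_3"]

print(df)
```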

Solution

  • IIUC use:

    # a new sequence starts wherever the difference from the previous row is not 1
    df['New_Scenario'] = df['Scenario'] + '_' + df['Exe Seq'].diff().ne(1).cumsum().astype(str)
    print(df)
    

    Or:

    # a new sequence starts wherever Exe Seq equals 1
    df['New_Scenario'] = df['Scenario'] + '_' + df['Exe Seq'].eq(1).cumsum().astype(str)
    

    Or maybe:

    # a new sequence starts wherever the difference from the previous row is less than or equal to 0
    df['New_Scenario'] = (df['Scenario'] + '_' + 
                          df['Exe Seq'].diff().fillna(-1).le(0).cumsum().astype(str))
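The question mentions that "Scenario" can hold values other than "A". The snippets above use one running counter over the whole frame, so a second scenario would continue numbering where the first one stopped. If the numbering should instead restart for each scenario, one possible sketch is to take the cumulative sum per group. The "B" rows and the column name `New_Scenario_global` below are made up for illustration and are not in the original data.

```python
import pandas as pd

# Sample from the question, extended with hypothetical "B" rows
df = pd.DataFrame({
    "Scenario": ["A", "A", "A", "A", "A", "A", "B", "B", "B"],
    "Exe Seq": [1, 2, 3, 1, 2, 1, 1, 2, 1],
})

# Mark rows that start a new run (here: Exe Seq equals 1)
starts = df["Exe Seq"].eq(1).astype(int)

# Single running counter over the whole frame, as in the snippets above
global_id = starts.cumsum()

# Counter that restarts at 1 for every Scenario value
per_scenario_id = starts.groupby(df["Scenario"]).cumsum()

df["New_Scenario_global"] = df["Scenario"] + "_" + global_id.astype(str)
df["New_Scenario"] = df["Scenario"] + "_" + per_scenario_id.astype(str)
print(df)
```

With this data the global counter labels the "B" rows B_4, B_4, B_5, while the per-scenario counter gives B_1, B_1, B_2. Which one is wanted depends on whether the suffix only needs to be unique within a scenario.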