python pandas primary-key calculated-columns data-handling

How to create an ID that increases by 1 every time the previous row of another column is 1


Working with Python, I need to create two new variables.

One (see JourneyID in the example) that cumulatively increases by one each time the previous row of another column (Purpose) takes the value 1, and

One (see JourneyN in the example) that cumulatively increases by one in the same way, but starts over from 1 every time the Respondent ID increases by 1.

I tried:

    m = df['Purpose'] == 1
    df.loc[m, 'JourneyID'] = m.cumsum()

This returns df['JourneyID'] = [1,1,1,2,1,1,3,1,4], when it should return [1,1,2,2,3,3,3,4,4].
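A minimal reproduction helps show why the attempt does not work. The question's example table is not available, so the Purpose values below are an assumption, chosen only so that the expected JourneyID [1,1,2,2,3,3,3,4,4] is consistent with them (a 1 in rows 1, 3, 6 and 8).

```python
import pandas as pd

# Hypothetical Purpose values (assumed, not from the original example table),
# chosen so the expected JourneyID would be [1, 1, 2, 2, 3, 3, 3, 4, 4].
df = pd.DataFrame({"Purpose": [2, 1, 3, 1, 2, 4, 1, 2, 1]})

m = df["Purpose"] == 1

# cumsum() counts the 1s up to and including the *current* row,
# not the previous one.
print(m.cumsum().tolist())  # [0, 1, 1, 2, 2, 2, 3, 3, 4]

# df.loc[m, ...] writes only the rows where m is True. Every other row keeps
# whatever the column held before (NaN here, because the column is new).
df.loc[m, "JourneyID"] = m.cumsum()
print(df["JourneyID"].tolist())
```

So the attempt has two problems: it counts the current row instead of the previous one, and it leaves the rows where Purpose is not 1 unfilled. Shifting the mask down by one row before taking the cumulative sum, and assigning to the whole column, addresses both.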

Any help is greatly appreciated.

Example of what I need to do


Solution

  • It's not super clean, but it should get you what you need:

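    # Count the 1s seen so far, add 1, then shift everything down one row
    # so each row reflects the Purpose of the previous row
    helper = ((df['Purpose'] == 1).cumsum() + 1).shift(1)
    helper.iloc[0] = 1  # the first row has no previous row, so it starts journey 1
    df['JourneyID'] = helper.astype(int)  # shift() introduced NaN, so cast back from float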
    

    I did not fully understand the JourneyN requirement :)
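One possible way to get JourneyN (a sketch, not part of the answer above) is to wrap the "shift, then cumulative sum" counter in a function and apply it per respondent with groupby(...).transform, so the count restarts at 1 for each new respondent. The column name RespondentID and the sample values are assumptions, since the question's example table is not shown; the sketch also assumes rows are already in chronological order within each respondent.

```python
import pandas as pd

# Hypothetical sample data: "RespondentID" is an assumed column name and the
# values are made up for illustration.
df = pd.DataFrame({
    "RespondentID": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Purpose":      [2, 1, 3, 1, 2, 4, 1, 2, 1],
})


def journey_counter(purpose: pd.Series) -> pd.Series:
    """Start at 1 and add 1 on every row whose previous row has Purpose == 1."""
    # Shift the mask down one row; the first row has no previous row -> False.
    prev_is_one = purpose.eq(1).shift(1, fill_value=False).astype(int)
    return prev_is_one.cumsum() + 1


# JourneyID: one running counter over the whole frame.
df["JourneyID"] = journey_counter(df["Purpose"])

# JourneyN: the same counter computed separately within each respondent,
# so the shift and the cumsum both restart at every new RespondentID.
df["JourneyN"] = df.groupby("RespondentID")["Purpose"].transform(journey_counter)

print(df)
```

Because transform passes each respondent's rows to journey_counter as its own Series, the first row of every respondent gets fill_value=False from the shift, which is what makes JourneyN restart at 1. Using shift with fill_value and astype(int) also keeps both columns integer, avoiding the float column that a plain shift(1) produces.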