python pandas group-by running-count

Add a sequence number to each element in a group using Python


I have a dataframe of individuals who each have multiple records. I want to number each individual's records in sequence in Python. Essentially, I would like to create the 'sequence' column in the following table:

patient  date      sequence
145      20Jun2009        1
145      24Jun2009        2
145      15Jul2009        3
582      09Feb2008        1
582      21Feb2008        2
987      14Mar2010        1
987      02May2010        2
987      12May2010        3

This is essentially the same question as here, but I am working in Python and have been unable to implement the SQL solution. I suspect I can use a groupby statement with an iterable count, but have so far been unsuccessful.


Solution

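  • The underlying problem is how to sort on multiple columns of data.

    One simple trick is to use the key parameter of the built-in sorted function.

    You sort by a string built from the columns of each row. Once the rows are ordered by patient and then by date, the sequence number is simply each row's position within its patient's group.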

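    from datetime import datetime

    # Sample rows as (patient_id, encounter_date), taken from the question's
    # table; replace with your source data.
    rows = [
        (582, "21Feb2008"), (145, "24Jun2009"), (145, "20Jun2009"),
        (582, "09Feb2008"), (145, "15Jul2009"),
    ]

    def date_to_sortable_string(date):
        # Convert e.g. "20Jun2009" to "2009-06-20", which sorts chronologically.
        # Note: %b parses month names according to the current locale.
        return datetime.strptime(date, "%d%b%Y").strftime("%Y-%m-%d")

    # Assume x[0] is the patient_id and x[1] is the encounter date.

    # Sort by patient_id and date (IDs are zero-padded to 5 digits so they
    # compare correctly as strings).
    rows_sorted = sorted(rows, key=lambda x: "%05d-%s" % (x[0], date_to_sortable_string(x[1])))

    for row in rows_sorted:
        print(row)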
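
The sort above orders the rows but does not yet number them. A minimal sketch of the remaining step, using only the standard library, might look like the following. The function name `add_sequence` and the `(patient, date)` tuple layout are assumptions made for illustration, and the sample rows are copied from the question's table. It sorts with a tuple key rather than a padded string, which is a swapped-in variation on the trick above, not the original answer's method.

```python
from datetime import datetime
from itertools import groupby


def add_sequence(rows):
    """Return (patient, date, sequence) tuples, numbering each patient's records by date.

    Hypothetical helper: assumes each row is a (patient_id, "DDMonYYYY") tuple.
    """
    # Sort by patient, then chronologically; a tuple key needs no string padding.
    ordered = sorted(rows, key=lambda r: (r[0], datetime.strptime(r[1], "%d%b%Y")))

    numbered = []
    # Because the rows are sorted, groupby yields one run of rows per patient.
    for _, records in groupby(ordered, key=lambda r: r[0]):
        # Restart the counter at 1 for every patient.
        for sequence, (patient, date) in enumerate(records, start=1):
            numbered.append((patient, date, sequence))
    return numbered


# Sample data from the question, deliberately shuffled.
rows = [
    (582, "21Feb2008"), (145, "24Jun2009"), (987, "12May2010"),
    (145, "20Jun2009"), (987, "14Mar2010"), (582, "09Feb2008"),
    (145, "15Jul2009"), (987, "02May2010"),
]

for row in add_sequence(rows):
    print(row)
```

If the data already lives in a pandas DataFrame with a `patient` column, as the question's table suggests, `df.groupby('patient').cumcount() + 1` produces the same numbering, provided the frame has first been sorted by patient and by a parsed date.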