pandasdataframe

Take min and max dates for a sequence along a column


I have the following dataframe:

timeseries = pd.date_range("2018-01-01", periods=10, freq="m")
df = pd.DataFrame(data = ["a","a","b","a","a","c","c","c","a","a"], index = timeseries, columns = ['var'])
print(df)
           var
2018-01-31  a
2018-02-28  a
2018-03-31  b
2018-04-30  a
2018-05-31  a
2018-06-30  c
2018-07-31  c
2018-08-31  c
2018-09-30  a
2018-10-31  a

I want to extract the minimum and maximum dates for each uninterrupted sequence of values in column "var" and assign them as additional columns. Desired result is:

           var   min Date   max Date
1/31/2018   a   1/31/2018   2/28/2018
2/28/2018   a   1/31/2018   2/28/2018
3/31/2018   b   3/31/2018   3/31/2018
4/30/2018   a   4/30/2018   5/31/2018
5/31/2018   a   4/30/2018   5/31/2018
6/30/2018   c   6/30/2018   8/31/2018
7/31/2018   c   6/30/2018   8/31/2018
8/31/2018   c   6/30/2018   8/31/2018
9/30/2018   a   9/30/2018   10/31/2018
10/31/2018  a   9/30/2018   10/31/2018

for example in column "var" the first time value "a" appears is on 1/31/2018 and then it goes again "a" on 2/28/2018 before it is interrupted by "b". So for min and max dates we will have 1/31/2018 and 2/28/2018.

I thought I can achieve that with groupby operation but didn't manage since groupby aggregates all "a" or "b" or "c" etc.


Solution

  • You need to convert your index to_series, then form a custom grouper (with shift+cumsum) and use a groupby.transform:

    # group successive values
    group = df['var'].ne(df['var'].shift()).cumsum()
    # form grouper and get transform
    t = df.index.to_series().groupby(group).transform
    # compute min/max
    df['min Date'] = t('min')
    df['max Date'] = t('max')
    

    Output:

               var   min Date   max Date
    2018-01-31   a 2018-01-31 2018-02-28
    2018-02-28   a 2018-01-31 2018-02-28
    2018-03-31   b 2018-03-31 2018-03-31
    2018-04-30   a 2018-04-30 2018-05-31
    2018-05-31   a 2018-04-30 2018-05-31
    2018-06-30   c 2018-06-30 2018-08-31
    2018-07-31   c 2018-06-30 2018-08-31
    2018-08-31   c 2018-06-30 2018-08-31
    2018-09-30   a 2018-09-30 2018-10-31
    2018-10-31   a 2018-09-30 2018-10-31