pythonpandasdatetimegroup-byreindex

Pandas reindex date index by group revisited


This has been asked before and a working solution has been proposed here Pandas reindex dates in Groupby, which worked for me in the past, but it does not work any more.

So, to recap I need to reindex dataframe using date to create 'balanced panel' - not to have missing Date-Value combination in any Group. Here is an example:

import pandas as pd
from datetime import datetime

date1 = datetime.strptime('2023-01-01', '%Y-%m-%d').date()
date2 = datetime.strptime('2023-01-02', '%Y-%m-%d').date()
date3 = datetime.strptime('2023-01-03', '%Y-%m-%d').date()

df = pd.DataFrame({'Date':[date1] * 3 + [date2]  + [date3] * 3,
'Group':['A', 'B', 'C', 'A', 'A', 'B', 'C'],
'Value':[20, 10, 23, 45, 60, 14, 25]})
 
df.set_index('Date', inplace=True)

Desired output is:

df_target = pd.DataFrame({'Date':[date1] * 3 + [date2] * 3  + [date3] * 3,
'Group':['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
'Value':[20, 10, 23, 45, 0, 0, 60, 14, 25]})

df_target.set_index('Date', inplace=True)

Attempted solution (note the assertion):

def reindex_by_date(df, freq):
    dates = pd.date_range(start=df.index.min(), end=df.index.max(), freq=freq)
    idx = pd.Index(dates, name='Dates')
    assert dates.duplicated().sum()==0
    return df.reindex(dates, fill_value=0)

df.groupby('Group').apply(reindex_by_date(df, freq='D'))

# this has also been added: .reset_index(drop=True)

Produces an error:

ValueError: cannot reindex from a duplicate axis

I even checked the flags (here it is True):

df.flags.allows_duplicate_labels

Solution

  • You are calling the function incorrectly in apply (you don't pass the group but rather the whole DataFrame).

    This should be:

    df.groupby('Group').apply(lambda g: reindex_by_date(g, freq='D'))
    

    Or:

    df.groupby('Group').apply(reindex_by_date, freq='D')
    

    Output:

                     Group  Value
    Group                        
    A     2023-01-01     A     20
          2023-01-02     A     45
          2023-01-03     A     60
    B     2023-01-01     B     10
          2023-01-02     0      0
          2023-01-03     B     14
    C     2023-01-01     C     23
          2023-01-02     0      0
          2023-01-03     C     25
    

    Note that you'll have to drop Group and reset_index to avoid the reindexing with 0 in Group as column:

    (df.groupby('Group').apply(reindex_by_date, freq='D')
       .drop(columns='Group').reset_index('Group')
     .rename_axis('Date')
    )
    

    Output:

    
               Group  Value
    Date                   
    2023-01-01     A     20
    2023-01-02     A     45
    2023-01-03     A     60
    2023-01-01     B     10
    2023-01-02     B      0
    2023-01-03     B     14
    2023-01-01     C     23
    2023-01-02     C      0
    2023-01-03     C     25