pythonpandasdataframeffill

Python ffill and example


I have below dataframe, and as you notice EffectiveDate is beginning of quarter and sometime values in EffectiveDate are missing. I would like to have a continuous monthly dates in EffectiveDate column with all column filled from most recent EffectiveDate.

Example: For each Group = A, Since values for EffectiveDate = 2/1/2022 & 3/1/2022 are missing so need to fill all the column values from 1/1/2022, and so on..

Input DF:

import pandas as pd
    
data = {
        'Group': ['A'] * 24,
        'EffectiveDate': [
            '1/1/2022', '1/1/2022', '1/1/2022', '1/1/2022', '1/1/2022', '1/1/2022',
            '4/1/2022', '4/1/2022', '4/1/2022', '4/1/2022', '4/1/2022', '4/1/2022',
            '7/1/2022', '7/1/2022', '7/1/2022', '7/1/2022', '7/1/2022', '7/1/2022',
            '10/1/2022', '10/1/2022', '10/1/2022', '10/1/2022', '10/1/2022', '10/1/2022'
        ],
        'ForecastDate': [
            '1/1/2022', '2/1/2022', '3/1/2022', '4/1/2022', '5/1/2022', '6/1/2022',
            '4/1/2022', '5/1/2022', '6/1/2022', '7/1/2022', '8/1/2022', '9/1/2022',
            '7/1/2022', '8/1/2022', '9/1/2022', '10/1/2022', '11/1/2022', '12/1/2022',
            '10/1/2022', '11/1/2022', '12/1/2022', '1/1/2023', '2/1/2023', '3/1/2023'
        ],
        'SKU': ['ABC12'] * 24,
        'Source': ['fdhh'] * 24
    }
    
df = pd.DataFrame(data)

Output DF:


    | Group |EffectiveDate| ForecastDate|   SKU| Source |
    |-------|------------|------------|--------|--------|
    | A     | 1/1/2022   | 1/1/2022   | ABC12  | fdhh   |
    | A     | 1/1/2022   | 2/1/2022   | ABC12  | fdhh   |
    | A     | 1/1/2022   | 3/1/2022   | ABC12  | fdhh   |
    | A     | 1/1/2022   | 4/1/2022   | ABC12  | fdhh   |
    | A     | 1/1/2022   | 5/1/2022   | ABC12  | fdhh   |
    | A     | 1/1/2022   | 6/1/2022   | ABC12  | fdhh   |
    | A     | 2/1/2022   | 2/1/2022   | ABC12  | fdhh   |
    | A     | 2/1/2022   | 3/1/2022   | ABC12  | fdhh   |
    | A     | 2/1/2022   | 4/1/2022   | ABC12  | fdhh   |
    | A     | 2/1/2022   | 5/1/2022   | ABC12  | fdhh   |
    | A     | 2/1/2022   | 6/1/2022   | ABC12  | fdhh   |
    | A     | 3/1/2022   | 3/1/2022   | ABC12  | fdhh   |
    | A     | 3/1/2022   | 4/1/2022   | ABC12  | fdhh   |
    | A     | 3/1/2022   | 5/1/2022   | ABC12  | fdhh   |
    | A     | 3/1/2022   | 6/1/2022   | ABC12  | fdhh   |
    | A     | 4/1/2022   | 4/1/2022   | ABC12  | fdhh   |
    | A     | 4/1/2022   | 5/1/2022   | ABC12  | fdhh   |
    | A     | 4/1/2022   | 6/1/2022   | ABC12  | fdhh   |
    | A     | 4/1/2022   | 7/1/2022   | ABC12  | fdhh   |
    | A     | 4/1/2022   | 8/1/2022   | ABC12  | fdhh   |
    | A     | 4/1/2022   | 9/1/2022   | ABC12  | fdhh   |
    | A     | 5/1/2022   | 5/1/2022   | ABC12  | fdhh   |
    | A     | 5/1/2022   | 6/1/2022   | ABC12  | fdhh   |
    | A     | 5/1/2022   | 7/1/2022   | ABC12  | fdhh   |
    | A     | 5/1/2022   | 8/1/2022   | ABC12  | fdhh   |
    | A     | 5/1/2022   | 9/1/2022   | ABC12  | fdhh   |
    | A     | 6/1/2022   | 6/1/2022   | ABC12  | fdhh   |
    | A     | 6/1/2022   | 7/1/2022   | ABC12  | fdhh   |
    | A     | 6/1/2022   | 8/1/2022   | ABC12  | fdhh   |
    | A     | 6/1/2022   | 9/1/2022   | ABC12  | fdhh   |
    | A     | 7/1/2022   | 7/1/2022   | ABC12  | fdhh   |
    | A     | 7/1/2022   | 8/1/2022   | ABC12  | fdhh   |
    | A     | 7/1/2022   | 9/1/2022   | ABC12  | fdhh   |
    | A     | 7/1/2022   | 10/1/2022  | ABC12  | fdhh   |
    | A     | 7/1/2022   | 11/1/2022  | ABC12  | fdhh   |
    | A     | 7/1/2022   | 12/1/2022  | ABC12  | fdhh   |
    | A     | 8/1/2022   | 8/1/2022   | ABC12  | fdhh   |
    | A     | 8/1/2022   | 9/1/2022   | ABC12  | fdhh   |
    | A     | 8/1/2022   | 10/1/2022  | ABC12  | fdhh   |
    | A     | 8/1/2022   | 11/1/2022  | ABC12  | fdhh   |
    | A     | 8/1/2022   | 12/1/2022  | ABC12  | fdhh   |
    | A     | 9/1/2022   | 9/1/2022   | ABC12  | fdhh   |
    | A     | 9/1/2022   | 10/1/2022  | ABC12  | fdhh   |
    | A     | 9/1/2022   | 11/1/2022  | ABC12  | fdhh   |
    | A     | 9/1/2022   | 12/1/2022  | ABC12  | fdhh   |
    | A     | 10/1/2022  | 10/1/2022  | ABC12  | fdhh   |
    | A     | 10/1/2022  | 11/1/2022  | ABC12  | fdhh   |
    | A     | 10/1/2022  | 12/1/2022  | ABC12  | fdhh   |
    | A     | 10/1/2022  | 1/1/2023   | ABC12  | fdhh   |
    | A     | 10/1/2022  | 2/1/2023   | ABC12  | fdhh   |
    | A     | 10/1/2022  | 3/1/2023   | ABC12  | fdhh   |
    | A     | 11/1/2022  | 11/1/2022  | ABC12  | fdhh   |
    | A     | 11/1/2022  | 12/1/2022  | ABC12  | fdhh   |
    | A     | 11/1/2022  | 1/1/2023   | ABC12  | fdhh   |
    | A     | 11/1/2022  | 2/1/2023   | ABC12  | fdhh   |
    | A     | 11/1/2022  | 3/1/2023   | ABC12  | fdhh   |
    | A     | 12/1/2022  | 12/1/2022  | ABC12  | fdhh   |
    | A     | 12/1/2022  | 1/1/2023   | ABC12  | fdhh   |
    | A     | 12/1/2022  | 2/1/2023   | ABC12  | fdhh   |
    | A     | 12/1/2022  | 3/1/2023   | ABC12  | fdhh   |

Solution

  • You can convert EffectiveDate to month periods by Serie.dt.to_period, rehsape by GroupBy.cumcount and DataFrame.pivot, so possible use DataFrame.reindex per groups. Then reshape back by DataFrame.stack, convert MultiIndex to columns and filter ForecastDate if greater or equal by EffectiveDate:

    df['EffectiveDate'] = pd.to_datetime(df['EffectiveDate']).dt.to_period('m')
    df['ForecastDate'] = pd.to_datetime(df['ForecastDate'])
    
    out = (df.assign(g = df.groupby(['Group','EffectiveDate']).cumcount())
            .pivot(index=['Group','EffectiveDate'], columns='g')
            .reset_index(level=0)
            .groupby('Group')
            .apply(lambda x: x.reindex(pd.period_range(x.index.min(),
                                                       pd.Period(f'{x.index.max().year}-12')), 
                                       method='ffill'))
            .drop('Group', axis=1)
            .stack()
            .droplevel(2)
            .rename_axis(['Group','EffectiveDate'])
            .reset_index()
            .assign(EffectiveDate = lambda x: x['EffectiveDate'].dt.to_timestamp())
            .query('ForecastDate >= EffectiveDate'))
    

    print (out.tail(15))
       Group EffectiveDate ForecastDate    SKU Source
    54     A    2022-10-01   2022-10-01  ABC12   fdhh
    55     A    2022-10-01   2022-11-01  ABC12   fdhh
    56     A    2022-10-01   2022-12-01  ABC12   fdhh
    57     A    2022-10-01   2023-01-01  ABC12   fdhh
    58     A    2022-10-01   2023-02-01  ABC12   fdhh
    59     A    2022-10-01   2023-03-01  ABC12   fdhh
    61     A    2022-11-01   2022-11-01  ABC12   fdhh
    62     A    2022-11-01   2022-12-01  ABC12   fdhh
    63     A    2022-11-01   2023-01-01  ABC12   fdhh
    64     A    2022-11-01   2023-02-01  ABC12   fdhh
    65     A    2022-11-01   2023-03-01  ABC12   fdhh
    68     A    2022-12-01   2022-12-01  ABC12   fdhh
    69     A    2022-12-01   2023-01-01  ABC12   fdhh
    70     A    2022-12-01   2023-02-01  ABC12   fdhh
    71     A    2022-12-01   2023-03-01  ABC12   fdhh
    

    EDIT: For dynamic filtering by Group column use helper dictionary mapping with Series.map in last step for filtering:

    df['EffectiveDate'] = pd.to_datetime(df['EffectiveDate']).dt.to_period('m')
    df['ForecastDate'] = pd.to_datetime(df['ForecastDate'])
    
    mapping = {'A': '2022-10-01',
               'B':'2022-06-01'}
    
    out = (df.assign(g = df.groupby(['Group','EffectiveDate']).cumcount())
            .pivot(index=['Group','EffectiveDate'], columns='g')
            .reset_index(level=0)
            .groupby('Group')
            .apply(lambda x: x.reindex(pd.period_range(x.index.min(),
                                                       pd.Period(f'{x.index.max().year}-12')), 
                                       method='ffill'))
            .drop('Group', axis=1)
            .stack()
            .droplevel(2)
            .rename_axis(['Group','EffectiveDate'])
            .reset_index()
            .assign(EffectiveDate = lambda x: x['EffectiveDate'].dt.to_timestamp())
            .loc[lambda x: (x['ForecastDate']  >= x['EffectiveDate']) &  
                           (x['EffectiveDate'] < x['Group'].map(mapping))]
                )
    

    print(out.tail(10))
       Group EffectiveDate ForecastDate    SKU Source
    41     A    2022-07-01   2022-12-01  ABC12   fdhh
    43     A    2022-08-01   2022-08-01  ABC12   fdhh
    44     A    2022-08-01   2022-09-01  ABC12   fdhh
    45     A    2022-08-01   2022-10-01  ABC12   fdhh
    46     A    2022-08-01   2022-11-01  ABC12   fdhh
    47     A    2022-08-01   2022-12-01  ABC12   fdhh
    50     A    2022-09-01   2022-09-01  ABC12   fdhh
    51     A    2022-09-01   2022-10-01  ABC12   fdhh
    52     A    2022-09-01   2022-11-01  ABC12   fdhh
    53     A    2022-09-01   2022-12-01  ABC12   fdhh