I have below dataframe, and as you notice EffectiveDate is beginning of quarter and sometime values in EffectiveDate are missing. I would like to have a continuous monthly dates in EffectiveDate column with all column filled from most recent EffectiveDate.
Example: For each Group = A, Since values for EffectiveDate = 2/1/2022 & 3/1/2022 are missing so need to fill all the column values from 1/1/2022, and so on..
Input DF:
import pandas as pd
data = {
'Group': ['A'] * 24,
'EffectiveDate': [
'1/1/2022', '1/1/2022', '1/1/2022', '1/1/2022', '1/1/2022', '1/1/2022',
'4/1/2022', '4/1/2022', '4/1/2022', '4/1/2022', '4/1/2022', '4/1/2022',
'7/1/2022', '7/1/2022', '7/1/2022', '7/1/2022', '7/1/2022', '7/1/2022',
'10/1/2022', '10/1/2022', '10/1/2022', '10/1/2022', '10/1/2022', '10/1/2022'
],
'ForecastDate': [
'1/1/2022', '2/1/2022', '3/1/2022', '4/1/2022', '5/1/2022', '6/1/2022',
'4/1/2022', '5/1/2022', '6/1/2022', '7/1/2022', '8/1/2022', '9/1/2022',
'7/1/2022', '8/1/2022', '9/1/2022', '10/1/2022', '11/1/2022', '12/1/2022',
'10/1/2022', '11/1/2022', '12/1/2022', '1/1/2023', '2/1/2023', '3/1/2023'
],
'SKU': ['ABC12'] * 24,
'Source': ['fdhh'] * 24
}
df = pd.DataFrame(data)
Output DF:
| Group |EffectiveDate| ForecastDate| SKU| Source |
|-------|------------|------------|--------|--------|
| A | 1/1/2022 | 1/1/2022 | ABC12 | fdhh |
| A | 1/1/2022 | 2/1/2022 | ABC12 | fdhh |
| A | 1/1/2022 | 3/1/2022 | ABC12 | fdhh |
| A | 1/1/2022 | 4/1/2022 | ABC12 | fdhh |
| A | 1/1/2022 | 5/1/2022 | ABC12 | fdhh |
| A | 1/1/2022 | 6/1/2022 | ABC12 | fdhh |
| A | 2/1/2022 | 2/1/2022 | ABC12 | fdhh |
| A | 2/1/2022 | 3/1/2022 | ABC12 | fdhh |
| A | 2/1/2022 | 4/1/2022 | ABC12 | fdhh |
| A | 2/1/2022 | 5/1/2022 | ABC12 | fdhh |
| A | 2/1/2022 | 6/1/2022 | ABC12 | fdhh |
| A | 3/1/2022 | 3/1/2022 | ABC12 | fdhh |
| A | 3/1/2022 | 4/1/2022 | ABC12 | fdhh |
| A | 3/1/2022 | 5/1/2022 | ABC12 | fdhh |
| A | 3/1/2022 | 6/1/2022 | ABC12 | fdhh |
| A | 4/1/2022 | 4/1/2022 | ABC12 | fdhh |
| A | 4/1/2022 | 5/1/2022 | ABC12 | fdhh |
| A | 4/1/2022 | 6/1/2022 | ABC12 | fdhh |
| A | 4/1/2022 | 7/1/2022 | ABC12 | fdhh |
| A | 4/1/2022 | 8/1/2022 | ABC12 | fdhh |
| A | 4/1/2022 | 9/1/2022 | ABC12 | fdhh |
| A | 5/1/2022 | 5/1/2022 | ABC12 | fdhh |
| A | 5/1/2022 | 6/1/2022 | ABC12 | fdhh |
| A | 5/1/2022 | 7/1/2022 | ABC12 | fdhh |
| A | 5/1/2022 | 8/1/2022 | ABC12 | fdhh |
| A | 5/1/2022 | 9/1/2022 | ABC12 | fdhh |
| A | 6/1/2022 | 6/1/2022 | ABC12 | fdhh |
| A | 6/1/2022 | 7/1/2022 | ABC12 | fdhh |
| A | 6/1/2022 | 8/1/2022 | ABC12 | fdhh |
| A | 6/1/2022 | 9/1/2022 | ABC12 | fdhh |
| A | 7/1/2022 | 7/1/2022 | ABC12 | fdhh |
| A | 7/1/2022 | 8/1/2022 | ABC12 | fdhh |
| A | 7/1/2022 | 9/1/2022 | ABC12 | fdhh |
| A | 7/1/2022 | 10/1/2022 | ABC12 | fdhh |
| A | 7/1/2022 | 11/1/2022 | ABC12 | fdhh |
| A | 7/1/2022 | 12/1/2022 | ABC12 | fdhh |
| A | 8/1/2022 | 8/1/2022 | ABC12 | fdhh |
| A | 8/1/2022 | 9/1/2022 | ABC12 | fdhh |
| A | 8/1/2022 | 10/1/2022 | ABC12 | fdhh |
| A | 8/1/2022 | 11/1/2022 | ABC12 | fdhh |
| A | 8/1/2022 | 12/1/2022 | ABC12 | fdhh |
| A | 9/1/2022 | 9/1/2022 | ABC12 | fdhh |
| A | 9/1/2022 | 10/1/2022 | ABC12 | fdhh |
| A | 9/1/2022 | 11/1/2022 | ABC12 | fdhh |
| A | 9/1/2022 | 12/1/2022 | ABC12 | fdhh |
| A | 10/1/2022 | 10/1/2022 | ABC12 | fdhh |
| A | 10/1/2022 | 11/1/2022 | ABC12 | fdhh |
| A | 10/1/2022 | 12/1/2022 | ABC12 | fdhh |
| A | 10/1/2022 | 1/1/2023 | ABC12 | fdhh |
| A | 10/1/2022 | 2/1/2023 | ABC12 | fdhh |
| A | 10/1/2022 | 3/1/2023 | ABC12 | fdhh |
| A | 11/1/2022 | 11/1/2022 | ABC12 | fdhh |
| A | 11/1/2022 | 12/1/2022 | ABC12 | fdhh |
| A | 11/1/2022 | 1/1/2023 | ABC12 | fdhh |
| A | 11/1/2022 | 2/1/2023 | ABC12 | fdhh |
| A | 11/1/2022 | 3/1/2023 | ABC12 | fdhh |
| A | 12/1/2022 | 12/1/2022 | ABC12 | fdhh |
| A | 12/1/2022 | 1/1/2023 | ABC12 | fdhh |
| A | 12/1/2022 | 2/1/2023 | ABC12 | fdhh |
| A | 12/1/2022 | 3/1/2023 | ABC12 | fdhh |
You can convert EffectiveDate
to month periods by Serie.dt.to_period
, rehsape by GroupBy.cumcount
and DataFrame.pivot
, so possible use DataFrame.reindex
per groups. Then reshape back by DataFrame.stack
, convert MultiIndex
to columns and filter ForecastDate
if greater or equal by EffectiveDate
:
df['EffectiveDate'] = pd.to_datetime(df['EffectiveDate']).dt.to_period('m')
df['ForecastDate'] = pd.to_datetime(df['ForecastDate'])
out = (df.assign(g = df.groupby(['Group','EffectiveDate']).cumcount())
.pivot(index=['Group','EffectiveDate'], columns='g')
.reset_index(level=0)
.groupby('Group')
.apply(lambda x: x.reindex(pd.period_range(x.index.min(),
pd.Period(f'{x.index.max().year}-12')),
method='ffill'))
.drop('Group', axis=1)
.stack()
.droplevel(2)
.rename_axis(['Group','EffectiveDate'])
.reset_index()
.assign(EffectiveDate = lambda x: x['EffectiveDate'].dt.to_timestamp())
.query('ForecastDate >= EffectiveDate'))
print (out.tail(15))
Group EffectiveDate ForecastDate SKU Source
54 A 2022-10-01 2022-10-01 ABC12 fdhh
55 A 2022-10-01 2022-11-01 ABC12 fdhh
56 A 2022-10-01 2022-12-01 ABC12 fdhh
57 A 2022-10-01 2023-01-01 ABC12 fdhh
58 A 2022-10-01 2023-02-01 ABC12 fdhh
59 A 2022-10-01 2023-03-01 ABC12 fdhh
61 A 2022-11-01 2022-11-01 ABC12 fdhh
62 A 2022-11-01 2022-12-01 ABC12 fdhh
63 A 2022-11-01 2023-01-01 ABC12 fdhh
64 A 2022-11-01 2023-02-01 ABC12 fdhh
65 A 2022-11-01 2023-03-01 ABC12 fdhh
68 A 2022-12-01 2022-12-01 ABC12 fdhh
69 A 2022-12-01 2023-01-01 ABC12 fdhh
70 A 2022-12-01 2023-02-01 ABC12 fdhh
71 A 2022-12-01 2023-03-01 ABC12 fdhh
EDIT: For dynamic filtering by Group
column use helper dictionary mapping
with Series.map
in last step for filtering:
df['EffectiveDate'] = pd.to_datetime(df['EffectiveDate']).dt.to_period('m')
df['ForecastDate'] = pd.to_datetime(df['ForecastDate'])
mapping = {'A': '2022-10-01',
'B':'2022-06-01'}
out = (df.assign(g = df.groupby(['Group','EffectiveDate']).cumcount())
.pivot(index=['Group','EffectiveDate'], columns='g')
.reset_index(level=0)
.groupby('Group')
.apply(lambda x: x.reindex(pd.period_range(x.index.min(),
pd.Period(f'{x.index.max().year}-12')),
method='ffill'))
.drop('Group', axis=1)
.stack()
.droplevel(2)
.rename_axis(['Group','EffectiveDate'])
.reset_index()
.assign(EffectiveDate = lambda x: x['EffectiveDate'].dt.to_timestamp())
.loc[lambda x: (x['ForecastDate'] >= x['EffectiveDate']) &
(x['EffectiveDate'] < x['Group'].map(mapping))]
)
print(out.tail(10))
Group EffectiveDate ForecastDate SKU Source
41 A 2022-07-01 2022-12-01 ABC12 fdhh
43 A 2022-08-01 2022-08-01 ABC12 fdhh
44 A 2022-08-01 2022-09-01 ABC12 fdhh
45 A 2022-08-01 2022-10-01 ABC12 fdhh
46 A 2022-08-01 2022-11-01 ABC12 fdhh
47 A 2022-08-01 2022-12-01 ABC12 fdhh
50 A 2022-09-01 2022-09-01 ABC12 fdhh
51 A 2022-09-01 2022-10-01 ABC12 fdhh
52 A 2022-09-01 2022-11-01 ABC12 fdhh
53 A 2022-09-01 2022-12-01 ABC12 fdhh