I have a pandas dataframe and I need to fill NULL values with the last value on a partition. Specifically, for each "id" and "month", I need to create and explode last "value" on subsequent months.
Example of my dataset:
id | month | value |
---|---|---|
1 | 2023-01-01 | London |
1 | 2023-02-01 | Paris |
2 | 2023-01-01 | New York |
3 | 2023-02-01 | Paris |
4 | 2023-03-01 | NULL |
My desidered output (Exploding the values up to April 2023):
id | month | value |
---|---|---|
1 | 2023-01-01 | London |
1 | 2023-02-01 | Paris |
1 | 2023-03-01 | Paris |
1 | 2023-04-01 | Paris |
2 | 2023-01-01 | New York |
2 | 2023-02-01 | New York |
2 | 2023-03-01 | New York |
2 | 2023-04-01 | New York |
3 | 2023-02-01 | Paris |
3 | 2023-03-01 | Paris |
3 | 2023-04-01 | Paris |
4 | 2023-03-01 | NULL |
4 | 2023-04-01 | NULL |
Thank u!
One option using reshaping as a rectangular intermediate with pivot
/stack
:
df['month'] = pd.to_datetime(df['month'])
out = (df
.pivot(index='id', columns='month', values='value')
.reindex(columns=pd.date_range('2023-01-01', '2023-04-01', freq='MS').rename('month'))
.ffill(axis=1)
.stack().reset_index(name='value')
)
Or with a MultiIndex and groupby.ffill
:
out = (df
.set_index(['id', 'month'])
.reindex(pd.MultiIndex.from_product([df['id'].unique(),
pd.date_range('2023-01-01', '2023-04-01', freq='MS')],
names=['id', 'month']
))
.groupby(level=0).ffill().dropna().reset_index()
)
Output:
id month value
0 1 2023-01-01 London
1 1 2023-02-01 Paris
2 1 2023-03-01 Paris
3 1 2023-04-01 Paris
4 2 2023-01-01 New York
5 2 2023-02-01 New York
6 2 2023-03-01 New York
7 2 2023-04-01 New York
8 3 2023-02-01 Paris
9 3 2023-03-01 Paris
10 3 2023-04-01 Paris
You can add a helper column to identify the original NaNs:
out = (df
.set_index(['id', 'month']).assign(flag=True)
.reindex(pd.MultiIndex.from_product([df['id'].unique(),
pd.date_range('2023-01-01', '2023-04-01', freq='MS')],
names=['id', 'month']
))
.groupby(level=0).ffill().loc[lambda d: d.pop('flag').notna()].reset_index()
)
Output:
id month value
0 1 2023-01-01 London
1 1 2023-02-01 Paris
2 1 2023-03-01 Paris
3 1 2023-04-01 Paris
4 2 2023-01-01 New York
5 2 2023-02-01 New York
6 2 2023-03-01 New York
7 2 2023-04-01 New York
8 3 2023-02-01 Paris
9 3 2023-03-01 Paris
10 3 2023-04-01 Paris
11 4 2023-03-01 NaN
12 4 2023-04-01 NaN