I need to copy data as a forward fill from Jan 2023 to Feb 2023 and March 2023 for SKU1, Location1 since data submissions are missing. For this: 1. need to create a continuous months for each sku, location combination 2. ffill for NA
expected result is shown in below image, the blue highlights are carried forward ensuring there is continuous monthly data
Thanks in Advance!
please refer to the below input data frame,
import pandas as pd
# Sample data
data = {
'submissionmonth': ['1/1/2023', '1/1/2023', '1/1/2023', '1/1/2023', '1/1/2023', '1/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '4/1/2023', '10/1/2023', '10/1/2023', '10/1/2023', '10/1/2023', '10/1/2023', '10/1/2023', '10/1/2023', '10/1/2023'],
'sku': ['SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU1', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2', 'SKU2'],
'location': ['Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location1', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2', 'Location2'],
'forecastmonth': ['1/1/2023', '2/1/2023', '3/1/2023', '4/1/2023', '5/1/2023', '6/1/2023', '4/1/2023', '5/1/2023', '6/1/2023', '7/1/2023', '8/1/2023', '9/1/2023', '4/1/2023', '5/1/2023', '6/1/2023', '7/1/2023', '8/1/2023', '9/1/2023', '11/1/2023', '12/1/2023', '1/1/2024', '2/1/2024', '3/1/2024', '4/1/2024', '4/1/2024', '4/1/2024'],
'cost': [100, 100, 100, 100, 100, 100, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]
}
# Create DataFrame
df = pd.DataFrame(data)
# Convert to datetime
df['submissionmonth'] = pd.to_datetime(df['submissionmonth'], format='%m/%d/%Y')
df['forecastmonth'] = pd.to_datetime(df['forecastmonth'], format='%m/%d/%Y')
print(df.shape)
print(df)
i tried grouping and ffill, resulted in shape error
Looks like you need to deduplicate the rows with groupby.cumcount
, then use groupby.resample
:
out = (df
# there are several "forecastmonth" per combination of date/sku/location
# we first need to deduplicate them unambiguously
.assign(n=lambda x: x.groupby(['submissionmonth', 'sku', 'location'])
.cumcount())
.set_index('submissionmonth')
.groupby(['n', 'sku', 'location'], group_keys=False)
.resample('MS').ffill()
.reset_index()[df.columns]
# below is optional
.sort_values(by=['submissionmonth', 'sku', 'location', 'forecastmonth'],
kind='stable', ignore_index=True)
)
Output:
submissionmonth sku location forecastmonth cost
0 2023-01-01 SKU1 Location1 2023-01-01 100
1 2023-01-01 SKU1 Location1 2023-02-01 100
2 2023-01-01 SKU1 Location1 2023-03-01 100
3 2023-01-01 SKU1 Location1 2023-04-01 100
4 2023-01-01 SKU1 Location1 2023-05-01 100
5 2023-01-01 SKU1 Location1 2023-06-01 100
6 2023-02-01 SKU1 Location1 2023-01-01 100
7 2023-02-01 SKU1 Location1 2023-02-01 100
8 2023-02-01 SKU1 Location1 2023-03-01 100
9 2023-02-01 SKU1 Location1 2023-04-01 100
10 2023-02-01 SKU1 Location1 2023-05-01 100
11 2023-02-01 SKU1 Location1 2023-06-01 100
12 2023-03-01 SKU1 Location1 2023-01-01 100
13 2023-03-01 SKU1 Location1 2023-02-01 100
14 2023-03-01 SKU1 Location1 2023-03-01 100
15 2023-03-01 SKU1 Location1 2023-04-01 100
16 2023-03-01 SKU1 Location1 2023-05-01 100
17 2023-03-01 SKU1 Location1 2023-06-01 100
18 2023-04-01 SKU1 Location1 2023-04-01 200
19 2023-04-01 SKU1 Location1 2023-05-01 200
20 2023-04-01 SKU1 Location1 2023-06-01 200
21 2023-04-01 SKU1 Location1 2023-07-01 200
22 2023-04-01 SKU1 Location1 2023-08-01 200
23 2023-04-01 SKU1 Location1 2023-09-01 200
24 2023-04-01 SKU2 Location2 2023-04-01 200
25 2023-04-01 SKU2 Location2 2023-05-01 200
26 2023-04-01 SKU2 Location2 2023-06-01 200
27 2023-04-01 SKU2 Location2 2023-07-01 200
28 2023-04-01 SKU2 Location2 2023-08-01 200
29 2023-04-01 SKU2 Location2 2023-09-01 200
30 2023-05-01 SKU2 Location2 2023-04-01 200
31 2023-05-01 SKU2 Location2 2023-05-01 200
32 2023-05-01 SKU2 Location2 2023-06-01 200
33 2023-05-01 SKU2 Location2 2023-07-01 200
34 2023-05-01 SKU2 Location2 2023-08-01 200
35 2023-05-01 SKU2 Location2 2023-09-01 200
36 2023-06-01 SKU2 Location2 2023-04-01 200
37 2023-06-01 SKU2 Location2 2023-05-01 200
38 2023-06-01 SKU2 Location2 2023-06-01 200
39 2023-06-01 SKU2 Location2 2023-07-01 200
40 2023-06-01 SKU2 Location2 2023-08-01 200
41 2023-06-01 SKU2 Location2 2023-09-01 200
42 2023-07-01 SKU2 Location2 2023-04-01 200
43 2023-07-01 SKU2 Location2 2023-05-01 200
44 2023-07-01 SKU2 Location2 2023-06-01 200
45 2023-07-01 SKU2 Location2 2023-07-01 200
46 2023-07-01 SKU2 Location2 2023-08-01 200
47 2023-07-01 SKU2 Location2 2023-09-01 200
48 2023-08-01 SKU2 Location2 2023-04-01 200
49 2023-08-01 SKU2 Location2 2023-05-01 200
50 2023-08-01 SKU2 Location2 2023-06-01 200
51 2023-08-01 SKU2 Location2 2023-07-01 200
52 2023-08-01 SKU2 Location2 2023-08-01 200
53 2023-08-01 SKU2 Location2 2023-09-01 200
54 2023-09-01 SKU2 Location2 2023-04-01 200
55 2023-09-01 SKU2 Location2 2023-05-01 200
56 2023-09-01 SKU2 Location2 2023-06-01 200
57 2023-09-01 SKU2 Location2 2023-07-01 200
58 2023-09-01 SKU2 Location2 2023-08-01 200
59 2023-09-01 SKU2 Location2 2023-09-01 200
60 2023-10-01 SKU2 Location2 2023-11-01 200
61 2023-10-01 SKU2 Location2 2023-12-01 200
62 2023-10-01 SKU2 Location2 2024-01-01 200
63 2023-10-01 SKU2 Location2 2024-02-01 200
64 2023-10-01 SKU2 Location2 2024-03-01 200
65 2023-10-01 SKU2 Location2 2024-04-01 200
66 2023-10-01 SKU2 Location2 2024-04-01 200
67 2023-10-01 SKU2 Location2 2024-04-01 200