I currently have an issue of spanning rows given by a start vs end date range in Pandas, but need to bring the start date as well, not only the difference. I tried this solution link.
Credits to @Shubham Sharma who helped with this and to @Pandinus who asked the question originally.
This works to span the rows of both date differences, I just need to bring the start date on the new column as well, but I have not had so much luck.
Should I continue using this code or would someone propose differently to achieve a good result?
d = df['date_end'].sub(df['date_start']).dt.days
df1 = df.reindex(df.index.repeat(d))
i = df1.groupby(level=0).cumcount() + 1
df1['date'] = df1['date_start'] + pd.to_timedelta(i, unit='d')
Grabbing an example from that post, this is the result I need:
Id num color date_start date_end Dailydate
0 aa0 blue 1/1/2022 1/2/2022 1/1/2022
0 aa0 blue 1/1/2022 1/2/2022 1/2/2022
1 aa1 red 1/1/2022 1/4/2022 1/1/2022
1 aa1 red 1/1/2022 1/4/2022 1/2/2022
1 aa1 red 1/1/2022 1/4/2022 1/3/2022
1 aa1 red 1/1/2022 1/4/2022 1/4/2022
2 aa2 yellow 1/7/2022 1/9/2022 1/7/2022
2 aa2 yellow 1/7/2022 1/9/2022 1/8/2022
2 aa2 yellow 1/7/2022 1/9/2022 1/9/2022
3 aa3 green 1/12/2022 1/14/2022 1/12/2022
3 aa3 green 1/12/2022 1/14/2022 1/13/2022
3 aa3 green 1/12/2022 1/14/2022 1/14/2022
I have tried playing with cumcount function and even adding the column first, then readding it with a +1 , but it gives me datime issues. And I'm also thinking it's not the best approach.
Any ideas? Thanks.
IIUC, you can use apply
on axis=1
to create date ranges, then explode
them:
# input from other post
data = {
"id": ["aa0", "aa1", "aa2", "aa3"],
"number": [1, 2, 2, 1],
"color": ["blue", "red", "yellow", "green"],
"date_start": [
date(2022, 1, 1),
date(2022, 1, 1),
date(2022, 1, 7),
date(2022, 1, 12),
],
"date_end": [
date(2022, 1, 2),
date(2022, 1, 4),
date(2022, 1, 9),
date(2022, 1, 14),
],
}
df = pd.DataFrame(data)
# solution
df["Dailydate"] = df.apply(
lambda row: pd.date_range(row["date_start"], row["date_end"]), axis=1
)
df = df.explode("Dailydate").reset_index(names="Id")
Id id number color date_start date_end Dailydate
0 0 aa0 1 blue 2022-01-01 2022-01-02 2022-01-01
1 0 aa0 1 blue 2022-01-01 2022-01-02 2022-01-02
2 1 aa1 2 red 2022-01-01 2022-01-04 2022-01-01
3 1 aa1 2 red 2022-01-01 2022-01-04 2022-01-02
4 1 aa1 2 red 2022-01-01 2022-01-04 2022-01-03
5 1 aa1 2 red 2022-01-01 2022-01-04 2022-01-04
6 2 aa2 2 yellow 2022-01-07 2022-01-09 2022-01-07
7 2 aa2 2 yellow 2022-01-07 2022-01-09 2022-01-08
8 2 aa2 2 yellow 2022-01-07 2022-01-09 2022-01-09
9 3 aa3 1 green 2022-01-12 2022-01-14 2022-01-12
10 3 aa3 1 green 2022-01-12 2022-01-14 2022-01-13
11 3 aa3 1 green 2022-01-12 2022-01-14 2022-01-14