Fill rows based on start vs end date in Pandas, including start date

I currently have an issue of spanning rows given by a start vs end date range in Pandas, but need to bring the start date as well, not only the difference. I tried this solution link.

Credits to @Shubham Sharma who helped with this and to @Pandinus who asked the question originally.

This works to span the rows of both date differences, I just need to bring the start date on the new column as well, but I have not had so much luck.

Should I continue using this code or would someone propose differently to achieve a good result?

d = df['date_end'].sub(df['date_start']).dt.days
df1 = df.reindex(df.index.repeat(d))
i = df1.groupby(level=0).cumcount() + 1

df1['date'] = df1['date_start'] + pd.to_timedelta(i, unit='d')

Grabbing an example from that post, this is the result I need:

Id  num color   date_start  date_end    Dailydate
0   aa0 blue    1/1/2022    1/2/2022    1/1/2022
0   aa0 blue    1/1/2022    1/2/2022    1/2/2022
1   aa1 red     1/1/2022    1/4/2022    1/1/2022
1   aa1 red     1/1/2022    1/4/2022    1/2/2022
1   aa1 red     1/1/2022    1/4/2022    1/3/2022
1   aa1 red     1/1/2022    1/4/2022    1/4/2022
2   aa2 yellow  1/7/2022    1/9/2022    1/7/2022
2   aa2 yellow  1/7/2022    1/9/2022    1/8/2022
2   aa2 yellow  1/7/2022    1/9/2022    1/9/2022
3   aa3 green   1/12/2022   1/14/2022   1/12/2022
3   aa3 green   1/12/2022   1/14/2022   1/13/2022
3   aa3 green   1/12/2022   1/14/2022   1/14/2022

I have tried playing with cumcount function and even adding the column first, then readding it with a +1 , but it gives me datime issues. And I'm also thinking it's not the best approach.

Any ideas? Thanks.

Solution

IIUC, you can use apply on axis=1 to create date ranges, then explode them:

# input from other post
data = {
    "id": ["aa0", "aa1", "aa2", "aa3"],
    "number": [1, 2, 2, 1],
    "color": ["blue", "red", "yellow", "green"],
    "date_start": [
        date(2022, 1, 1),
        date(2022, 1, 1),
        date(2022, 1, 7),
        date(2022, 1, 12),
    ],
    "date_end": [
        date(2022, 1, 2),
        date(2022, 1, 4),
        date(2022, 1, 9),
        date(2022, 1, 14),
    ],
}

df = pd.DataFrame(data)

# solution
df["Dailydate"] = df.apply(
    lambda row: pd.date_range(row["date_start"], row["date_end"]), axis=1
)
df = df.explode("Dailydate").reset_index(names="Id")

    Id   id  number   color date_start   date_end  Dailydate
0    0  aa0       1    blue 2022-01-01 2022-01-02 2022-01-01
1    0  aa0       1    blue 2022-01-01 2022-01-02 2022-01-02
2    1  aa1       2     red 2022-01-01 2022-01-04 2022-01-01
3    1  aa1       2     red 2022-01-01 2022-01-04 2022-01-02
4    1  aa1       2     red 2022-01-01 2022-01-04 2022-01-03
5    1  aa1       2     red 2022-01-01 2022-01-04 2022-01-04
6    2  aa2       2  yellow 2022-01-07 2022-01-09 2022-01-07
7    2  aa2       2  yellow 2022-01-07 2022-01-09 2022-01-08
8    2  aa2       2  yellow 2022-01-07 2022-01-09 2022-01-09
9    3  aa3       1   green 2022-01-12 2022-01-14 2022-01-12
10   3  aa3       1   green 2022-01-12 2022-01-14 2022-01-13
11   3  aa3       1   green 2022-01-12 2022-01-14 2022-01-14