Here is the dataframe I have:
import pandas as pd

df = pd.DataFrame([[pd.Timestamp(2017, 1, 1, 12, 32, 0), 2, 3],
                   [pd.Timestamp(2017, 1, 2, 12, 32, 0), 4, 9]],
                  columns=['time', 'feature1', 'feature2'])
For every timestamp in the df (i.e. for every value of the 'time' column), I need to append 5 more rows in which the 'time' value is incremented by one minute per row, while the values of the remaining columns are copied as-is.
So the output would look like:
time                 feature1  feature2
2017-01-01 12:32:00         2         3
2017-01-01 12:33:00         2         3
2017-01-01 12:34:00         2         3
2017-01-01 12:35:00         2         3
2017-01-01 12:36:00         2         3
2017-01-01 12:37:00         2         3
2017-01-02 12:32:00         4         9
2017-01-02 12:33:00         4         9
2017-01-02 12:34:00         4         9
2017-01-02 12:35:00         4         9
2017-01-02 12:36:00         4         9
2017-01-02 12:37:00         4         9
Hoping for an elegant solution, I tried df.asfreq('1min'), but I could not tell it to stop after appending 5 rows: it keeps inserting rows at 1-minute increments until it reaches the next timestamp.
I also tried the good old Python for loop, and as expected it is far too slow (I am dealing with 10 million rows).
Is there an elegant solution for this? Something like df.asfreq('1min'), but with a stop condition after appending 5 rows.
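For reference, a rough sketch of the asfreq attempt on the sample df above (asfreq only works on a DatetimeIndex, so 'time' has to become the index first):

# asfreq resamples the whole index range to a fixed frequency, so every
# minute between the two timestamps gets a row -- there is no way to
# tell it to stop after 5 extra rows
filled = df.set_index('time').asfreq('1min').ffill()
print(len(filled))  # 1441 rows: one per minute from 12:32 Jan 1 to 12:32 Jan 2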
You can repeat each row of the df and then use a groupby with cumcount to add the minutes, like below:
out = df.loc[df.index.repeat(6)]  # each original row repeated 6 times (original + 5 copies)
out['time'] = out['time'] + pd.to_timedelta(out.groupby('time').cumcount(), unit='m')  # add 0..5 minutes per timestamp
out = out.reset_index(drop=True)  # fresh 0..11 index, as printed below
print(out)
                   time  feature1  feature2
0   2017-01-01 12:32:00         2         3
1   2017-01-01 12:33:00         2         3
2   2017-01-01 12:34:00         2         3
3   2017-01-01 12:35:00         2         3
4   2017-01-01 12:36:00         2         3
5   2017-01-01 12:37:00         2         3
6   2017-01-02 12:32:00         4         9
7   2017-01-02 12:33:00         4         9
8   2017-01-02 12:34:00         4         9
9   2017-01-02 12:35:00         4         9
10  2017-01-02 12:36:00         4         9
11  2017-01-02 12:37:00         4         9
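If the same timestamp can appear in more than one input row, grouping by the repeated index instead of the 'time' column keeps the counter per original row. A minimal variant of the same idea (n here is just the number of extra rows to append, not something from the question):

n = 5  # extra rows to append per original row
out = df.loc[df.index.repeat(n + 1)]
out['time'] = out['time'] + pd.to_timedelta(out.groupby(level=0).cumcount(), unit='m')
out = out.reset_index(drop=True)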