Consider the DataFrame below:
import pandas as pd
df = pd.DataFrame({"names": ["foo", "boo", "coo", "coo"], "time": [1, 4, 2, 3], "values": [20, 10, 15, 12]})
For each name, I want to insert rows for every time value from 1 up to that name's maximum time, with `values` left as NaN for the inserted rows. The desired DataFrame would be:
import numpy as np
df = pd.DataFrame({"names": ["foo", "boo", "boo", "boo", "boo", "coo", "coo", "coo"], "time": [1, 1, 2, 3, 4, 1, 2, 3], "values": [20, np.nan, np.nan, np.nan, 10, np.nan, 15, 12]})
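To make the target explicit: each name gets its own range, from 1 to the largest `time` seen for that name, so `foo` keeps a single row while `boo` needs four. A small sketch that lists the (names, time) pairs the result should contain (the variable names `max_time` and `pairs` are illustrative, not from the question):

```python
import pandas as pd

df = pd.DataFrame({"names": ["foo", "boo", "coo", "coo"],
                   "time": [1, 4, 2, 3],
                   "values": [20, 10, 15, 12]})

# Largest time per name; sort=False keeps names in order of first appearance
max_time = df.groupby("names", sort=False)["time"].max()

# Every (name, time) pair the output should have: 1..max for each name
pairs = [(name, t) for name, m in max_time.items() for t in range(1, m + 1)]
print(pairs)
```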
How can I do this?
Use a custom function in GroupBy.apply that calls Series.reindex with a range from 1 to each group's maximum time:
# sort=False keeps the names in their original order (foo, boo, coo)
out = (df.set_index('time')
         .groupby('names', sort=False)['values']
         .apply(lambda x: x.reindex(range(1, x.index.max() + 1)))
         .reset_index())
print(out)
   names  time  values
0    foo     1    20.0
1    boo     1     NaN
2    boo     2     NaN
3    boo     3     NaN
4    boo     4    10.0
5    coo     1     NaN
6    coo     2    15.0
7    coo     3    12.0
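To see what the lambda does for a single group, here is a sketch on the `coo` group alone, rebuilt by hand as the Series that GroupBy.apply would pass in (indexed by `time`; the names `coo` and `filled` are illustrative). Labels in the range that the group lacks come back as NaN, which is also why the integer values turn into floats:

```python
import pandas as pd

# The 'coo' group after set_index('time'): values 15 and 12 at times 2 and 3
coo = pd.Series([15, 12], index=pd.Index([2, 3], name="time"), name="values")

# Reindex onto 1..max(time); time 1 is missing from the group, so it becomes NaN
filled = coo.reindex(range(1, coo.index.max() + 1))
print(filled)
```

GroupBy.apply then stacks these per-group results under the `names` key, and `reset_index` turns both index levels back into ordinary columns.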