python pandas group-by row insertion

Adding rows to a pandas DataFrame for all values up to the max value of a column, per group


Consider the DataFrame below:

import pandas as pd
df = pd.DataFrame({"names": ["foo", "boo", "coo", "coo"], "time": [1, 4, 2, 3], "values": [20, 10, 15, 12]})

For each name, I want to insert a row for every time between 1 and that name's maximum in the time column, with NaN in the values column for the inserted rows. The desired DataFrame would be:

import numpy as np
df = pd.DataFrame({"names": ["foo", "boo", "boo", "boo", "boo", "coo", "coo", "coo"], "time": [1, 1, 2, 3, 4, 1, 2, 3], "values": [20, np.nan, np.nan, np.nan, 10, np.nan, 15, 12]})

How can I do this?


Solution

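  • Set time as the index, then use a custom function in GroupBy.apply that calls Series.reindex with a range from 1 to each group's maximum time. Times missing from a group become new rows with NaN: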

    out = (df.set_index('time')
             .groupby('names', sort=False)['values']
             .apply(lambda x: x.reindex(range(1, x.index.max()+1)))
             .reset_index())
    
    print(out)
      names  time  values
    0   foo     1    20.0
    1   boo     1     NaN
    2   boo     2     NaN
    3   boo     3     NaN
    4   boo     4    10.0
    5   coo     1     NaN
    6   coo     2    15.0
    7   coo     3    12.0
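
For comparison, here is a minimal sketch of an alternative that avoids GroupBy.apply. It is not part of the answer above. It assumes each (names, time) pair occurs at most once in df and that time holds positive integers. The names max_time and full are made up for illustration. The idea is to build every (name, time) pair that should exist and then left-merge the original rows onto it:

```python
import pandas as pd

df = pd.DataFrame({"names": ["foo", "boo", "coo", "coo"],
                   "time": [1, 4, 2, 3],
                   "values": [20, 10, 15, 12]})

# Maximum time per name; sort=False keeps the names in order of first appearance.
max_time = df.groupby("names", sort=False)["time"].max()

# One (name, time) pair for every time from 1 up to that name's maximum.
full = pd.DataFrame(
    [(name, t) for name, m in max_time.items() for t in range(1, int(m) + 1)],
    columns=["names", "time"],
)

# A left merge keeps every pair and leaves NaN where df has no matching row.
out = full.merge(df, on=["names", "time"], how="left")
print(out)
```

With the sample data this should give the same rows as the reindex approach. The merge also carries along any other columns df might have, whereas the reindex version selects only the values column.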