pandas dataframe data-science data-munging

pandas explode dataframe by values of cell


I have a dataframe:

df = C1 C2 C3 from_time to_time
     a   b c     1         3
     q   t y     4         9

I want to explode it by the values of from_time and to_time, so that each row is repeated once per integer in that range (inclusive), with a new time column:

df = C1 C2 C3 time from_time to_time
     a   b c    1      1         3
     a   b c    2      1         3
     a   b c    3      1         3
     q   t y    4      4         9
     q   t y    5      4         9

...

What is the best way to do so?


Solution

  • For small DataFrames, use DataFrame.explode with ranges:

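    # Build an inclusive range per row and insert it as column 'time' at position 3
    df.insert(3, 'time', df.apply(lambda x: range(x.from_time, x.to_time + 1), axis=1))
    # Turn each element of the range into its own row
    df = df.explode('time')
    print(df)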
      C1 C2 C3 time  from_time  to_time
    0  a  b  c    1          1        3
    0  a  b  c    2          1        3
    0  a  b  c    3          1        3
    1  q  t  y    4          4        9
    1  q  t  y    5          4        9
    1  q  t  y    6          4        9
    1  q  t  y    7          4        9
    1  q  t  y    8          4        9
    1  q  t  y    9          4        9
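
    The explode approach above can also be sketched as a self-contained script. The sample frame is rebuilt from the table in the question, and explode_by_range is a hypothetical helper name, not part of pandas. The sketch builds the ranges with zip instead of apply and assumes to_time >= from_time in every row:

```python
import pandas as pd

# Sample frame rebuilt from the question (values taken from the table shown there)
df = pd.DataFrame({
    "C1": ["a", "q"],
    "C2": ["b", "t"],
    "C3": ["c", "y"],
    "from_time": [1, 4],
    "to_time": [3, 9],
})


def explode_by_range(frame):
    """Hypothetical helper: one output row per integer in [from_time, to_time]."""
    out = frame.copy()
    # One list of integers per row, bounds inclusive
    ranges = [
        list(range(start, stop + 1))
        for start, stop in zip(out["from_time"], out["to_time"])
    ]
    # Wrap in a Series aligned on the original index before inserting
    out.insert(3, "time", pd.Series(ranges, index=out.index))
    out = out.explode("time")
    # explode leaves an object column; cast back to integers
    # (assumes no empty ranges, which would produce NaN here)
    out["time"] = out["time"].astype("int64")
    return out


exploded = explode_by_range(df)
print(exploded)
```

    As in the answer's output, the original index labels are repeated; call reset_index(drop=True) on the result if a fresh index is preferred.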
    

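    For better performance, use Index.repeat with DataFrame.loc to repeat each row, then build the new column by adding a per-index counter from GroupBy.cumcount to from_time. This relies on the index being unique, because the counter restarts for each index label: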

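    # Repeat each index label (to_time - from_time + 1) times and select those rows
    df = df.loc[df.index.repeat(df.to_time.sub(df.from_time) + 1)]
    # Counter 0, 1, 2, ... within each original row, shifted by from_time
    df.insert(3, 'time', df.groupby(level=0).cumcount().add(df['from_time']))
    print(df)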
      C1 C2 C3  time  from_time  to_time
    0  a  b  c     1          1        3
    0  a  b  c     2          1        3
    0  a  b  c     3          1        3
    1  q  t  y     4          4        9
    1  q  t  y     5          4        9
    1  q  t  y     6          4        9
    1  q  t  y     7          4        9
    1  q  t  y     8          4        9
    1  q  t  y     9          4        9
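
    A variation on the repeat approach, shown here only as a sketch: it repeats rows by position with NumPy rather than by index label with GroupBy.cumcount, so it does not depend on the index being unique. expand_by_repeat is a hypothetical helper name, the sample frame is rebuilt from the question, and the code assumes to_time >= from_time - 1 in every row so that no repeat count is negative:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (values taken from the table shown there)
df = pd.DataFrame({
    "C1": ["a", "q"],
    "C2": ["b", "t"],
    "C3": ["c", "y"],
    "from_time": [1, 4],
    "to_time": [3, 9],
})


def expand_by_repeat(frame):
    """Hypothetical helper: positional repeat plus a NumPy offset counter."""
    # Number of output rows per input row (bounds inclusive)
    counts = (frame["to_time"] - frame["from_time"] + 1).to_numpy()
    # Repeat row positions, not labels, so duplicate index labels are harmless
    positions = np.repeat(np.arange(len(frame)), counts)
    out = frame.iloc[positions].copy()
    # Start position of each repeated block, broadcast to every row of the block
    block_starts = np.repeat(np.cumsum(counts) - counts, counts)
    # Offset 0, 1, ..., count - 1 within each block
    offsets = np.arange(counts.sum()) - block_starts
    out.insert(3, "time", out["from_time"].to_numpy() + offsets)
    return out.reset_index(drop=True)


expanded = expand_by_repeat(df)
print(expanded)
```

    Unlike the answer's version, this sketch returns a fresh 0..n-1 index; drop the reset_index call to keep the repeated original labels.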