pythonpandastime-series

Convert Date Ranges to Time Series in Pandas


My raw data looks like the following:

  start_date   end_date  value
0 2016-01-01 2016-01-03      2
1 2016-01-05 2016-01-08      4

The interpretation is that the data takes a value of 2 between 1/1/2016 and 1/3/2016, and it takes a value of 4 between 1/5/2016 and 1/8/2016. I want to transform the raw data to a daily time series like the following:

2016-01-01    2
2016-01-02    2
2016-01-03    2
2016-01-04    0
2016-01-05    4
2016-01-06    4
2016-01-07    4
2016-01-08    4

Note that if a date in the time series doesn't appear between the start_date and end_date in any row of the raw data, it gets a value of 0 in the time series.

I can create the time series by looping through the raw data, but that's slow. Is there a faster way to do it?


Solution

  • You may try this:

    In [120]: df
    Out[120]:
      start_date   end_date  value
    0 2016-01-01 2016-01-03      2
    1 2016-01-05 2016-01-08      4
    
    In [121]: new = pd.DataFrame({'dt': pd.date_range(df.start_date.min(), df.end_date.max())})
    
    In [122]: new
    Out[122]:
              dt
    0 2016-01-01
    1 2016-01-02
    2 2016-01-03
    3 2016-01-04
    4 2016-01-05
    5 2016-01-06
    6 2016-01-07
    7 2016-01-08
    
    In [123]: new = new.merge(df, how='left', left_on='dt', right_on='start_date').fillna(method='pad')
    
    In [124]: new
    Out[124]:
              dt start_date   end_date  value
    0 2016-01-01 2016-01-01 2016-01-03    2.0
    1 2016-01-02 2016-01-01 2016-01-03    2.0
    2 2016-01-03 2016-01-01 2016-01-03    2.0
    3 2016-01-04 2016-01-01 2016-01-03    2.0
    4 2016-01-05 2016-01-05 2016-01-08    4.0
    5 2016-01-06 2016-01-05 2016-01-08    4.0
    6 2016-01-07 2016-01-05 2016-01-08    4.0
    7 2016-01-08 2016-01-05 2016-01-08    4.0
    
    In [125]: new.ix[(new.dt < new.start_date) | (new.dt > new.end_date), 'value'] = 0
    
    In [126]: new[['dt', 'value']]
    Out[126]:
              dt  value
    0 2016-01-01    2.0
    1 2016-01-02    2.0
    2 2016-01-03    2.0
    3 2016-01-04    0.0
    4 2016-01-05    4.0
    5 2016-01-06    4.0
    6 2016-01-07    4.0
    7 2016-01-08    4.0