I have multiple pandas series, and they have some overlapping times:
In [1]: import pandas as pd
In [2]: cycle_00z = pd.Series(data=[10, 10, 10, 10],
index=pd.date_range('2015-01-01 00', '2015-01-01 03', freq='H'))
In [3]: cycle_02z = pd.Series(data=[20, 20, 20, 20],
index=pd.date_range('2015-01-01 02', '2015-01-01 05', freq='H'))
In [4]: cycle_04z = pd.Series(data=[30, 30, 30, 30],
index=pd.date_range('2015-01-01 04', '2015-01-01 07', freq='H'))
In [5]: cycle_00z
Out[5]:
2015-01-01 00:00:00 10
2015-01-01 01:00:00 10
2015-01-01 02:00:00 10
2015-01-01 03:00:00 10
Freq: H, dtype: int64
In [6]: cycle_02z
Out[6]:
2015-01-01 02:00:00 20
2015-01-01 03:00:00 20
2015-01-01 04:00:00 20
2015-01-01 05:00:00 20
Freq: H, dtype: int64
In [7]: cycle_04z
Out[7]:
2015-01-01 04:00:00 30
2015-01-01 05:00:00 30
2015-01-01 06:00:00 30
2015-01-01 07:00:00 30
Freq: H, dtype: int64
I would like to create another pandas series from these three, which will contain the unique times from these three cycles and the most recent data (when the times overlap). In this case it would look like this:
In [8]: continuous = pd.Series(data=[10, 10, 20, 20, 30, 30, 30, 30],
index=pd.date_range('2015-01-01 00', '2015-01-01 07', freq='H'))
In [9]: continuous
Out[21]:
2015-01-01 00:00:00 10
2015-01-01 01:00:00 10
2015-01-01 02:00:00 20
2015-01-01 03:00:00 20
2015-01-01 04:00:00 30
2015-01-01 05:00:00 30
2015-01-01 06:00:00 30
2015-01-01 07:00:00 30
Freq: H, dtype: int64
Just wondering if there would be a neat way to achieve that using pandas please? I will actually need to implement the technique in xray DataArrays but I guess the idea would be the same. Essentially, always keep data from most recent cycles.
Thanks
One way is to use the combine_first
method:
In [39]: cycle_04z.combine_first(cycle_02z).combine_first(cycle_00z)
Out[39]:
2015-01-01 00:00:00 10
2015-01-01 01:00:00 10
2015-01-01 02:00:00 20
2015-01-01 03:00:00 20
2015-01-01 04:00:00 30
2015-01-01 05:00:00 30
2015-01-01 06:00:00 30
2015-01-01 07:00:00 30
Freq: H, dtype: float64
Or, if you are performing the updates in a loop, something like this would work:
In [40]: result = cycle_00z
In [41]: result = cycle_02z.combine_first(result)
In [42]: result = cycle_04z.combine_first(result)
In [43]: result
Out[43]:
2015-01-01 00:00:00 10
2015-01-01 01:00:00 10
2015-01-01 02:00:00 20
2015-01-01 03:00:00 20
2015-01-01 04:00:00 30
2015-01-01 05:00:00 30
2015-01-01 06:00:00 30
2015-01-01 07:00:00 30
Freq: H, dtype: float64