python pandas linear-interpolation

How to get equally spaced values with Linear Interpolation when using a Limit lower than the amount of rows between values?


I have a wide dataframe with values (seconds) that decrease over time. Every row represents 1 day. It's like a "best time" for every column so the number can only decrease.

There are gaps between value changes because values don't change every day. I'm using this data for an animated bar chart so I need these values to change gradually over a period of time for a smooth animation. I used interpolation to achieve this, but now I'm a bit stuck.

Sample column:

import pandas as pd
import numpy as np

s = pd.Series([np.nan, 4361, np.nan,
               np.nan, np.nan,
               np.nan, 4320])
s

0       NaN
1    4361.0
2       NaN
3       NaN
4       NaN
5       NaN
6    4320.0

Say I want a transition period of at most 7 days. I would use linear interpolation like this. The gap is shorter than the limit, so the new values are evenly spaced between rows 1 and 6. No problem here.

s = s.interpolate(limit_direction='backward', 
                  limit=7,
                  method='linear',
                  limit_area='inside'
                )
s

0       NaN
1    4361.0
2    4352.8
3    4344.6
4    4336.4
5    4328.2
6    4320.0

To illustrate the problem I will change the limit to 2 (starting again from the original series); in reality the limit stays the same and the gap between values is just much larger. So here we have my original values at rows 1 and 6, and the transition period consists of rows 4 and 5.

s = s.interpolate(limit_direction='backward', 
                  limit=2,
                  method='linear',
                  limit_area='inside'
                )
s

0       NaN
1    4361.0
2       NaN
3       NaN
4    4336.4
5    4328.2
6    4320.0

To complete the interpolation I populated rows 2 and 3 with the closest previous value like so:

s = s.interpolate(limit_direction='forward',
                  method='pad'
                )
s

0       NaN
1    4361.0
2    4361.0
3    4361.0 --> +24.6
4    4336.4 --> +8.2
5    4328.2 --> +8.2
6    4320.0

The problem is that the transition isn't linear. The change from row 3 to row 4 is much too abrupt. I know this happens because of how the linear method works, combined with the fact that my limit is smaller than the number of rows in the gap.

Desired outcome:

0       NaN
1    4361.0
2    4361.0
3    4361.0 --> +13.7
4    4347.3 --> +13.7
5    4333.7 --> +13.7
6    4320.0

Ideally I could specify that the step size is calculated by dividing the difference (41 in this case) based on the limit rather than on the number of rows between the two values.
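
To make the target numbers concrete, here is a rough sketch of the arithmetic I have in mind for a single gap. The variable names are only for illustration, and this is not a general solution for a whole dataframe:

```python
import numpy as np

start, end = 4361.0, 4320.0  # the two known values from the sample column
limit = 2                    # number of transition rows wanted before `end`

# Split the difference into limit + 1 equal steps instead of one step per missing row
step = (start - end) / (limit + 1)

# Evenly spaced values from start to end, both endpoints included
ramp = np.linspace(start, end, limit + 2)

print(round(step, 2))
print(ramp.round(1))
```

With a limit of 2 the step works out to roughly 13.67 per row, which is where the values in the desired outcome come from.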

I also thought about a 'negative' limit where I would use interpolation to copy 4361 forwards until I was n rows away from 4320. Then I could use regular linear interpolation to populate the remaining gap.

I tried looking up a solution but this looks like a pretty unique case. I'm unfamiliar with pandas so I don't know any tricks to solve this problem. I would be very thankful if someone more knowledgeable could offer me a solution.

Thanks


Solution

  • I think I understand your question now. The limit assumed in this code is 2, but it could be any value you want.

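    >>> # 'og' holds the original sample column from the question
    >>> df = pd.DataFrame({'og': [np.nan, 4361, np.nan, np.nan,
    ...                           np.nan, np.nan, 4320]})
    >>> df['padded'] = df['og'].ffill()  # carry the last known value forward
    >>> df['t'] = np.where(
    ...     pd.concat(
    ...         [df['og'].shift(-i).notnull()
    ...          for i in range(0, 2 + 1)],  # this row and the next 2 (inclusive)
    ...         axis=1).any(axis=1),
    ...     df['og'], df['padded'])
    
    >>> df['new'] = df['t'].interpolate(limit=2, limit_direction='backward',
    ...                                 limit_area='inside')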
    >>> df
           og  padded       t          new
    0     NaN     NaN     NaN          NaN
    1  4361.0  4361.0  4361.0  4361.000000
    2     NaN  4361.0  4361.0  4361.000000
    3     NaN  4361.0  4361.0  4361.000000
    4     NaN  4361.0     NaN  4347.333333
    5     NaN  4361.0     NaN  4333.666667
    6  4320.0  4320.0  4320.0  4320.000000
    

    The code first pads (forward-fills) the whole column. It then builds a set of shifted copies of og to check, for each row, whether the row itself or any of the next 2 rows (again, this can be any number) is not null. Where that is true, column t takes the value from og (which may still be NaN); everywhere else it takes the padded value. The interpolation then runs on t, so only the NaNs left just before each new value get filled.

    There is a way to reduce this to a single line. However, I felt that doing so would be difficult to parse.
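
For reference, one possible compact form of the same idea is sketched below. It is not necessarily the one-liner meant above: it swaps the shifted-column check for a limited `bfill`, and `ramp_before_change` is a hypothetical helper name rather than anything pandas provides. It assumes a float Series indexed by day, as in the question.

```python
import numpy as np
import pandas as pd


def ramp_before_change(values, limit):
    """Hold the previous value, then interpolate over the last `limit` rows of each gap.

    `ramp_before_change` is a hypothetical helper name, not a pandas function.
    """
    # Non-null at original values and at up to `limit` rows just before each of them
    near_next = values.bfill(limit=limit).notna()
    # Near a value keep the original data (NaNs included); elsewhere hold the last value
    held = values.where(near_next, values.ffill())
    # Fill the remaining NaNs linearly, leaving leading NaNs untouched
    return held.interpolate(method="linear", limit_area="inside")


s = pd.Series([np.nan, 4361, np.nan, np.nan, np.nan, np.nan, 4320])
result = ramp_before_change(s, limit=2)
print(result)
```

On the sample column this should reproduce the `new` column shown earlier. Since `bfill`, `ffill`, `where` and `interpolate` also exist on DataFrame, the same function should work column-wise on a wide dataframe, though that is worth checking on real data.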