python pandas

Calculating pct_change between 3 values in a pandas Series, where one or more of these values can be NaN


Scenario: I have a pandas series that contains 3 values. These values can vary between nan, 0 and any value above zero. I am trying to get the pct_change among the series whenever possible.

Examples:

[0,nan,50]
[0,0,0]
[0,0,50]
[nan,nan,50]
[nan,nan,0]
[0,0,nan]
[0,nan,0]

What I tried: based on other SO questions, I came up with approaches that either ignore the nan values or shift the series, but these can yield an empty or all-NaN result. Ideally, if a result cannot be calculated, I would like to output a 0.

Code tried:

import pandas as pd

series_test = pd.Series([0,None,50])
series_test.pct_change().where(series_test.notna()) # tested but gives only NaN or inf

series_test.pct_change(fill_method=None)[series_test.shift(2).notnull()].dropna() # tested but gives empty result

Question: What would be the correct way to approach this?

Expected outputs:

[0,nan,50] - 0 (undefined case)
[0,0,0] - 0 (undefined case)
[0,0,50] - 0 (undefined case)
[nan,nan,50] - 0 (undefined case)
[nan,nan,0] - 0 (undefined case)
[0,0,nan] - 0 (undefined case)
[0,nan,0] - 0 (undefined case)
[1,nan,5] - 400%
[0,1,5] - 400%
[1,2,nan] - 100%
[1,1.3,1.8] - 80%

Solution

  • I think you could dropna, then compute the pct_change and only keep the max finite value:

    series_test.dropna().pct_change().loc[np.isfinite].max()
    

    Or maybe:

    series_test.pct_change().where(np.isfinite, 0).max()
    

    Example output for the second approach:

    [0, nan, 50] - 0.0
    [0, 0, 0] - 0.0
    [0, 0, 50] - 0.0
    [nan, nan, 50] - 0.0
    [nan, nan, 0] - 0.0
    [0, 0, nan] - 0.0
    [0, nan, 0] - 0.0
    [1, nan, 5] - 4.0
    [0, 1, 5] - 4.0
    [0, 1, nan] - 0.0
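
The second approach relies on `pct_change` forward-filling NaN before computing the change. As far as I know, recent pandas releases deprecate that implicit fill, so here is a minimal sketch that makes the fill explicit. The helper name `max_finite_pct_change` is hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd

def max_finite_pct_change(values):
    # Hypothetical helper: largest finite step-to-step change, 0 if none exists.
    s = pd.Series(values, dtype="float64")
    # Forward-fill explicitly instead of relying on pct_change's implicit fill.
    changes = s.ffill().pct_change(fill_method=None)
    # Replace NaN and +/-inf (e.g. division by zero) with 0 before taking the max.
    return changes.where(np.isfinite(changes), 0).max()

for values in ([0, None, 50], [1, None, 5], [0, 1, 5], [0, 1, None]):
    print(values, "-", max_finite_pct_change(values))
```

Note that this returns the largest single step, not the change from the first to the last value, so a series like [1, 1.3, 1.8] would not give 80%.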
    

    Edit: given your comment, it looks like you want to use the first and last positive (non-zero, non-NaN) values to compute the percentage change.

    In this case, I'd use a custom function:

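    def pct_chg(s):
        # keep only strictly positive values (this drops both NaN and 0)
        tmp = s[s>0]
        # at least two positive values are needed to compute a change
        if len(tmp)>1:
            return (tmp.iloc[-1]-tmp.iloc[0])/tmp.iloc[0]
        # undefined case
        return 0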
    

    Which should be equivalent to the more verbose:

    (series_test
     .where(series_test>0).bfill().ffill()
     .iloc[[0, -1]].pct_change().fillna(0).iloc[-1]
    )
    

    Example:

    [0, nan, 50] - 0
    [0, 0, 0] - 0
    [0, 0, 50] - 0
    [nan, nan, 50] - 0
    [nan, nan, 0] - 0
    [0, 0, nan] - 0
    [0, nan, 0] - 0
    [1, nan, 5] - 4.0
    [0, 1, 5] - 4.0
    [0, 1, nan] - 0
    [1, 1.5, 1.6] - 0.6000000000000001
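
To check both variants against the expected outputs listed in the question, something like the following sketch could be used. The name `pct_chg_chained` and the `expected` table are introduced here for illustration only, and `fill_method=None` is passed explicitly on the assumption that it avoids the implicit fill in `pct_change`.

```python
import pandas as pd

def pct_chg(s):
    # Change between the first and last strictly positive values, else 0.
    tmp = s[s > 0]
    if len(tmp) > 1:
        return (tmp.iloc[-1] - tmp.iloc[0]) / tmp.iloc[0]
    return 0

def pct_chg_chained(s):
    # Same idea as a method chain: mask non-positive values, fill the gaps,
    # then compare the first and last entries.
    return (s.where(s > 0).bfill().ffill()
             .iloc[[0, -1]].pct_change(fill_method=None).fillna(0).iloc[-1])

# (input values, expected fraction) taken from the question's expected outputs
expected = [
    ([0, None, 50], 0.0), ([0, 0, 0], 0.0), ([0, 0, 50], 0.0),
    ([None, None, 50], 0.0), ([None, None, 0], 0.0), ([0, 0, None], 0.0),
    ([0, None, 0], 0.0), ([1, None, 5], 4.0), ([0, 1, 5], 4.0),
    ([1, 2, None], 1.0), ([1, 1.3, 1.8], 0.8),
]

results = []
for values, want in expected:
    s = pd.Series(values, dtype="float64")
    results.append((want, pct_chg(s), pct_chg_chained(s)))
    print(values, "-", pct_chg(s))
```

Results are fractions (4.0 means 400%), so multiply by 100 if a percentage is needed.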