python pandas numpy statsmodels

Time series analysis: checking for stationarity using the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test


I am performing a time series analysis and want to check for stationarity with the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test. I load the data as follows:

import pandas as pd
import numpy as np
path = 'https://raw.githubusercontent.com/selva86/datasets/master/daily-min-temperatures.csv'
df = pd.read_csv(path, parse_dates=['Date'], index_col='Date')
df.plot(title='Daily Temperatures', figsize=(14,8), legend=None);

This is the code I used for the test, but it does not display any results:

# define function for kpss test
from statsmodels.tsa.stattools import kpss

# define KPSS
def kpss_test(timeseries):
    print ('Results of KPSS Test:')
    kpsstest = kpss(timeseries, regression='c')
    kpss_output = pd.Series(kpsstest[0:3], index=['Test Statistic','p-value','Lags Used'])
    for key,value in kpsstest[3].items():
      kpss_output['Critical Value (%s)'%key] = value

Solution

  • You are almost there. The function builds kpss_output but never returns or prints it, so nothing is shown. Just return kpss_output, like so:

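    import pandas as pd
    from statsmodels.tsa.stattools import kpss

    def kpss_test(timeseries):
        print('Results of KPSS Test:')
        # kpss returns (test statistic, p-value, lags used, critical values)
        kpsstest = kpss(timeseries, regression='c')
        kpss_output = pd.Series(kpsstest[0:3], index=['Test Statistic', 'p-value', 'Lags Used'])
        for key, value in kpsstest[3].items():
            kpss_output['Critical Value (%s)' % key] = value
        # Return the Series so the caller can display or reuse it
        return kpss_output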

    Calling kpss_test(df.Temp) in a notebook cell (or print(kpss_test(df.Temp)) in a script) then displays:

    Test Statistic            0.06511
    p-value                   0.10000
    Lags Used                30.00000
    Critical Value (10%)      0.34700
    Critical Value (5%)       0.46300
    Critical Value (2.5%)     0.57400
    Critical Value (1%)       0.73900
    dtype: float64
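
To read this output, it may help to compare the test statistic with a critical value. The sketch below is only an illustration: interpret_kpss is a hypothetical helper, not part of statsmodels, and it assumes the labels used by kpss_test above. The null hypothesis of KPSS with regression='c' is that the series is stationary around a constant, so a statistic above the critical value rejects stationarity. The example values are copied from the output above.

```python
def interpret_kpss(kpss_output, level="5%"):
    """Compare the KPSS test statistic with the critical value at `level`.

    `kpss_output` can be the pandas Series returned by kpss_test or a
    plain dict with the same labels (hypothetical helper, for illustration).
    """
    statistic = kpss_output["Test Statistic"]
    critical = kpss_output["Critical Value (%s)" % level]
    # KPSS null hypothesis: the series is stationary around a constant.
    # A statistic larger than the critical value rejects that hypothesis.
    if statistic > critical:
        return "reject stationarity at the %s level" % level
    return "cannot reject stationarity at the %s level" % level


# Values copied from the output shown above.
example = {
    "Test Statistic": 0.06511,
    "Critical Value (10%)": 0.347,
    "Critical Value (5%)": 0.463,
    "Critical Value (2.5%)": 0.574,
    "Critical Value (1%)": 0.739,
}
print(interpret_kpss(example))
```

Here the statistic (0.06511) is below every critical value, so the test does not reject stationarity. Note also that statsmodels reads the p-value from a lookup table bounded at 0.01 and 0.1, so a reported 0.10000 should be read as "at least 0.1".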