Tags: python, r, statistics, frequency-analysis

Extreme value analysis and quantile estimation using the log-Pearson Type 3 (Pearson III) distribution - R vs Python


I am trying to estimate quantiles for some snow data using the log-Pearson Type 3 distribution in Python and comparing with R. I do this by reading in the data, log-transforming it, fitting a Pearson Type 3 distribution to the logs, estimating quantiles, and then transforming back from log space.

In Python:

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import lmoments3 as lm
import lmoments3.distr as ld

data=np.array([[1079],
       [ 889],
       [ 996],
       [1476],
       [1567],
       [ 897],
       [ 991],
       [1222],
       [1372],
       [1450],
       [1077],
       [1354],
       [1029],
       [1699],
       [ 965],
       [1133],
       [1951],
       [1621],
       [1069],
       [ 930],
       [1039],
       [1839]])

return_periods = np.array([2,3,5,10,20,50,100,200,1000])

log_data = np.log(data)
params = stats.pearson3.fit(log_data) #Max likelihood estimation method
quantiles = np.exp(stats.pearson3.ppf(1 - 1 / return_periods, *params))

paramsmm=ld.pe3.lmom_fit(log_data) #lmoments estimation method
paramsmm2=(paramsmm["skew"], paramsmm['loc'], paramsmm['scale'][0])
quantilesmm = np.exp(ld.pe3.ppf(1 - 1 / return_periods, *paramsmm2))

print(quantiles)
print(quantilesmm)

In R:

library(lmom)
library(lmomco)
library(FAdist)

swe_data <- c(1079,
              889,
              996,
              1476,
              1567,
              897,
              991,
              1222,
              1372,
              1450,
              1077,
              1354,
              1029,
              1699,
              965,
              1133,
              1951,
              1621,
              1069,
              930,
              1039,
              1839)

return_periods <- c(2, 3, 5, 10, 20, 50, 100, 200, 1000)
exceedance_probabilities <- 1 / return_periods  # P = 1/T
nonexceedance_probabilities <- 1 - exceedance_probabilities  # P_nonexceedance = 1 - P_exceedance

log_swe <- log(swe_data)
loglmoments <- lmom.ub(log_swe)
fit_LP3 <- parpe3(loglmoments)  #pars estimated using lmoments
LP3_est=exp(quape3(nonexceedance_probabilities, fit_LP3))
print(LP3_est)

The quantiles estimated are the following:

MLE/scipy stats:

params=(2.0246357656236125, 7.10812763271725, 0.32194785836668816) # skew, loc, scale
quantiles=[1105.86050592 1259.46110488 1484.67412496 1857.18767881 2324.18036925 3127.68767927 3916.2007443  4904.15011095 8271.24322709]

Lmoments/python:

params=(-2.2194418726874434, 7.1069179914286424, 0.07535915093401913) # skew, loc, scale
quantiles=[1251.30865382 1276.35189073 1291.29995882 1300.06624583 1303.59129662 1305.31725745 1305.78638777 1305.98555852 1306.11275037]

Lmoments/R:

params=(7.1069180, 0.2566677, 0.9365001) # mu, sigma, gamma
quantiles=[1173.116 1313.849 1485.109 1721.131 1969.817 2326.812 2623.112 2945.728 3814.692]

I would expect the latter two methods, both using L-moments, to produce the same result. Based on comparisons with other distributions, R seems to be giving the most realistic result. What explains the large differences, and how might I get a similar result in Python?


Solution

  • The main difference between the Python and R results stems from the estimation methods: scipy.stats.pearson3.fit uses Maximum Likelihood Estimation (MLE), while R's lmom/lmomco packages use L-moments. These can give quite different parameter estimates, especially for small samples or highly skewed distributions like the log-Pearson Type III. To get a similar result in Python, use the lmoments3 library, which implements the same L-moments method, and make sure the parameters are extracted correctly from its results.

    Note that in your code the data array is defined as a column vector (shape (22, 1)), which is likely why paramsmm['scale'] came back as an array and needed the [0] indexing. Fitting on a flat 1-D array avoids that and lets you pass the 'skew', 'loc' and 'scale' values straight through. Here is how you can get the Python result much closer to the R result:

    import numpy as np
    import scipy.stats as stats
    import lmoments3 as lm
    import lmoments3.distr as ld
    
    # Input data
    data = np.array([1079, 889, 996, 1476, 1567, 897, 991, 1222, 1372, 1450,
                     1077, 1354, 1029, 1699, 965, 1133, 1951, 1621, 1069,
                     930, 1039, 1839])
    
    # Return periods and probabilities
    return_periods = np.array([2, 3, 5, 10, 20, 50, 100, 200, 1000])
    non_exceedance_probabilities = 1 - 1 / return_periods
    
    # Log-transform the data
    log_data = np.log(data)
    
    # L-moments method (using lmoments3)
    paramsmm = ld.pe3.lmom_fit(log_data)
    quantiles_lmom = np.exp(ld.pe3.ppf(non_exceedance_probabilities, 
                                       paramsmm["skew"], 
                                       paramsmm['loc'], 
                                       paramsmm['scale']))
    
    print("L-moments method results:")
    print(quantiles_lmom)
    
    # Maximum Likelihood Estimation method (using scipy.stats)
    params_mle = stats.pearson3.fit(log_data)
    quantiles_mle = np.exp(stats.pearson3.ppf(non_exceedance_probabilities, *params_mle))
    
    print("\nMaximum Likelihood Estimation results:")
    print(quantiles_mle)
    
    # R results for comparison
    r_quantiles = np.array([1173.116, 1313.849, 1485.109, 1721.131, 1969.817, 
                            2326.812, 2623.112, 2945.728, 3814.692])
    
    print("\nR results:")
    print(r_quantiles)
    
    # Compare results
    print("\nDifference between Python L-moments and R results:")
    print(quantiles_lmom - r_quantiles)
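
    To see why the original Python L-moments fit disagreed with R, note that the question's data array is a column vector (shape (22, 1)) rather than a flat 1-D array, and lmoments3's lmom_fit does not appear to handle that shape correctly, which would explain the negative skew and the array-valued 'scale'. Here is a quick sketch to check this on your own machine; the values come from the question, and the behaviour of lmom_fit on 2-D input is the assumption being tested:

    import numpy as np
    import lmoments3.distr as ld
    
    # Column-vector data exactly as defined in the question, shape (22, 1)
    data_2d = np.array([[1079], [889], [996], [1476], [1567], [897], [991],
                        [1222], [1372], [1450], [1077], [1354], [1029], [1699],
                        [965], [1133], [1951], [1621], [1069], [930], [1039],
                        [1839]])
    
    # Fit on the 2-D column vector (should reproduce the odd parameters
    # reported in the question)
    print("2-D input:", ld.pe3.lmom_fit(np.log(data_2d)))
    
    # Fit on the flattened 1-D array (what lmom_fit expects)
    print("1-D input:", ld.pe3.lmom_fit(np.log(data_2d).ravel()))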
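
    As a sanity check on the parametrisation: lmoments3's pe3 appears to be built on scipy.stats.pearson3, whose loc and scale are the mean and standard deviation, so its (loc, scale, skew) should line up with lmomco's (mu, sigma, gamma). A minimal comparison against the R parameters quoted in the question, assuming that correspondence holds:

    import numpy as np
    import lmoments3.distr as ld
    
    # Same data as above, but as a flat 1-D array
    log_data = np.log(np.array([1079, 889, 996, 1476, 1567, 897, 991, 1222,
                                1372, 1450, 1077, 1354, 1029, 1699, 965, 1133,
                                1951, 1621, 1069, 930, 1039, 1839]))
    paramsmm = ld.pe3.lmom_fit(log_data)
    
    # mu, sigma, gamma reported by lmomco::parpe3 in the question
    r_mu, r_sigma, r_gamma = 7.1069180, 0.2566677, 0.9365001
    
    print(f"loc   vs mu   : {paramsmm['loc']:.7f} vs {r_mu}")
    print(f"scale vs sigma: {paramsmm['scale']:.7f} vs {r_sigma}")
    print(f"skew  vs gamma: {paramsmm['skew']:.7f} vs {r_gamma}")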