I am trying to estimate quantiles for some snow data using the log pearson type 3 distribution in Python and comparing with R. I do this by reading in the data, log transforming it, fitting Pearson type 3, estimating quantiles, then transforming back from log space.
In python:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import lmoments3 as lm
import lmoments3.distr as ld
data=np.array([[1079],
[ 889],
[ 996],
[1476],
[1567],
[ 897],
[ 991],
[1222],
[1372],
[1450],
[1077],
[1354],
[1029],
[1699],
[ 965],
[1133],
[1951],
[1621],
[1069],
[ 930],
[1039],
[1839]])
return_periods = np.array([2,3,5,10,20,50,100,200,1000])
log_data = np.log(data)
params = stats.pearson3.fit(log_data) #Max likelihood estimation method
quantiles = np.exp(stats.pearson3.ppf(1 - 1 / return_periods, *params))
paramsmm=ld.pe3.lmom_fit(log_data) #lmoments estimation method
paramsmm2=(paramsmm["skew"], paramsmm['loc'], paramsmm['scale'][0])
quantilesmm = np.exp(ld.pe3.ppf(1 - 1 / return_periods, *paramsmm2))
print(quantiles)
print(quantilesmm)
in R:
library(lmom)
library(lmomco)
library(FAdist)
swe_data <- c(1079,
889,
996,
1476,
1567,
897,
991,
1222,
1372,
1450,
1077,
1354,
1029,
1699,
965,
1133,
1951,
1621,
1069,
930,
1039,
1839)
return_periods <- c(2, 3, 5, 10, 20, 50, 100, 200, 1000)
exceedance_probabilities <- 1 / return_periods # P = 1/T
nonexceedance_probabilities <- 1 - exceedance_probabilities # P_nonexceedance = 1 - P_exceedance
log_swe <- log(swe_data)
loglmoments <- lmom.ub(log_swe_data)
fit_LP3 <- parpe3(loglmoments) #pars estimated using lmoments
LP3_est=exp(quape3(nonexceedance_probabilities, fit_LP3))
print(LP3_est)
The quantiles estimated are the following:
MLE/scipy stats:
params=(2.0246357656236125, 7.10812763271725, 0.32194785836668816) #skew, loc scale
quantiles=[1105.86050592 1259.46110488 1484.67412496 1857.18767881 2324.18036925 3127.68767927 3916.2007443 4904.15011095 8271.24322709]
Lmoments/python:
params=(-2.2194418726874434, 7.1069179914286424, 0.07535915093401913) #skew, loc scale
quantiles=[1251.30865382 1276.35189073 1291.29995882 1300.06624583 1303.59129662 1305.31725745 1305.78638777 1305.98555852 1306.11275037]
Lmoments/R:
params= (7.1069180 0.2566677 0.9365001) #mu, sigma, gamma
quantiles=[1173.116 1313.849 1485.109 1721.131 1969.817 2326.812 2623.112 2945.728 3814.692]
I would expect the latter two methods, both using lmoments, to produce the same result. Based on comparisons with other distributions, it seems like R is giving the most realistic result. Any explanation for the large differences? How might I get a similar result in Python?
Okey to my understanding basically, The main difference between the Python and R results stems from the estimation methods used: Python's scipy.stats
uses Maximum Likelihood Estimation (MLE) by default.
R's lmom
package uses L-moments
estimation.
These methods can produce different parameter estimates, especially for small sample sizes or highly skewed distributions like the Log-Pearson Type III.To get a similar or closer result you can use the lmoments3
library in Python, which implements the L-moments method. Ensure correct parameter extraction from the lmoments3 results.
To get them to agree, use the lmoments3
library in Python – it uses the same L-moments
method as R. Just double-check how you're pulling out the parameters, especially the 'scale'
value, from the results. Here's how you can get python result much closer to R results.
import numpy as np
import scipy.stats as stats
import lmoments3 as lm
import lmoments3.distr as ld
# Input data
data = np.array([1079, 889, 996, 1476, 1567, 897, 991, 1222, 1372, 1450,
1077, 1354, 1029, 1699, 965, 1133, 1951, 1621, 1069,
930, 1039, 1839])
# Return periods and probabilities
return_periods = np.array([2, 3, 5, 10, 20, 50, 100, 200, 1000])
non_exceedance_probabilities = 1 - 1 / return_periods
# Log-transform the data
log_data = np.log(data)
# L-moments method (using lmoments3)
paramsmm = ld.pe3.lmom_fit(log_data)
quantiles_lmom = np.exp(ld.pe3.ppf(non_exceedance_probabilities,
paramsmm["skew"],
paramsmm['loc'],
paramsmm['scale']))
print("L-moments method results:")
print(quantiles_lmom)
# Maximum Likelihood Estimation method (using scipy.stats)
params_mle = stats.pearson3.fit(log_data)
quantiles_mle = np.exp(stats.pearson3.ppf(non_exceedance_probabilities, *params_mle))
print("\nMaximum Likelihood Estimation results:")
print(quantiles_mle)
# R results for comparison
r_quantiles = np.array([1173.116, 1313.849, 1485.109, 1721.131, 1969.817,
2326.812, 2623.112, 2945.728, 3814.692])
print("\nR results:")
print(r_quantiles)
# Compare results
print("\nDifference between Python L-moments and R results:")
print(quantiles_lmom - r_quantiles)