[SOLVED] How to calculate one-sided tolerance interval with scipy

How to calculate one-sided tolerance interval with scipy

I would like to calculate a one sided tolerance bound based on the normal distribution given a data set with known N (sample size), standard deviation, and mean.

If the interval were two sided I would do the following:

conf_int = stats.norm.interval(alpha, loc=mean, scale=sigma)

In my situation, I am bootstrapping samples, but if I weren't I would refer to this post on stackoverflow: Correct way to obtain confidence interval with scipy and use the following: conf_int = stats.norm.interval(0.68, loc=mean, scale=sigma / np.sqrt(len(a)))

How would you do the same thing, but to calculate this as a one sided bound (95% of values are above or below x<--bound)?

Solution

I assume that you are interested in computing one-side tolerance bound using the normal distribution (based on the fact you mention the scipy.stats.norm.interval function as the two-sided equivalent of your need).

Then the good news is that, based on the tolerance interval Wikipedia page:

One-sided normal tolerance intervals have an exact solution in terms of the sample mean and sample variance based on the noncentral t-distribution.

(FYI: Unfortunately, this is not the case for the two-sided setting)

This assertion is based on this paper. Besides paragraph 4.8 (page 23) provides the formulas.

The bad news is that I do not think there is a ready-to-use scipy function that you can safely tweak and use for your purpose.

But you can easily calculate it yourself. You can find on Github repositories that contain such a calculator from which you can find inspiration, for example that one from which I built the following illustrative example:

import numpy as np
from scipy.stats import norm, nct

# sample size
n=1000

# Percentile for the TI to estimate
p=0.9
# confidence level
g = 0.95

# a demo sample
x = np.array([np.random.normal(100) for k in range(n)])

# mean estimate based on the sample
mu_est = x.mean()

# standard deviation estimated based on the sample
sigma_est = x.std(ddof=1)

# (100*p)th percentile of the standard normal distribution
zp = norm.ppf(p)

# gth quantile of a non-central t distribution
# with n-1 degrees of freedom and non-centrality parameter np.sqrt(n)*zp
t = nct.ppf(g, df=n-1., nc=np.sqrt(n)*zp)

# k factor from Young et al paper
k = t / np.sqrt(n)

# One-sided tolerance upper bound
conf_upper_bound = mu_est + (k*sigma_est)