pythonscipystatisticsconfidence-intervalreliability

How to calculate one-sided tolerance interval with scipy


I would like to calculate a one sided tolerance bound based on the normal distribution given a data set with known N (sample size), standard deviation, and mean.

If the interval were two sided I would do the following:

conf_int = stats.norm.interval(alpha, loc=mean, scale=sigma)

In my situation, I am bootstrapping samples, but if I weren't I would refer to this post on stackoverflow: Correct way to obtain confidence interval with scipy and use the following: conf_int = stats.norm.interval(0.68, loc=mean, scale=sigma / np.sqrt(len(a)))

How would you do the same thing, but to calculate this as a one sided bound (95% of values are above or below x<--bound)?


Solution

  • I assume that you are interested in computing one-side tolerance bound using the normal distribution (based on the fact you mention the scipy.stats.norm.interval function as the two-sided equivalent of your need).

    Then the good news is that, based on the tolerance interval Wikipedia page:

    One-sided normal tolerance intervals have an exact solution in terms of the sample mean and sample variance based on the noncentral t-distribution.

    (FYI: Unfortunately, this is not the case for the two-sided setting)

    This assertion is based on this paper. Besides paragraph 4.8 (page 23) provides the formulas.

    The bad news is that I do not think there is a ready-to-use scipy function that you can safely tweak and use for your purpose.

    But you can easily calculate it yourself. You can find on Github repositories that contain such a calculator from which you can find inspiration, for example that one from which I built the following illustrative example:

    import numpy as np
    from scipy.stats import norm, nct
    
    # sample size
    n=1000
    
    # Percentile for the TI to estimate
    p=0.9
    # confidence level
    g = 0.95
    
    # a demo sample
    x = np.array([np.random.normal(100) for k in range(n)])
    
    # mean estimate based on the sample
    mu_est = x.mean()
    
    # standard deviation estimated based on the sample
    sigma_est = x.std(ddof=1)
    
    # (100*p)th percentile of the standard normal distribution
    zp = norm.ppf(p)
    
    # gth quantile of a non-central t distribution
    # with n-1 degrees of freedom and non-centrality parameter np.sqrt(n)*zp
    t = nct.ppf(g, df=n-1., nc=np.sqrt(n)*zp)
    
    # k factor from Young et al paper
    k = t / np.sqrt(n)
    
    # One-sided tolerance upper bound
    conf_upper_bound = mu_est + (k*sigma_est)