pythonscipybeta-distribution

Beta distribution with bounds at [0.1, 0.5]


I'd like to construct a beta distribution where the mu and sigma are 0.28 and 0.003, respectively, and the distribution is bound at [0.1, 0.5].

import numpy as np
import pandas as pd
import plotly.express as px
from scipy import stats

mu = 0.28
stdev = 0.003

lb = 0.1
ub = 0.5

def calc_alpha(x, y):
    res = x * ((x*(1-x) / y**2)-1)
    return res

def calc_beta(x, y):
    res = (1 - x) * ((x*(1-x) / y**2)-1)
    return res

alpha_ = calc_alpha(mu, stdev)
beta_ = calc_beta(mu, stdev)

x = np.linspace(lb, ub, 100000)
y = stats.beta(alpha_, beta_, loc = lb, scale = ub).pdf(x)

fig = px.line(x=x, y=y)
fig.show()

This seems to work. However, as a test, I sample from the same distribution, and I calculate the mean and standard deviation of the sample and get different values than what I started out with.

Also, the min and max of these values isn't the range I want, so pretty sure that I'm not using loc and scale correctly.

beta_rands = stats.beta(alpha_, beta_, loc = lb, scale = ub).rvs(1000000)

# 0.24
beta_rands.mean()

# 0.0014
beta_rands.std()

#0.232
beta_rands.min()

#0.247
beta_rands.max()

Solution

  • You need to understand the linear transformation underlying the loc and scale parameters.

    Let v1 and v2 be random variables corresponding to beta(a=a, b=b) and beta(a=a, b=b, loc=loc, scale=scale) respectively. Then v1 is transformed into v2 = loc + v1 * scale. Since v1 is in [0, 1], v2 is in [loc, loc + scale]. (See notes in the official documentation.)

    You want v2 to be in [0.1, 0.5], so loc and scale should be 0.1 and 0.4 respectively.

    You want the mean and standard deviation of v2 to be 0.28 and 0.003 respectively, so the mean and standard deviation of v1 should be (0.28-0.1)/0.4 and 0.003/0.4 respectively, because v1 = (v2 - 0.1) / 0.4.(A shift does not change the standard deviation. See the properties of the standard deviation.)

    The following example shows the implementation.

    import scipy.stats
    
    def get_ab(mean, var):
        assert(var < mean*(1-mean))
        var = (mean*(1-mean)/var - 1)
        return mean*var, (1-mean)*var
    
    # v2 = 0.1 + v1 * 0.4
    # v1 = (v2 - 0.1) / 0.4
    a, b = get_ab((0.28-0.1)/0.4, (0.003/0.4)**2)
    rv = scipy.stats.beta(a=a, b=b, loc=0.1, scale=0.4)
    print([rv.mean(), rv.std()])
    # This will output '[0.28, 0.003]'.