I'd like to construct a beta distribution where the mu and sigma are 0.28 and 0.003, respectively, and the distribution is bound at [0.1, 0.5].
import numpy as np
import pandas as pd
import plotly.express as px
from scipy import stats
mu = 0.28
stdev = 0.003
lb = 0.1
ub = 0.5
def calc_alpha(x, y):
res = x * ((x*(1-x) / y**2)-1)
return res
def calc_beta(x, y):
res = (1 - x) * ((x*(1-x) / y**2)-1)
return res
alpha_ = calc_alpha(mu, stdev)
beta_ = calc_beta(mu, stdev)
x = np.linspace(lb, ub, 100000)
y = stats.beta(alpha_, beta_, loc = lb, scale = ub).pdf(x)
fig = px.line(x=x, y=y)
fig.show()
This seems to work. However, as a test, I sample from the same distribution, and I calculate the mean and standard deviation of the sample and get different values than what I started out with.
Also, the min and max of these values isn't the range I want, so pretty sure that I'm not using loc and scale correctly.
beta_rands = stats.beta(alpha_, beta_, loc = lb, scale = ub).rvs(1000000)
# 0.24
beta_rands.mean()
# 0.0014
beta_rands.std()
#0.232
beta_rands.min()
#0.247
beta_rands.max()
You need to understand the linear transformation underlying the loc
and scale
parameters.
Let v1
and v2
be random variables corresponding to beta(a=a, b=b)
and beta(a=a, b=b, loc=loc, scale=scale)
respectively. Then v1
is transformed into v2 = loc + v1 * scale
. Since v1
is in [0, 1], v2
is in [loc
, loc + scale
]. (See notes in the official documentation.)
You want v2
to be in [0.1, 0.5], so loc
and scale
should be 0.1 and 0.4 respectively.
You want the mean and standard deviation of v2
to be 0.28 and 0.003 respectively, so the mean and standard deviation of v1
should be (0.28-0.1)/0.4 and 0.003/0.4 respectively, because v1 = (v2 - 0.1) / 0.4
.(A shift does not change the standard deviation. See the properties of the standard deviation.)
The following example shows the implementation.
import scipy.stats
def get_ab(mean, var):
assert(var < mean*(1-mean))
var = (mean*(1-mean)/var - 1)
return mean*var, (1-mean)*var
# v2 = 0.1 + v1 * 0.4
# v1 = (v2 - 0.1) / 0.4
a, b = get_ab((0.28-0.1)/0.4, (0.003/0.4)**2)
rv = scipy.stats.beta(a=a, b=b, loc=0.1, scale=0.4)
print([rv.mean(), rv.std()])
# This will output '[0.28, 0.003]'.