I was trying to learn using SciPy and I want to create a custom distribution with the pdf defined as :
f(x) = k * x ^ (k-1) if 0 < x < 1 else 0
Based on the documentation on the scipy website I've defined the custom distribution as:
k = 7
class custom_distro(stats.rv_continuous):
def _pdf(self, x):
return k*math.pow(x,k-1) if (0<=x<=1) else 0
Now the custom_distro.pdf()
and custom_distro.cdf()
functions are working fine, but using the function .rvs()
gives the error:
The function value at x=nan is NaN; solver cannot continue.
I've tried to search for what the issue is but I cannot find any helpful suggestion.
I'd also request that if you have some good resources for the library please do suggest.
Now the custom_distro.pdf() and custom_distro.cdf() functions are working fine
This is surprising, because custom_distro
is a subclass of rv_continuous
, but each SciPy (continuous) distributions is an instance of a subclasses of rv_continuous
. For example norm_gen
is a subclass of rv_continuous
, and scipy.stats.norm
is an instance of norm_gen
. (It's weird, I know.)
Another part of the problem may have been that your distribution has finite support (your _pdf
returned 0 when x
was outside the interval [0, 1]
), but it is better to define the support of the distribution using the appropriate feature of the infrastructure. The infrastructure will take care of ensuring that the pdf
method returns 0 when the argument is outside the support, etc.
Here is code for which the rvs
method works.
from scipy import stats
import math
# note that the class is named with _gen appended
class custom_distro_gen(stats.rv_continuous):
def _pdf(self, x, k):
return k*math.pow(x, k-1)
# I'd recommend using operators, e.g.
# k * x**(k-1)
# Then you can use NumPy arrays
# But if `k` can be 0 or less, you'll need to make
# sure `x` is not of integer type or you could get an
# error (can't evaluate integer to negative integer power)
# consider defining the `_argcheck` method if there
# are restrictions on the parameter `k`
# `custom_distro` should be an instance of the `custom_distro_gen` class
# `a` and `b` define the support of the distribution; the value of the pdf
# is zero outside of the support.
custom_distro = custom_distro_gen(name='custom_distro', a=0, b=1)
custom_distro.rvs(k=2.) # 0.875097877164333
# freeze the distribution with k=2
dist = custom_distro(k=2.)
dist.rvs(size=5) # array([0.89701868, 0.65968075, 0.29795727, 0.56256978, 0.25515445])
There is some additional information about adding new distributions at Adding a New Statistics Distribution. Looking at PRs in which distributions were added will also be helpful, e.g. gh-12694.
I was trying to learn using SciPy...
Creating custom distributions is rather advanced. The process is poorly documented, and there are many issues with the infrastructure, so I would not recommend starting with this. Consider following one of the other tutorials.