scipyscipy.stats

Getting error "The function value at x=nan is NaN; solver cannot continue." while creating a custom distribution in scipy


I was trying to learn using SciPy and I want to create a custom distribution with the pdf defined as :

f(x) = k * x ^ (k-1) if 0 < x < 1 else 0

Based on the documentation on the scipy website I've defined the custom distribution as:

k = 7

class custom_distro(stats.rv_continuous):
  def _pdf(self, x):
    return k*math.pow(x,k-1) if (0<=x<=1) else 0

Now the custom_distro.pdf() and custom_distro.cdf() functions are working fine, but using the function .rvs() gives the error:

The function value at x=nan is NaN; solver cannot continue.

I've tried to search for what the issue is but I cannot find any helpful suggestion.

I'd also request that if you have some good resources for the library please do suggest.


Solution

  • Now the custom_distro.pdf() and custom_distro.cdf() functions are working fine

    This is surprising, because custom_distro is a subclass of rv_continuous, but each SciPy (continuous) distributions is an instance of a subclasses of rv_continuous. For example norm_gen is a subclass of rv_continuous, and scipy.stats.norm is an instance of norm_gen. (It's weird, I know.)

    Another part of the problem may have been that your distribution has finite support (your _pdf returned 0 when x was outside the interval [0, 1]), but it is better to define the support of the distribution using the appropriate feature of the infrastructure. The infrastructure will take care of ensuring that the pdf method returns 0 when the argument is outside the support, etc.

    Here is code for which the rvs method works.

    from scipy import stats
    import math
    
    # note that the class is named with _gen appended
    class custom_distro_gen(stats.rv_continuous):
      def _pdf(self, x, k):
        return k*math.pow(x, k-1)
        # I'd recommend using operators, e.g.
        # k * x**(k-1)
        # Then you can use NumPy arrays
        # But if `k` can be 0 or less, you'll need to make
        # sure `x` is not of integer type or you could get an
        # error (can't evaluate integer to negative integer power)
    
      # consider defining the `_argcheck` method if there
      # are restrictions on the parameter `k`
    
    # `custom_distro` should be an instance of the `custom_distro_gen` class
    # `a` and `b` define the support of the distribution; the value of the pdf
    # is zero outside of the support. 
    custom_distro = custom_distro_gen(name='custom_distro', a=0, b=1)
    custom_distro.rvs(k=2.)  # 0.875097877164333
    
    # freeze the distribution with k=2
    dist = custom_distro(k=2.)
    dist.rvs(size=5)  # array([0.89701868, 0.65968075, 0.29795727, 0.56256978, 0.25515445])
    

    There is some additional information about adding new distributions at Adding a New Statistics Distribution. Looking at PRs in which distributions were added will also be helpful, e.g. gh-12694.

    I was trying to learn using SciPy...

    Creating custom distributions is rather advanced. The process is poorly documented, and there are many issues with the infrastructure, so I would not recommend starting with this. Consider following one of the other tutorials.