
Missing _shape_info in custom Scipy distributions


I am trying to define a custom distribution in SciPy. For simplicity, let us assume we are looking at the "affine" distribution on [0, 1], i.e. a mix of a uniform and a triangular distribution.
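For reference, here is a worked form of this density. It assumes the support [0, 1] used when the distribution is instantiated. The density is the uniform density 1 weighted by c plus the increasing triangular density 2x weighted by 1 - c, and integrating it gives the CDF:

$$
f(x; c) = c \cdot 1 + (1 - c) \cdot 2x = (2 - 2c)\,x + c, \qquad 0 \le x \le 1
$$

$$
F(x; c) = \int_0^x f(t; c)\,dt = c\,x + (1 - c)\,x^2 = x\,(c + x - c\,x), \qquad F(1; c) = 1
$$

Because f is linear in x, it is non-negative on [0, 1] exactly when f(0) = c ≥ 0 and f(1) = 2 - c ≥ 0. This gives the bounds 0 ≤ c ≤ 2. For c > 1 the triangular weight 1 - c is negative, so "mix" is meant loosely.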

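from scipy import stats


class affine_distribution_gen(stats.rv_continuous):
    def _argcheck(self, c):
        # element-wise check, so it also works when c is an array
        return (0 <= c) & (c <= 2)

    def _pdf(self, x, c):
        return (2 - 2 * c) * x + c

    def _cdf(self, x, c):
        # integral of _pdf from 0 to x: c*x + (1 - c)*x**2
        return x * (c + x - c * x)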

I then try to fit the parameter to my data:

affine = affine_distribution_gen(name='affine', a=0, b=1)
stats.fit(affine, data, {'c': (0, 2)})

The problem is that I get the following exceptions:

AttributeError                            Traceback (most recent call last)

/usr/local/lib/python3.10/dist-packages/scipy/stats/_fit.py in fit(dist, data, bounds, guess, method, optimizer)
    539     try:
--> 540         param_info = dist._param_info()
    541     except AttributeError as e:


/usr/local/lib/python3.10/dist-packages/scipy/stats/_distn_infrastructure.py in _param_info(self)
   2921     def _param_info(self):
-> 2922         shape_info = self._shape_info()
   2923         loc_info = _ShapeInfo("loc", False, (-np.inf, np.inf), (False, False))

AttributeError: 'affine_distribution_gen' object has no attribute '_shape_info'


The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)

<ipython-input-11-779300b83040> in <cell line: 2>()
      1 affine = affine_distribution_gen(name='affine', a=0, b=1)
----> 2 stats.fit(affine, data, {'c': (0, 2)})

/usr/local/lib/python3.10/dist-packages/scipy/stats/_fit.py in fit(dist, data, bounds, guess, method, optimizer)
    543                    "`scipy.stats.fit` because shape information has "
    544                    "not been defined.")
--> 545         raise ValueError(message) from e
    546 
    547     # data input validation

ValueError: Distribution `affine` is not yet supported by `scipy.stats.fit` because shape information has not been defined.

The documentation says that the shape parameters are inferred from the signatures of _pdf and _cdf, but it does not explain how to define the shape information manually. How should I proceed here?
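The inference the documentation describes can be checked directly. The following is a minimal sketch showing that rv_continuous reads the shape name c from the _pdf signature and exposes it through the shapes and numargs attributes. This only covers parameter naming. It does not create the _shape_info method that stats.fit looks for, which is why the error above still occurs.

```python
from scipy import stats


class affine_distribution_gen(stats.rv_continuous):
    def _pdf(self, x, c):
        return (2 - 2 * c) * x + c


affine = affine_distribution_gen(name='affine', a=0, b=1)

# The name "c" comes from inspecting the signature of _pdf (arguments after x)
print(affine.shapes)   # prints: c
print(affine.numargs)  # prints: 1
```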


Solution

  • In case you haven't seen it, the SciPy manual has instructions on how to add a new distribution, although the process does not seem very refined.

    Following the error messages and looking at how existing distributions are implemented gets you this:

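    from scipy import stats
    # _ShapeInfo lives in a private module, so this import may change between SciPy versions
    from scipy.stats._distn_infrastructure import _ShapeInfo
    
    
    class affine_distribution_gen(stats.rv_continuous):
        def _argcheck(self, c):
            # element-wise check, so it also works when c is an array
            return (c >= 0) & (c <= 2)
    
        def _pdf(self, x, c):
            return (2 - 2 * c) * x + c
    
        def _cdf(self, x, c):
            # integral of _pdf from 0 to x: c*x + (1 - c)*x**2
            return x * (c + x - c * x)
    
        def _shape_info(self):
            # one _ShapeInfo per shape parameter: name, integrality, domain, inclusive ends
            return [_ShapeInfo("c", False, (0, 2), (True, True))]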
    

    The _ShapeInfo arguments are the parameter's name, whether it is integer-valued (True or False), the domain of the parameter, and whether the lower and upper limits of that domain are included.
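The following sketch is a possible end-to-end check. It repeats the class with _shape_info, draws synthetic data from it, and fits c with stats.fit. The true value c = 0.5, the sample size and the seed are arbitrary choices for illustration. Sampling relies on rv_continuous's default numerical inversion of _cdf, which works here but is slow for large samples.

```python
import numpy as np
from scipy import stats
# private module: this import path may change between SciPy versions
from scipy.stats._distn_infrastructure import _ShapeInfo


class affine_distribution_gen(stats.rv_continuous):
    def _argcheck(self, c):
        return (c >= 0) & (c <= 2)

    def _pdf(self, x, c):
        return (2 - 2 * c) * x + c

    def _cdf(self, x, c):
        return x * (c + x - c * x)

    def _shape_info(self):
        return [_ShapeInfo("c", False, (0, 2), (True, True))]


affine = affine_distribution_gen(name='affine', a=0, b=1)

# Synthetic data with a known shape parameter (c = 0.5 is an arbitrary choice)
rng = np.random.default_rng(12345)
data = affine.rvs(0.5, size=2000, random_state=rng)

# Only c gets bounds, so stats.fit keeps loc and scale fixed at 0 and 1
res = stats.fit(affine, data, {'c': (0, 2)})
print(res.params.c)
```

The estimate of c should land close to 0.5. The exact value depends on the sample and on the optimizer, because stats.fit uses differential evolution by default.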