How to define custom function for scipy's binned_statistic_2d?


The documentation for scipy's binned_statistic_2d function gives an example for a 2D histogram:

from scipy import stats
x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]
ret = stats.binned_statistic_2d(x, y, None, 'count', bins=[binx, biny])

Makes sense, but I'm now trying to implement a custom function. The custom function description is given as:

function : a user-defined function which takes a 1D array of values, and outputs a single numerical statistic. This function will be called on the values in each bin. Empty bins will be represented by function([]), or NaN if this returns an error.

I wasn't sure exactly how to implement this, so I thought I'd check my understanding by writing a custom function that reproduces the count option. I tried

def custom_func(values):
    return len(values)
x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]
ret = stats.binned_statistic_2d(x, y, None, custom_func, bins=[binx, biny])

but this generates an error like so:

556 # Make sure `values` match `sample`
557 if(statistic != 'count' and Vlen != Dlen):
558     raise AttributeError('The number of `values` elements must match the '
559                          'length of each `sample` dimension.')
561 try:
562     M = len(bins)

AttributeError: The number of `values` elements must match the length of each `sample` dimension.

How is this custom function supposed to be defined?


Solution

  • The error occurs because any statistic other than 'count', including a custom function, requires an actual array (or list of arrays) for the values parameter, with the same number of elements as x. You can't leave it as None as in your example: None is accepted only for 'count', where values is never used. A function like len doesn't look at the contents of values either, but scipy still needs the array so it can split it by bin and pass each piece to your function.

    So, to reproduce the count results, pass any array of the same length as x to the values parameter; the x object itself will do:

    from scipy import stats
    
    def custom_func(values):
        # Receives the 1D array of values that fell into one bin
        return len(values)
    
    x = [0.1, 0.1, 0.1, 0.6]
    y = [2.1, 2.6, 2.1, 2.1]
    binx = [0.0, 0.5, 1.0]
    biny = [2.0, 2.5, 3.0]
    
    ret = stats.binned_statistic_2d(x, y, x, custom_func, bins=[binx, biny])
    
    print(ret)
    # BinnedStatistic2dResult(statistic=array([[2., 1.],
    #        [1., 0.]]), x_edge=array([0. , 0.5, 1. ]), y_edge=array([2. , 2.5, 3. ]), binnumber=array([5, 6, 5, 9]))
    

    The result matches that of the count statistic:

    ret = stats.binned_statistic_2d(x, y, None, 'count', bins=[binx, biny])
    
    print(ret)
    # BinnedStatistic2dResult(statistic=array([[2., 1.],
    #        [1., 0.]]), x_edge=array([0. , 0.5, 1. ]), y_edge=array([2. , 2.5, 3. ]), binnumber=array([5, 6, 5, 9]))
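
A custom function becomes more useful once values carries real data. Below is a minimal sketch, not taken from the scipy docs: the z measurements and the value_range helper are made up for illustration. It also shows the "NaN if this returns an error" rule from the documentation quoted in the question, since np.max raises on an empty input.

```python
import numpy as np
from scipy import stats

def value_range(values):
    # Spread (max - min) of the values that fell into one bin.
    # np.max raises ValueError on an empty sequence, so the
    # value_range([]) call used for empty bins fails and those
    # bins are filled with NaN.
    return np.max(values) - np.min(values)

x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]
z = [10.0, 20.0, 14.0, 7.0]  # hypothetical measurement at each (x, y) point
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

ret = stats.binned_statistic_2d(x, y, z, value_range, bins=[binx, biny])

print(ret.statistic)
# Bin (0, 0) holds 10.0 and 14.0, so its range is 4.0.
# Bins (0, 1) and (1, 0) hold one point each, so their range is 0.0.
# Bin (1, 1) is empty, so it is NaN.
```

By contrast, custom_func above returns 0 for the empty bin because len([]) succeeds. If you want a specific fill value for empty bins, have your function handle the empty case explicitly.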