python, c, numpy, python-c-api, numpy-ufunc

generalized ufunc on two arrays with one non-matching dimension


I want to write a numpy gufunc python extension using the numpy C-api that takes two matrices of arbitrary dimensionality, takes the mean across one dimension, and then subtracts one result from the other. The simplest case of this is two 1d arrays with the signature '(i), (j) -> ()', which returns a scalar.

Any additional dimensions can be assumed to be core dimensions and therefore the same. For example, a 2d array signature might be '(n, i), (n, j) -> (n)' where the core dimension is axis=0 and would simply loop the function over that axis. Here is what I have so far:

#include <Python.h>
#include <numpy/arrayobject.h>
#include <numpy/ufuncobject.h>
#include <math.h>

static void mean_diff(char **args,
                      const npy_intp *dimensions,
                      const npy_intp* steps,
                      void* extra) {
    npy_intp i;
    npy_intp n1 = 0, n2 = 0;
    double s1 = 0.0, s2 = 0.0;
    char *in1 = args[0], *in2 = args[1], *out = args[2];
    npy_intp len1 = dimensions[0], len2 = dimensions[1];

    for (i = 0; i < len1; i++) {
        double val = *((double *)in1);
        if (!isnan(val)) {
            s1 += val;
            n1++;
        }
        in1 += steps[0];
    }

    for (i = 0; i < len2; i++) {
        double val = *((double *)in2);
        if (!isnan(val)) {
            s2 += val;
            n2++;
        }
        in2 += steps[1];
    }

    double mean1 = (n1 > 0) ? s1 / n1 : 0.0;
    double mean2 = (n2 > 0) ? s2 / n2 : 0.0;

    *((double *)out) = mean1 - mean2;
}

The problem I have is that even with a simple pair of 1d arrays, the function only ever uses the first element of each array. For example:

>>> mean_diff([1.,2.,3.,4.], [2.,7.,29.,3.])
-1.0
>>> np.mean([1.,2.,3.,4.]) - np.mean([2.,7.,29.,3.])
-7.75

I assume this has to do with grabbing the wrong dimensions or something, but I can't figure out what.


Solution

  • The first thing to know is that your function is expected to compute the result multiple times in each call. The number of computations is stored in dimensions[0]. You specified the gufunc signature (i),(j)->(). When the function is called, dimensions[1] and dimensions[2] will hold i and j of the gufunc signature (i.e. the lengths of the 1-d inputs of the core operation).

    For example, in this code snippet,

    x = np.array([[0, 1, 2, 3, 4], [0, 0, 0, 1, 1.5]])  # shape (2, 5)
    y = np.array([[-1, -3, 7], [2, 2, 2]])              # shape (2, 3)
    result = mean_diff(x, y)                            # shape (2,)
    

    your function will be called once, with dimensions[0] set to 2. dimensions[1] and dimensions[2] will be 5 and 3, respectively. Your outer loop will compute the result for the pairs ([0, 1, 2, 3, 4], [-1, -3, 7]) and then for ([0, 0, 0, 1, 1.5], [2, 2, 2]).

    args[0] and args[1] will point to the first element of the input arrays, and args[2] will point to the first element of the output array. Your code does in1 = args[0], in2 = args[1] and out = args[2], so it must compute the result for the data pointed to by in1 and in2, and store the result in *out. Then it must increment in1, in2 and out to point to the next set of data and do the computation again. So you'll implement an outer loop that performs the computation for each input pair and increments the pointers to move to the next set of inputs and output. The amounts to increment in1, in2 and out are stored in steps[0], steps[1] and steps[2], respectively. (For example, steps[0] is the number of bytes to jump in memory to get from the first row of x to the second row.)

    Then, within this outer loop, you do the computation on the current pair of arrays from x and y (pointed to by in1 and in2). In general, you can't assume that the elements of the arrays are contiguous, so when computing the means, you have to increment the pointers to the data by the appropriate step sizes; in this case, those are steps[3] and steps[4].

    Here's one way you could implement this. I have commented the variables that I pull out of dimensions and steps to help clarify my terse description above.

    static void mean_diff(
        char **args,
        const npy_intp *dimensions,
        const npy_intp *steps,
        void *extra)
    {
        char *in1 = args[0];
        char *in2 = args[1];
        char *out = args[2];
    
        npy_intp nloops = dimensions[0];  // Number of outer loops
        npy_intp len1 = dimensions[1];    // Core dimension i
        npy_intp len2 = dimensions[2];    // Core dimension j
    
        npy_intp step1 = steps[0];        // Outer loop step size for the first input
        npy_intp step2 = steps[1];        // Outer loop step size for the second input
        npy_intp step_out = steps[2];     // Outer loop step size for the output
        npy_intp innerstep1 = steps[3];   // Step size of elements within the first input
        npy_intp innerstep2 = steps[4];   // Step size of elements within the second input
    
        for (npy_intp i = 0; i < nloops;
             i++, in1 += step1, in2 += step2, out += step_out) {
    
            // The core calculation.  in1 and in2 point to the 1-d input arrays,
            // and out points to the output array.
    
            double s1 = 0.0;
            double s2 = 0.0;
            npy_intp n1 = 0;
            npy_intp n2 = 0;
    
            for (npy_intp j = 0; j < len1; ++j) {
                double val = *(double *)(in1 + j*innerstep1);
                if (!isnan(val)) {
                    s1 += val;
                    n1++;
                }
            }
    
            for (npy_intp j = 0; j < len2; ++j) {
                double val = *(double *)(in2 + j*innerstep2);
                if (!isnan(val)) {
                    s2 += val;
                    n2++;
                }
            }
    
            double mean1 = (n1 > 0) ? s1 / n1 : 0.0;
            double mean2 = (n2 > 0) ? s2 / n2 : 0.0;
    
            *((double *)out) = mean1 - mean2;
        }
    }
    

    By the way, don't be surprised if a gufunc implemented like this is actually slower than computing the means with nanmean (the loop above skips NaNs) and subtracting the results. NumPy is continually improving its performance, and with the recent SIMD implementations, naive implementations of ufuncs and gufuncs can be slower than the composition of a few calls to optimized NumPy methods.
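
    The answer only shows the inner loop; to call it from Python you still need the usual gufunc registration boilerplate. Here is a sketch of what that module setup could look like; the module name `_meandiff` and the docstring are placeholder assumptions of mine, not anything from the original post, and the fragment assumes the loop function is defined earlier in the same file:

```c
#include <Python.h>
#include <numpy/arrayobject.h>
#include <numpy/ufuncobject.h>

/* The loop function shown above; defined earlier in this file. */
static void mean_diff(char **args, npy_intp const *dimensions,
                      npy_intp const *steps, void *extra);

static PyUFuncGenericFunction funcs[] = {&mean_diff};
static char types[] = {NPY_DOUBLE, NPY_DOUBLE, NPY_DOUBLE};
static void *data[] = {NULL};

static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT, "_meandiff", NULL, -1, NULL
};

PyMODINIT_FUNC PyInit__meandiff(void)
{
    PyObject *m = PyModule_Create(&moduledef);
    if (m == NULL) {
        return NULL;
    }
    import_array();
    import_umath();

    PyObject *gufunc = PyUFunc_FromFuncAndDataAndSignature(
        funcs, data, types,
        1,                  /* one registered type loop (double)   */
        2, 1,               /* nin, nout                           */
        PyUFunc_None,       /* no identity element                 */
        "mean_diff",
        "mean_diff(x, y): difference of the NaN-skipping means", 0,
        "(i),(j)->()");     /* the gufunc signature                */
    if (gufunc == NULL) {
        return NULL;
    }
    PyModule_AddObject(m, "mean_diff", gufunc);
    return m;
}
```

    The string "(i),(j)->()" passed as the last argument is what makes this a gufunc rather than a plain ufunc; NumPy parses it to decide which trailing axes are core dimensions and how many outer loop iterations to hand to your function in dimensions[0].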