
Why doesn't np.vectorize work for awkward arrays?


I have a simple vectorized function (my real case is more complicated):

import numpy as np

@np.vectorize
def f(x):
    if x > 0:
        return 1
    else:
        return -1
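A quick check on plain NumPy input shows that the decorated f works element by element there, and that np.vectorize gives back an np.vectorize object rather than an np.ufunc. This is a minimal sketch; the name result is only illustrative:

```python
import numpy as np

@np.vectorize
def f(x):
    if x > 0:
        return 1
    else:
        return -1

# On an ordinary (rectangular) NumPy array, f is applied to each element.
result = f(np.array([10, -40, 0]))
print(result)                       # [ 1 -1 -1]

# The decorator returns an np.vectorize instance, which is not an np.ufunc.
print(isinstance(f, np.vectorize))  # True
print(isinstance(f, np.ufunc))      # False
```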

Why doesn't it work with an Awkward Array?

import awkward as ak

x = ak.Array([[], [10], [40, 50, 60]])
f(x)
ValueError: cannot convert to RegularArray because subarray lengths are not regular (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-44/awkward-cpp/src/cpu-kernels/awkward_ListOffsetArray_toRegularArray.cpp#L22)

I understand the message, but I don't see why irregular subarray lengths should be a problem, since f only ever looks at one number at a time.

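I guess one workaround is to flatten, apply f, and then restore the original list lengths:

# apply f to the flat 1-D contents, then regroup using the original sublist lengths
y = f(ak.flatten(x))
y = ak.unflatten(y, ak.num(x))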

Solution

  • It looks like the object made by @np.vectorize is not a true ufunc. A real ufunc, such as np.sign, checks whether any argument defines __array_ufunc__ and, if so, lets that argument (here, the ak.Array) handle the call. @np.vectorize instead converts its arguments to NumPy arrays first, which can't be done if they're ragged; that conversion is what raises the RegularArray error.

    However, Numba's @nb.vectorize does create a ufunc that obeys this protocol, so I would suggest a one-character change (np → nb):

    >>> import awkward as ak
    >>> import numba as nb
    >>> @nb.vectorize
    ... def f(x):
    ...     if x > 0:
    ...         return 1
    ...     else:
    ...         return -1
    ... 
    >>> x = ak.Array([[], [10], [-40, -50, 60]])
    >>> f(x)
    <Array [[], [1], [-1, -1, 1]] type='3 * var * int64'>
    

    The downside is that you need another library, Numba. The upside is that this vectorized function is actually compiled, whereas @np.vectorize is not (NumPy's own documentation describes it as essentially a for loop, provided for convenience rather than performance). Incidentally, this very issue, that @np.vectorize looks like it will "vectorize" your function (in the sense of using a compiled or purely numerical implementation) but doesn't, was the original motivation for Numba (video).

    By the way, I assumed this f is just an example and that you have another function in mind. If you really want the above, you could do

    >>> np.sign(x)
    <Array [[], [1], [-1, -1, 1]] type='3 * var * int64'>
    

    and no new libraries are involved. One difference: np.sign(0) is 0, whereas f(0) is -1.
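The protocol difference can be seen with a toy container. In this sketch, Tagged is a hypothetical class (not how Awkward Array is implemented) that defines both __array__ and __array_ufunc__. A real ufunc such as np.sign hands control to __array_ufunc__, whereas np.vectorize converts the argument with np.asarray first, so the hook is never consulted and a plain ndarray comes back:

```python
import numpy as np

class Tagged:
    """Hypothetical toy container that takes part in NumPy's ufunc protocol."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def __array__(self, dtype=None, copy=None):
        # Used whenever something converts a Tagged to a plain NumPy array.
        return self.values if dtype is None else self.values.astype(dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # A true ufunc calls this hook instead of converting self itself.
        if method != "__call__":
            return NotImplemented
        unwrapped = [i.values if isinstance(i, Tagged) else i for i in inputs]
        return Tagged(ufunc(*unwrapped, **kwargs))

t = Tagged([-3, 0, 5])

# np.sign is a real ufunc, so it dispatches to Tagged.__array_ufunc__.
print(type(np.sign(t)).__name__)    # Tagged
print(np.sign(t).values.tolist())   # [-1, 0, 1]

# np.vectorize converts t to an ndarray first; the hook is bypassed.
vf = np.vectorize(lambda v: 1 if v > 0 else -1)
print(type(vf(t)).__name__)         # ndarray
print(vf(t).tolist())               # [-1, -1, 1]
```

For an ak.Array the bypassed conversion is exactly the step that fails on ragged data, which is why a ufunc that honors __array_ufunc__ (np.sign, or one built with @nb.vectorize) works where @np.vectorize does not.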