python string numpy duck-typing numpy-ufunc

Numpy duck array with string dtype unexpectedly throws `numpy.core._exceptions._UFuncNoLoopError`


Here is a minimal working example of a simple numpy duck array that I've been using for numeric data.

import numpy as np

class DuckArray(np.lib.mixins.NDArrayOperatorsMixin):

    def __init__(self, array: np.ndarray):
        self.array = array

    def __repr__(self):
        return f'DuckArray({self.array})'

    def __array_ufunc__(self, function, method, *inputs, **kwargs):

        # Normalize inputs
        inputs = [inp.array if isinstance(inp, type(self)) else inp for inp in inputs]

        # Loop through inputs until we find a valid implementation
        for inp in inputs:
            result = inp.__array_ufunc__(function, method, *inputs, **kwargs)
            if result is not NotImplemented:
                return type(self)(result)

        # No input produced a result
        return NotImplemented

The real version of this class has an implementation of __array_function__ as well, but this question only involves __array_ufunc__.

As we can see, this implementation works for numeric dtypes.

In [1]: a = DuckArray(np.array([1, 2, 3]))
In [2]: a + 2
Out[2]: DuckArray([3 4 5])
In [3]: a == 2
Out[3]: DuckArray([False  True False])

But it fails with a numpy.core._exceptions._UFuncNoLoopError if the array has a string dtype:

In [4]: b = DuckArray(np.array(['abc', 'def', 'ghi']))
In [5]: b == 'def'
Traceback (most recent call last):
  File "C:\Users\byrdie\AppData\Local\Programs\Python\Python38\lib\site-packages\IPython\core\interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-c5975227701e>", line 1, in <module>
    b == 'def'
  File "C:\Users\byrdie\AppData\Local\Programs\Python\Python38\lib\site-packages\numpy\lib\mixins.py", line 21, in func
    return ufunc(self, other)
  File "<ipython-input-2-aced4bbdd318>", line 15, in __array_ufunc__
    result = inp.__array_ufunc__(function, method, *inputs, **kwargs)
numpy.core._exceptions._UFuncNoLoopError: ufunc 'equal' did not contain a loop with signature matching types (dtype('<U3'), dtype('<U3')) -> dtype('bool')

The same operation works on the raw array, though:

In [6]: b.array == 'def'
Out[6]: array([False,  True, False])

This suggests to me that the ufunc loop does exist, but something is going awry.

Does anyone know where I am going wrong?


Solution

  • When you create a NumPy string array, its dtype defaults to <Un, where n is the length of the longest string. A scalar taken from the array reports its own length:

    np.array(['abc', 'defg'])[0].dtype
    >> dtype('<U3')
    np.array(['abc', 'defg'])[1].dtype
    >> dtype('<U4')
    

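    In the NumPy version shown in the traceback, the np.equal ufunc has no loop for <Un dtypes, so using it to compare the two <U3 values 'abc' and 'def' raises the error. NDArrayOperatorsMixin implements == by calling np.equal directly (the `return ufunc(self, other)` frame in the traceback), which is why the duck array fails. The raw array's own == evidently takes a different path for strings, since b.array == 'def' succeeds.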

    To fix it, explicitly set the dtype to object when creating the string array:

    DuckArray(np.array(['abc', 'def'], dtype=object)) == 'abc'
    >> DuckArray([ True False])
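
The points above can be checked on plain arrays, without the DuckArray class. The following is a minimal sketch, not a definitive diagnosis: the variable names are made up for illustration, and whether the direct np.equal call fails depends on the installed NumPy version (it fails in the version from the traceback, while more recent releases appear to accept string inputs). The try/except is there so the snippet runs either way.

```python
import numpy as np

strings = np.array(['abc', 'defg'])

# The array has a single dtype, wide enough for its longest element,
# while a scalar pulled out of it reports its own length.
array_dtype = strings.dtype       # width 4
scalar_dtype = strings[0].dtype   # width 3

# The == operator on a raw ndarray.
via_operator = strings == 'abc'

# NDArrayOperatorsMixin calls the ufunc directly instead.
# _UFuncNoLoopError is a subclass of TypeError, so catching TypeError
# covers NumPy versions where np.equal has no loop for string dtypes.
try:
    via_ufunc = np.equal(strings, 'abc')
except TypeError as error:
    via_ufunc = None
    print(f'np.equal failed on a string dtype: {error}')

# With dtype=object the ufunc compares the Python str objects
# element by element, which is the fix suggested above.
as_objects = strings.astype(object)
via_object_ufunc = np.equal(as_objects, 'abc')

print(array_dtype, scalar_dtype)
print(via_operator, via_ufunc, via_object_ufunc)
```

Note that dtype=object trades speed and memory for compatibility, since each comparison goes through Python's own str equality rather than a compiled string loop.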