
Why is the dtype shown (even if it's the native one) when using floor division with NumPy?


Normally the dtype is hidden when it's equivalent to the native type:

>>> import numpy as np
>>> np.arange(5)
array([0, 1, 2, 3, 4])
>>> np.arange(5).dtype
dtype('int32')

>>> np.arange(5) + 3
array([3, 4, 5, 6, 7])

But somehow that doesn't apply to floor division or modulo:

>>> np.arange(5) // 3
array([0, 0, 0, 1, 1], dtype=int32)
>>> np.arange(5) % 3
array([0, 1, 2, 0, 1], dtype=int32)

Why is there a difference?

Python 3.5.4, NumPy 1.13.1, Windows 64-bit


Solution

  • You actually have multiple distinct 32-bit integer dtypes here. This is probably a bug.

    NumPy has (accidentally?) created multiple distinct signed 32-bit integer types, probably corresponding to C int and C long, which are both 32 bits wide on 64-bit Windows. Both of them display as numpy.int32, but they're actually different objects. At the C level, I believe the type objects are PyIntArrType_Type and PyLongArrType_Type, generated here.
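
    One way to see the "two types, one name" situation on any platform is to build dtypes from the one-character codes for the C signed integer types and look for pairs that compare equal but have different .type objects. This is a sketch with a hypothetical helper name (equal_but_distinct), not something NumPy provides. On 64-bit Windows it should report 'i' and 'l' (C int and C long); on 64-bit Linux, where C long is 64 bits, the analogous pair is 'l' and 'q' (C long and C long long).

```python
import itertools

import numpy as np

# One-character codes for the C signed integer types:
# char, short, int, long, long long.
C_SIGNED_CODES = "bhilq"


def equal_but_distinct(codes=C_SIGNED_CODES):
    """Hypothetical helper: return (code_a, code_b, name) for dtypes that
    compare equal but whose scalar types (dtype.type) are different objects."""
    found = []
    for a, b in itertools.combinations(codes, 2):
        da, db = np.dtype(a), np.dtype(b)
        if da == db and da.type is not db.type:
            found.append((a, b, da.name))
    return found


for a, b, name in equal_but_distinct():
    ta, tb = np.dtype(a).type, np.dtype(b).type
    print(f"{a!r} -> {ta!r}, {b!r} -> {tb!r}; both dtypes are named {name!r}")
```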

    dtype objects have a type attribute corresponding to the type object of scalars of that dtype. It is this type attribute that NumPy inspects when deciding whether to print dtype information in an array's repr:

    _typelessdata = [int_, float_, complex_]
    if issubclass(intc, int):
        _typelessdata.append(intc)
    
    
    if issubclass(longlong, int):
        _typelessdata.append(longlong)
    
    ...
    
    def array_repr(arr, max_line_width=None, precision=None, suppress_small=None):
        ...
        skipdtype = (arr.dtype.type in _typelessdata) and arr.size > 0
    
        if skipdtype:
            return "%s(%s)" % (class_name, lst)
        else:
            ...
            return "%s(%s,%sdtype=%s)" % (class_name, lst, lf, typename)
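
    The check can be reproduced with a simplified, hypothetical helper (repr_hides_dtype). Its TYPELESS tuple is an assumption standing in for _typelessdata; the sketch compares its prediction with the real repr for an array of the default integer type, the same values cast to numpy.intc, and an empty array.

```python
import numpy as np

# Simplified stand-in for the _typelessdata list quoted above. float_ and
# complex_ are aliases of float64 and complex128 (the aliases were removed
# in NumPy 2.0). intc is left out: on Python 3, issubclass(np.intc, int)
# is False, so the quoted code never appends it.
TYPELESS = (np.int_, np.float64, np.complex128)


def repr_hides_dtype(arr):
    """Hypothetical helper mirroring the skipdtype test in array_repr."""
    return arr.dtype.type in TYPELESS and arr.size > 0


a = np.arange(5)        # default integer type; its .type is np.int_
b = a.astype(np.intc)   # same values, but the C int scalar type

for arr in (a, b, a[:0]):
    print(repr_hides_dtype(arr), repr(arr))
```

    The empty slice presumably explains the size > 0 condition: without the dtype, the repr of an empty array would give no hint of its element type.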
    

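    For numpy.arange(5) and numpy.arange(5) + 3, .dtype.type is numpy.int_; for numpy.arange(5) // 3 and numpy.arange(5) % 3, .dtype.type is numpy.intc, the other 32-bit signed integer type. On Python 3, issubclass(intc, int) is False, so intc never makes it into _typelessdata and the dtype gets printed.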

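    As for why + and // produce results with different scalar types, they use different type resolution routines. Here's the one for //, and here's the one for +. //'s type resolution picks the first ufunc inner loop, in the order the loops are registered, whose input types the inputs can be safely cast to, while +'s type resolution applies NumPy type promotion to the arguments and picks the loop matching the resulting type. Since the C int loop is registered before the C long loop, and a 32-bit C long can be safely cast to a 32-bit C int, // ends up in the int loop.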
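
    The two strategies can be sketched in a few lines of Python. Both functions (first_safe_loop and promoted_exact_loop) are hypothetical and much simpler than NumPy's C code: they ignore value-based casting of Python scalars, output arguments and non-default casting modes, and recent NumPy releases route ufunc calls through a newer dispatching mechanism, so treat them as an illustration of the two rules rather than a model of any particular version. ufunc.types lists a ufunc's loops as signature strings such as 'ii->i'.

```python
import numpy as np


def first_safe_loop(ufunc, *dtypes):
    """Sketch of the // rule: return the first loop in ufunc.types whose
    input types every argument can be safely cast to."""
    for sig in ufunc.types:
        ins, _ = sig.split("->")
        if len(ins) == len(dtypes) and all(
            np.can_cast(dt, np.dtype(c), casting="safe")
            for dt, c in zip(dtypes, ins)
        ):
            return sig
    return None


def promoted_exact_loop(ufunc, *dtypes):
    """Sketch of the + rule: promote the argument types first, then return
    the loop whose inputs match the promoted type exactly."""
    promoted = np.result_type(*dtypes)
    for sig in ufunc.types:
        ins, _ = sig.split("->")
        if ins == promoted.char * len(dtypes):
            return sig
    return None


for code in "ilq":  # C int, C long, C long long
    dt = np.dtype(code)
    print(f"{code} ({dt.name}): // -> {first_safe_loop(np.floor_divide, dt, dt)}, "
          f"+ -> {promoted_exact_loop(np.add, dt, dt)}")
```

    Given two C long long dtypes on 64-bit Linux, the first-match rule should land on the 'll->l' loop while the promote-then-match rule stays on 'qq->q', which is the same kind of mismatch as C long versus C int on Windows.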