
Type hinting numpy arrays and batches


I'm trying to create a few array types for a scientific python project. So far, I have created generic types for 1D, 2D and ND numpy arrays:

from typing import Any, Generic, Protocol, Tuple, TypeVar

import numpy as np
from numpy.typing import _DType, _GenericAlias

Vector = _GenericAlias(np.ndarray, (Tuple[int], _DType))
Matrix = _GenericAlias(np.ndarray, (Tuple[int, int], _DType))
Tensor = _GenericAlias(np.ndarray, (Tuple[int, ...], _DType))

The first issue is that mypy says that Vector, Matrix and Tensor are not valid types (e.g. when I try myvar: Vector[int] = np.array([1, 2, 3]))

The second issue is that I'd like to create a generic type Batch to be used as follows: Batch[Vector[complex]] should be like Matrix[complex], Batch[Matrix[float]] should be like Tensor[float], and Batch[Tensor[int]] should be like Tensor[int]. I am not sure exactly what I mean by "should be like"; I guess I mean that mypy should not complain.

How do I go about this?


Solution

  • You should not be using protected members (names starting with an underscore) from outside a library. They are typically marked this way to indicate implementation details that may change in the future, which is exactly what happened here between numpy versions. For example, in numpy 1.24 your import line from numpy.typing fails at runtime because the members you try to import are no longer there.


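    There is no need to use internal alias constructors, because numpy.ndarray is already generic in both the array shape and its dtype. You can construct your own type aliases fairly easily; you just need to parameterize the dtype correctly, i.e. wrap a scalar type (a subclass of np.generic such as np.float64) in np.dtype[...] rather than using a plain Python type like int. Here is a working example (runtime subscripting of np.ndarray and np.dtype requires numpy 1.22 or newer):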

    from typing import Tuple, TypeVar
    
    import numpy as np
    
    
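    # Scalar type of the array elements (np.float64, np.int32, ...).
    T = TypeVar("T", bound=np.generic, covariant=True)
    
    # np.ndarray takes two parameters: the shape type, then the dtype.
    Vector = np.ndarray[Tuple[int], np.dtype[T]]
    Matrix = np.ndarray[Tuple[int, int], np.dtype[T]]
    Tensor = np.ndarray[Tuple[int, ...], np.dtype[T]]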
    

    Usage:

    def f(v: Vector[np.complex64]) -> None:
        print(v[0])
    
    
    def g(m: Matrix[np.float64]) -> None:  # np.float_ was removed in NumPy 2.0
        print(m[0])
    
    
    def h(t: Tensor[np.int32]) -> None:
        print(t.reshape((1, 4)))
    
    
    f(np.array([0j+1]))  # prints (1+0j)
    g(np.array([[3.14, 0.], [1., -1.]]))  # prints [3.14 0.  ]
    h(np.array([[3.14, 0.], [1., -1.]]))  # prints [[ 3.14  0.    1.   -1.  ]]
    

    The remaining limitation is that shapes currently have almost no typing support, so a type checker cannot reliably tell a Vector from a Matrix. Work is underway to implement this using the variadic generics (TypeVarTuple) introduced by PEP 646. Until then, there is little practical use in discriminating the types by shape.


    The batch issue should be a separate question. Try and ask one question at a time.
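The aliases from the working example can be sanity-checked at runtime. The sketch below assumes numpy 1.22+ on Python 3.9+ (where np.ndarray and np.dtype are subscriptable) and shows that subscripting an alias simply substitutes the scalar type into np.dtype[T]. It also uses numpy.typing.NDArray, the public alias that fixes the dtype but leaves the shape generic, as a supported alternative to the private names imported in the question. The name FloatArray is illustrative only.

```python
from typing import Tuple, TypeVar, get_args, get_origin

import numpy as np
import numpy.typing as npt

# Covariant scalar-type parameter, as in the answer.
T = TypeVar("T", bound=np.generic, covariant=True)

# Aliases built only from public, runtime-subscriptable numpy classes.
Vector = np.ndarray[Tuple[int], np.dtype[T]]
Matrix = np.ndarray[Tuple[int, int], np.dtype[T]]

# Public alias from numpy.typing: fixes the dtype, leaves the shape generic.
# FloatArray is an illustrative name, not part of numpy.
FloatArray = npt.NDArray[np.float64]

# Subscripting an alias substitutes T inside np.dtype[T].
complex_vector = Vector[np.complex64]
print(get_origin(complex_vector))  # <class 'numpy.ndarray'>
print(get_args(complex_vector))
```

Nothing here is enforced at runtime; these objects only carry information for static checkers such as mypy.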
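On the point that shapes currently have almost no typing support: until static shape checking matures, one possible stopgap (an assumption on my part, not something numpy or mypy provides) is to check ndim at runtime and then use typing.cast to hand the checker the narrower alias. as_matrix is a hypothetical helper name; cast does nothing at runtime, so the ValueError is the only real guarantee.

```python
from typing import Any, Tuple, TypeVar, cast

import numpy as np

T = TypeVar("T", bound=np.generic, covariant=True)
Matrix = np.ndarray[Tuple[int, int], np.dtype[T]]


def as_matrix(arr: np.ndarray[Any, np.dtype[T]]) -> Matrix[T]:
    """Hypothetical helper: verify ndim at runtime, then narrow the static type."""
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {arr.shape}")
    # cast() is a no-op at runtime; it only informs the type checker.
    return cast(Matrix[T], arr)


m = as_matrix(np.array([[3.14, 0.0], [1.0, -1.0]]))
print(m.shape)  # (2, 2)

try:
    as_matrix(np.array([1.0, 2.0, 3.0]))
except ValueError as exc:
    print(exc)  # expected a 2-D array, got shape (3,)
```

The trade-off is an explicit conversion call at each boundary where an array's shape matters, in exchange for failing early instead of deep inside a computation.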