pythonpython-typing

Type-hinting a structure-preserving nested map in Python


I'm writing a Python function inspired by map that takes a function f: Callable[[A], B] and an arbitrarily nested structure of iterables (e.g. lists of tuples of lists of ... of objects of type A) and returns an identically nested structure with f being applied to all "leaf nodes" of the original structure. Ideally, it would support dict[Any, A] too and "homogenous" NamedTuples by applying f only to the values and leaving the keys untouched (NamedTuples' fields should all have type A for this to work).

For example:

>>> f = lambda x: x * 2
>>> input = [([1, 2], [3, 4]), {"foo": 5, "bar": 6}]
>>> nested_map(f, input)
[([2, 4], [6, 8]), {"foo": 10, "bar": 12}]

Ideally, it would be type-hint friendly in the sense that it would tell a static checker that the output object has exactly the same structure as the input. Although this is probably way harder to achieve.

I have a functionally working version for this but:

import inspect
from collections.abc import Iterable
from typing import Any, Callable, Generator, NamedTuple, cast

type MappableIterable[A] = list[A] | tuple[A] | dict[Any, A] | Generator[A]

type NestedIterable[A] = A | MappableIterable["NestedIterable[A]"]

def nested_map[A, B](
        f: Callable[[A], B], nested_structure: NestedIterable[A]
    ) -> NestedIterable[B]:
    """
    Takes a nested iterable structure and applies f to all of its nodes,
    returning them in the same structure.
    """

    structure = type(nested_structure)
    match nested_structure:
        case dict():
            return {
                k: nested_map(f, v)
                for k, v in nested_structure.items()
            }
        # NamedTuple
        case _ if hasattr(structure, "_fields") and isinstance(
                getattr(structure, "_fields"), tuple
            ):
            d = {
                k: nested_map(f, v) for
                k, v in cast(NamedTuple, nested_structure)._asdict().items()
            }
            return cast(NestedIterable[B], structure(**d))
        case _ if inspect.isgenerator(nested_structure):
            return ( nested_map(f, s) for s in nested_structure )
        # Have to check str first since it is an Iterable
        case str():
            return f(nested_structure)
        case Iterable():
            structure = cast(type[list], structure)
            return structure(map(lambda s: nested_map(f, s), nested_structure))
        case _:
            return f(nested_structure)

Do you know whether there is a way to encode the structure's information (or at least; the fact that it is the same one for input and output) in current Python's type-hint system? I have unsuccessfully tried to "parametrize one variable by another" with something like this, but of course it didn't work:

def nested_map[A, B, C: NestedStructure](f: Callable[[A], B], nested_structure: C[A]) -> C[B]:
...

Solution

  • TL;DR: With current Python typing support, this is probably just impossible. Sorry.

    First of all, you definitely can't write a type hint that preserves the exact structure of the input. That would rely on structure-preserving type hints for flat lists, which don't exist (and would run counter to the purpose of a list if they did).

    That means the best you can hope for is an annotation that preserves union types for each nesting level. This is already suboptimal because if one level contains, say, both lists and tuples, every element in that level would have a type of list | tuple, which discards information. The type mappings would look something like this:

    list[A | list[A | list[A]]] -> list[B | list[B | list[B]]]
    list[A | tuple[A, A | list[A]] -> list[B | tuple[B, B | list[B]] 
    

    Unfortunately, even this probably isn't doable. In order for this to work, you'd need to use nested generics like your "parameterize one variable by another" attempt, which I'll reproduce here:

    def nested_map[A, B, C: NestedStructure](f: Callable[[A], B], nested_structure: C[A]) -> C[B]: ...
    

    Nested generics like C[A] are "higher-kinded types", which just doesn't exist in Python. There is an open feature request on GitHub that's still seeing some activity, but it hasn't even been formulated into a PEP yet.

    I don't think there's a way to do this without higher-kinded types:

    1. The original leaf type needs a TypeVar since it defines an abstract relationship between the callback and parameter types.
    2. The new leaf type needs a TypeVar since it defines an abstract relationship between the callback and return types.
    3. The structure of the nested iterable needs a TypeVar since it defines an abstract relationship between the parameter and return types.
    4. These three TypeVars must be separate because none can be inferred from the others.
    5. It must be possible to narrow one abstract type (the iterable) based on another abstract type (the leaves).

    The final requirement in particular basically forces you to use nested generics.