Class pattern is matching the wrong cases


I'm writing an object serializer but am having issues where the class patterns are not matching the expected cases:

def dump_obj(x):
    match(x):
        case list():
            emit('L')
            dump_obj(len(x))
            for elem in x:
                dump_obj(elem)
        case Iterable():
            emit('I')
            dump_obj((type(x), list(x)))
        case tuple():
            emit('T')
            dump_obj(list(x))
        case str():
            emit('S')
            dump_obj(len(x))
            emit(x)
        case int():
            emit('D')
            emit(str(x))
        case _:
            raise TypeError(f'Unknown obj {x!r}')

When I call dump_obj() with a tuple, it recurses infinitely in the I-case for iterables instead of matching the T-case for tuples.

When I call dump_obj() with a list subclass, it is matching the L-case for lists instead of the intended I-case for iterables.


Solution

  • First problem: Ordering

    The cases are not independent of one another. They are tested from the top down (like a long if/elif chain), and the first one to match wins.

    In the example, specific tests like list, tuple, and str need to come before more general tests like Iterable. With the current code, a tuple input like (10, 20, 30) matches the I-case instead of the intended T-case. Because the I-case then calls dump_obj() with another tuple, that tuple matches the I-case again, and the recursion never ends.

    Second problem: Specificity

    A class pattern performs an isinstance() check, which matches both the type itself and any of its subclasses. To restrict a case to an exact type match, add a guard:

    case list() if type(x) == list:
        ...
    

    Putting it all together

    With both solutions applied, here is the new code:

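    from collections.abc import Iterable

    emit = print  # Assumption: the sample runs below are consistent with emit() printing one token per line

    def dump_obj(x):
        match x: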
            case list() if type(x) == list:   # <-- Added guard
                emit('L')
                dump_obj(len(x))
                for elem in x:
                    dump_obj(elem)
            case tuple() if type(x) == tuple: # <-- Added guard
                emit('T')
                dump_obj(list(x))
            case str() if type(x) == str:     # <-- Added guard
                emit('S')
                dump_obj(len(x))
                emit(x)
            case Iterable():                  # <-- Move after list, tuple, str
                emit('I')
                dump_obj((type(x).__name__, list(x)))
            case int():
                emit('D')
                emit(str(x))
            case _:
                raise TypeError(f'Unknown obj {x!r}')
    

    Sample runs

    Here we show that the two problematic cases work as expected.

    >>> dump_obj((10, 20))     # Tuple of integers
    T
    L
    D
    2
    D
    10
    D
    20
    
    >>> class List(list):
    ...     pass
    ...
    >>> dump_obj(List((30, 40)))   # List subclass
    I
    T
    L
    D
    2
    S
    D
    4
    List
    L
    D
    2
    D
    30
    D
    40