Tags: python, python-typing, mypy

Define a jsonable type using mypy / PEP-526


Values that can be converted to a JSON string via json.dumps are:

Union[str, int, float, Mapping, Iterable]

Do you have a better suggestion?


Solution

  • Long story short, you have the following options:

    1. If you have zero idea how your JSON is structured and must support arbitrary JSON blobs, you can:
      1. Wait for mypy to support recursive types.
      2. If you can't wait, just use object or Dict[str, object]. It ends up being nearly identical to using recursive types in practice.
      3. If you don't want to constantly have to type-check your code, use Any or Dict[str, Any]. Doing this lets you avoid needing to sprinkle in a bunch of isinstance checks or casts at the expense of type safety.
    2. If you know precisely what your JSON data looks like, you can:
      1. Use a TypedDict
      2. Use a library like Pydantic to deserialize your JSON into an object

    More discussion follows below.

    Case 1: You do not know how your JSON is structured

    Properly typing arbitrary JSON blobs is unfortunately awkward to do with PEP 484 types. This is partly because mypy (currently) lacks recursive types, which means the best we can do is use types similar to the one you constructed.

    (We can, however, make a few refinements to your type. In particular, json.dumps(...) does not actually accept arbitrary iterables. A generator is a subtype of Iterable, for example, but json.dumps(...) will refuse to serialize one. You probably want to use something like Sequence instead.)
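
    For example, json.dumps(...) happily serializes a list but rejects a generator, even though both are iterable:

    import json

    json.dumps([1, 2, 3])            # fine: returns '[1, 2, 3]'
    json.dumps(i for i in range(3))  # TypeError: Object of type generator is not JSON serializable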

    That said, having access to recursive types may not end up helping that much either: in order to use such a type, you would need to start sprinkling isinstance checks or casts into your code. For example:

    from typing import Dict, List, Union

    JsonType = Union[None, int, str, bool, List["JsonType"], Dict[str, "JsonType"]]
    
    def load_config() -> JsonType:
        # ...snip...
        ...
    
    config = load_config()
    assert isinstance(config, dict)
    
    name = config["name"]
    assert isinstance(name, str)
    

    So if that's the case, do we really need the full precision of recursive types? In most cases, we can just use object or Dict[str, object] instead: the code we write at runtime is going to be nearly the same in either case.

    For example, if we changed the example above to use JsonType = object, we would still end up needing both asserts.
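
    Here is a minimal sketch of that object-based version (reusing the hypothetical load_config from above):

    JsonType = object

    def load_config() -> JsonType:
        # ...snip...
        ...

    config = load_config()
    assert isinstance(config, dict)  # still needed: an object could be anything

    name = config["name"]
    assert isinstance(name, str)     # still needed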

    Alternatively, if you find sprinkling in assert/isinstance checks to be unnecessary for your use case, a third option is to use Any or Dict[str, Any] and have your JSON be dynamically typed.

    It's obviously less precise than the options presented above, but asking mypy to not type check uses of your JSON dict and relying on runtime exceptions instead can sometimes be more ergonomic in practice.
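
    As a sketch, the dynamically typed version looks like this (again reusing the hypothetical load_config):

    from typing import Any, Dict

    def load_config() -> Dict[str, Any]:
        # ...snip...
        ...

    config = load_config()
    name = config["name"].upper()  # no asserts needed: config["name"] is Any,
                                   # so mistakes surface only as runtime exceptions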

    Case 2: You know how your JSON data will be structured

    If you do not need to support arbitrary JSON blobs and can assume the data has a particular shape, we have a few more options.

    The first option is to use a TypedDict: you construct a type explicitly specifying what a particular JSON blob is expected to look like and use that instead of a generic dict. This is more work, but can buy you more type safety.

    The main disadvantage of using TypedDicts is that it's basically the equivalent of a giant cast in the end. For example, if you do:

    from typing import TypedDict
    import json
    
    class Config(TypedDict):
        name: str
        env: str
    
    with open("my-config.txt") as f:
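        # json.load returns Any, so this annotation is an unchecked
        # cast, not a runtime validation: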
        config: Config = json.load(f)
    

    ...how do we know that my-config.txt actually matches this TypedDict?

    Well, we don't, not for certain.

    This can be fine if you have full control over where the JSON is coming from. In this case, it might be fine to not bother validating the incoming data: just having mypy check uses of your dict is good enough.
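
    For example, given the Config TypedDict above, mypy will flag misspelled keys and mismatched value types with errors roughly like these:

    env = config["env"]     # OK: mypy infers str
    config["port"]          # error: TypedDict "Config" has no key "port"
    config["name"] = 42     # error: expected "str", got "int"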

    But if having runtime validation is important to you, your options are to either implement that validation logic yourself or use a third-party library that can do it on your behalf, such as Pydantic:

    from pydantic import BaseModel
    import json
    
    class Config(BaseModel):
        name: str
        env: str
    
    with open("my-config.txt") as f:
        # The constructor will raise a pydantic.ValidationError at runtime
        # if the input data does not match the schema
        config = Config(**json.load(f))
    

    The main advantage of using these kinds of libraries is that you get genuine type safety: the data is actually validated at runtime, so mypy's view of it can be trusted. You can also use object attribute syntax instead of dict lookups (e.g. config.name instead of config["name"]), which is arguably more ergonomic.

    The main disadvantage is that this validation adds some runtime cost, since you're now scanning over the entire JSON blob. This might introduce a non-trivial slowdown if your JSON happens to contain a large quantity of data.

    Converting your data into an object can also sometimes be a bit inconvenient, especially if you plan on converting it back into a dict later on.
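
    For example, a Pydantic model can be converted back to a plain dict explicitly, though the method name depends on which major version you're using:

    # Pydantic v1:
    config_dict = config.dict()

    # Pydantic v2 (the method was renamed):
    # config_dict = config.model_dump()

    json.dumps(config_dict)  # and back to a JSON string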