Initialize a TypedDict and fill keys & values later

I have a dict in which the types of the keys and values are fixed. I want to define those types in a TypedDict as follows:

from typing import List, TypedDict

import pandas as pd

class MyTable(TypedDict):
    caption: List[str]
    header: List[str]
    table: pd.DataFrame
    epilogue: List[str]

I have a function that returns a MyTable. I want to first define an empty (Typed)dict and then fill in the keys and values.

def returnsMyTable():
    result = {}
    result['caption'] = ['caption line 1','caption line 2']
    result['header'] = ['header line 1','header line 2']
    result['table'] = pd.DataFrame()
    result['epilogue'] = ['epilogue line 1','epilogue line 2']
    return result

Here MyPy complains that a type annotation for result is needed. I tried this:

result: MyTable = {}

but then MyPy complains that the keys are missing. Similarly, if I define the keys but set the values to None, it complains about incorrect types of the values.

Is it at all possible to initialize a TypedDict as an empty Dict first and fill in the keys and values later? The docs seem to suggest it is.

I guess I could first define the values as variables and assemble the MyTable later but I'm dealing with legacy code that I'm adding type hinting to. So I'd like to minimize the work.

Solution

What you might want here is to set totality, but I'd think twice about using it.

Quoting PEP 589:

By default, all keys must be present in a TypedDict. It is possible to override this by specifying totality. Here is how to do this using the class-based syntax:

class MyTable(TypedDict, total=False):
    caption: List[str]
    header: List[str]
    table: pd.DataFrame
    epilogue: List[str]

result: MyTable = {}
result2: MyTable = {"caption": ["One", "Two", "Three"]}

As I said, think twice about that. A total TypedDict gives you a very nice guarantee that all of the items will exist. That is, because MyPy won't allow result to exist without "caption", you can safely write cap = result["caption"].

If you set total=False, that guarantee goes away. On the assumption that you're using your MyTable much more commonly than you're making it, getting the additional safety assurances when you use it is probably a good trade.
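
To make the lost guarantee concrete, here is a minimal sketch. PartialTable is a hypothetical, trimmed-down stand-in for MyTable (the table field is left out so the snippet does not need pandas), and first_caption_line is an illustrative helper, not part of the original code. With total=False the empty dict passes the type checker, so every consumer has to guard against missing keys itself.

```python
from typing import List, TypedDict


class PartialTable(TypedDict, total=False):
    # Hypothetical stand-in for MyTable, without the DataFrame field.
    caption: List[str]
    header: List[str]


def first_caption_line(table: PartialTable) -> str:
    # "caption" may be absent, so use .get() instead of table["caption"].
    caption = table.get("caption", [])
    return caption[0] if caption else ""


empty: PartialTable = {}  # accepted by the type checker because total=False
filled: PartialTable = {"caption": ["One", "Two"]}

print(repr(first_caption_line(empty)))   # prints ''
print(repr(first_caption_line(filled)))  # prints 'One'
```

With a total TypedDict the .get() fallback would be unnecessary, which is the trade-off described above.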

Personally, I'd reserve total=False for cases where the creation code sometimes genuinely does leave things out and any code that uses it has to handle that. If it's just a case of taking a few lines to initialise, I'd do it like this:

def returnsMyTable() -> MyTable:
    # Compute each value first, then assemble the complete dict in one step
    # so the type checker sees every required key at once.
    result_caption = ['caption line 1','caption line 2']
    result_header = ['header line 1','header line 2']
    result_table = pd.DataFrame()
    result_epilogue = ['epilogue line 1','epilogue line 2']
    result: MyTable = {
        "caption": result_caption,
        "header": result_header,
        "table": result_table,
        "epilogue": result_epilogue,
    }
    return result
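
For a version of this pattern that runs without pandas, the sketch below uses a hypothetical SimpleTable in which a rows list stands in for the DataFrame; the names SimpleTable, rows and returns_simple_table are assumptions made for illustration only. It shows the payoff on the consuming side: because the TypedDict is total, the caller can index a key directly.

```python
from typing import List, TypedDict


class SimpleTable(TypedDict):
    # Hypothetical stand-in for MyTable; rows replaces the DataFrame field.
    caption: List[str]
    rows: List[List[str]]


def returns_simple_table() -> SimpleTable:
    # Compute the values first, as the legacy code would.
    caption = ["caption line 1", "caption line 2"]
    rows = [["a", "b"], ["c", "d"]]
    # Assemble once, so every required key is present when the dict is typed.
    result: SimpleTable = {"caption": caption, "rows": rows}
    return result


table = returns_simple_table()
first_line = table["caption"][0]  # no guard needed: the key is required
print(first_line)           # prints "caption line 1"
print(type(table) is dict)  # prints "True": at runtime this is a plain dict
```

Note that the checks happen only in the type checker; at runtime the value is an ordinary dict, so nothing stops untyped code from removing a key.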