pythondictionary-comprehension

Does Python optimize key-value pairs with None values?


Consider a dict with an int key and dict value, where the value features string keys and further dict values.

Now consider these 2 representations of this dict:

>>> a_it_1
{ 1: {"it": None, "ndar": {1:1}, 2: {"it": {2:8}, "ndar": None} }
>>> a_it_2
{ 1: { "ndar": {1:1}, 2: { "it": {2:8}} }

As you can see the inner dict is sparse; not all values appear in all keys.

a_it_2 is excluding key-value pairs that include None values in an attempt to save memory.

This is a simple example, the actual dict contains more than 10,000 outer key-value pairs.

I am doing some debugging and can see that both these use the same amount of memory.

I am using:

round(asizeof.asizeof({...}) / (1024 * 1024), 2)

where the ... represents the dict comprehension I am using to build the 2 dicts displayed above.

The asizeof method is from the asizeof module of Pympler.

I am including my dict comprehensions below.

Retaining the inner sparse key-value pairs but setting None values:

a_it_1 = {k: {p: vi if type(vi) == int or type(vi) == bool or len(vi) > 0 else None for p, vi in v.items() if vi is not None} for k, v in a_it.items() if k < 10000}

Removing the inner sparse key-value pairs completely:

a_it_2 = {k: {p: vi for p, vi in v.items() if vi is not None and (type(vi) == int or type(vi) == bool or len(vi) > 0)} for k, v in a_it.items() if k < 10000}

Taking the size of both these dicts using asizeof returns the same value: 12.3 (megabytes) each.

I would have expected the lack of the key-value pairs in the first place, to use less memory.

Does Python internally optimize key-value pairs with None values?

Updated

My question may not have been clear. And the question I posed was perhaps the wrong one.

My current a_it is using a lot of memory. I would like to optimize it.

I created 2 new versions of the dict based on the comprehensions above.

One includes the key-value pairs with None values; and the other ignores the key-value pairs completely.

I would like to know why they still use the same amount of memory. I accept that Python does not optimise key-value pairs with None values, because it can't.

But is there benefit to excluding the key value pairs for non-yet set values? My experiment did not prove this and I turned to the SO community for help.


Solution

  • No, it does not, and it can't. Consider that there is a difference between

    That is, simply put, "foo" in mydict will be True for {"foo": None}, and one needs to store that information.

    If you're thinking about "sparse" dictionaries (which really isn't the right term here), you may want to look at "slotted dataclasses"; a dataclass using __slots__ is usually much more memory efficient with respect to the key-set, as it has a known universe of possible keys.

    When thinking about the actual memory requirements of two similar sized dict with similar content, it's important to keep in mind that