Consider a dict with an int key and dict value, where the value features string keys and further dict values.
Now consider these 2 representations of this dict:
>>> a_it_1
{ 1: {"it": None, "ndar": {1:1}, 2: {"it": {2:8}, "ndar": None} }
>>> a_it_2
{ 1: { "ndar": {1:1}, 2: { "it": {2:8}} }
As you can see the inner dict is sparse; not all values appear in all keys.
a_it_2
is excluding key-value pairs that include None
values in an attempt to save memory.
This is a simple example, the actual dict contains more than 10,000 outer key-value pairs.
I am doing some debugging and can see that both these use the same amount of memory.
I am using:
round(asizeof.asizeof({...}) / (1024 * 1024), 2)
where the ...
represents the dict comprehension I am using to build the 2 dicts displayed above.
The asizeof
method is from the asizeof
module of Pympler.
I am including my dict comprehensions below.
Retaining the inner sparse key-value pairs but setting None values:
a_it_1 = {k: {p: vi if type(vi) == int or type(vi) == bool or len(vi) > 0 else None for p, vi in v.items() if vi is not None} for k, v in a_it.items() if k < 10000}
Removing the inner sparse key-value pairs completely:
a_it_2 = {k: {p: vi for p, vi in v.items() if vi is not None and (type(vi) == int or type(vi) == bool or len(vi) > 0)} for k, v in a_it.items() if k < 10000}
Taking the size of both these dicts using asizeof
returns the same value: 12.3 (megabytes) each.
I would have expected the lack of the key-value pairs in the first place, to use less memory.
Does Python internally optimize key-value pairs with None
values?
Updated
My question may not have been clear. And the question I posed was perhaps the wrong one.
My current a_it is using a lot of memory. I would like to optimize it.
I created 2 new versions of the dict based on the comprehensions above.
One includes the key-value pairs with None values; and the other ignores the key-value pairs completely.
I would like to know why they still use the same amount of memory. I accept that Python does not optimise key-value pairs with None values, because it can't.
But is there benefit to excluding the key value pairs for non-yet set values? My experiment did not prove this and I turned to the SO community for help.
No, it does not, and it can't. Consider that there is a difference between
{"foo": None}
){}
)That is, simply put, "foo" in mydict
will be True
for {"foo": None}
, and one needs to store that information.
If you're thinking about "sparse" dictionaries (which really isn't the right term here), you may want to look at "slotted dataclasses"; a dataclass
using __slots__
is usually much more memory efficient with respect to the key-set, as it has a known universe of possible keys.
When thinking about the actual memory requirements of two similar sized dict
with similar content, it's important to keep in mind that
dict
overcommits the key-space, in order to avoid having to rebalance the dict during frequent inserts. This means that two similar sized dictionaries (for some fuzzy "similar") will use the same amount of memory, erring on the upside.interned
, if possible. This is true to all string literals that are present in the code. Since all interned strings use the same memory backing, the memory required by a second dict
which can share a lot of the keys with the first dict only grows due to the str-objects themselves, which is very, very roughly pointer-sized.