This question is derived from here.
I have three large lists containing Python objects (l1, l2 and l3). These lists are created when the program starts, and they take a total of 16 GB of RAM. The program will be used on Linux exclusively.
I do not need to modify these lists or the objects in these lists in any way or form after they are created. They must remain in memory until the program exits.
I am using os.fork() and the multiprocessing module in my program to spawn multiple sub-processes (up to 20 currently). Each of these sub-processes needs to be able to read the three lists (l1, l2 and l3).
My program is otherwise working fine and is quite fast. However, I am having problems with memory consumption. I was hoping that each sub-process could use the three lists without copying them in memory, thanks to copy-on-write on Linux. However, this is not the case: referencing any object in any of these lists increases its reference count, which writes to the object's memory and therefore causes the entire page containing it to be copied.
So my question would be:
Can I disable reference counting on l1, l2 and l3 and all of the objects in these lists? Basically, I want to make each entire object (including metadata such as the ref count) read-only, so that it will never be modified under any circumstances (this, I assume, would allow me to take advantage of copy-on-write).
Currently I fear that I am forced to move to another programming language to accomplish this task because of a "feature" (ref counting) that I do not currently need, but that is still forced upon me and is causing unnecessary problems.
You can't; reference counting is fundamental to CPython (the reference implementation, and the one you are using). Calling methods on objects causes reference counts to change; item subscription or attribute access causes objects to be pushed onto and popped off the stack, which uses reference counts; and so on. You cannot get around this.
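To make this concrete, here is a minimal sketch, assuming CPython, of how a purely "read-only" access still changes an object's reference count. The Item class and the items list are hypothetical stand-ins for the objects in l1, l2 and l3; only sys.getrefcount() is a real standard-library call.

```python
import sys


class Item:
    """Hypothetical stand-in for one of the objects stored in the lists."""

    def __init__(self, value):
        self.value = value


items = [Item(i) for i in range(3)]

# getrefcount() itself holds a temporary reference while it runs,
# so only the differences between readings are meaningful.
baseline = sys.getrefcount(items[0])

first = items[0]  # we only *read* the list, yet a new reference is created
with_alias = sys.getrefcount(items[0])

del first  # dropping the name decrements the count again
after_del = sys.getrefcount(items[0])

print(baseline, with_alias, after_del)
```

The count lives in the object's own header, so each of these increments and decrements is a write to the memory page holding the object. In a forked child, that write is what triggers the copy.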
And if the contents of the lists don't change, use tuple()s instead. That won't change the fact that they'll be refcounted, though.
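As a small illustration of that last point (again assuming CPython, with marker and frozen as hypothetical names), converting a list to a tuple prevents the container from being modified, but reading an element out of it still bumps that element's reference count:

```python
import sys

marker = object()          # hypothetical element stored in the container
frozen = tuple([marker])   # immutable container built from a list

before = sys.getrefcount(marker)
elem = frozen[0]           # read from the tuple; no mutation of the tuple
after = sys.getrefcount(marker)

print(after - before)
```

The tuple only guarantees that its slots cannot be reassigned; the elements it points to keep their own mutable reference counts.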
Other implementations of Python, such as Jython (which runs on the Java virtual machine), IronPython (which runs on the .NET runtime) or PyPy (Python implemented in Python, but experimenting with JIT and other compiler techniques), are free to use different methods of memory management, and may or may not solve your memory problem.