I have written a simple example to illustrate exactly what I'm banging my head against. There is probably some very simple explanation that I'm just missing.
import time
import multiprocessing as mp
import os


class SomeOtherClass:
    def __init__(self):
        self.a = 'b'


class SomeProcessor(mp.Process):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def run(self):
        soc = SomeOtherClass()
        print("PID: ", os.getpid())
        print(soc)


if __name__ == "__main__":
    queue = mp.Queue()
    for n in range(10):
        queue.put(n)
    processes = []
    for proc in range(mp.cpu_count()):
        p = SomeProcessor(queue)
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
The result is:
PID: 11853
<__main__.SomeOtherClass object at 0x7fa637d3f588>
PID: 11854
<__main__.SomeOtherClass object at 0x7fa637d3f588>
PID: 11855
<__main__.SomeOtherClass object at 0x7fa637d3f588>
PID: 11856
<__main__.SomeOtherClass object at 0x7fa637d3f588>
The object address is the same for all of them, even though every initialization happened in a new process. Can anyone point out what the problem is? Thanks.
I also wonder about this behaviour: when I first initialize the same object in the main process, cache some values on it, and then initialize the same object in every process, the processes inherit the main process's object.
import time
import multiprocessing as mp
import os
import random


class SomeOtherClass:
    c = {}

    def get(self, a):
        if a in self.c:
            print('Retrieved cached value ...')
            return self.c[a]
        b = random.randint(1,999)
        self.c[a] = b
        return b


class SomeProcessor(mp.Process):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def run(self):
        pid = os.getpid()
        soc = SomeOtherClass()
        val = soc.get('new')
        print("Value from process {0} is {1}".format(pid, val))


if __name__ == "__main__":
    queue = mp.Queue()
    for n in range(10):
        queue.put(n)
    pid = os.getpid()
    soc = SomeOtherClass()
    val = soc.get('new')
    print("Value from main process {0} is {1}".format(pid, val))
    processes = []
    for proc in range(mp.cpu_count()):
        p = SomeProcessor(queue)
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
The output here is:
Value from main process 13052 is 676
Retrieved cached value ...
Value from process 13054 is 676
Retrieved cached value ...
Value from process 13056 is 676
Retrieved cached value ...
Value from process 13057 is 676
Retrieved cached value ...
Value from process 13055 is 676
To expand on the comments and discussion:
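On Linux (before Python 3.14), multiprocessing defaults to the fork start method. Forking a process means the child processes get a copy-on-write copy of the parent process's memory. This is why globally created objects have the same address in the subprocesses. It also explains the first example: every child starts from an identical memory layout, so an object allocated the same way in each child typically lands at the same virtual address. Those addresses are per-process, though, so these are separate objects in separate address spaces.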
With the spawn start method, no objects are shared in that case.
The attribute in
class SomeClass:
    container = {}
is class-level, not instance-level, and will be shared between all instances of SomeClass. That is,
a = SomeClass()
b = SomeClass()
print(a is b) # False
print(a.container is b.container is SomeClass.container) # True
a.container["x"] = True
print("x" in b.container) # True
print("x" in SomeClass.container) # True
Because the class's state is forked into the subprocess, the shared container also appears to be shared across processes. However, writes to the container in a subprocess will not appear in the parent or sibling processes. Only certain special multiprocessing types (and certain lower-level primitives) can span process boundaries.
If you don't want to share container between instances and processes, it will need to be instance-level:
class SomeClass:
    def __init__(self):
        self.container = {}
(Of course, if a SomeClass instance is created globally and a process is then forked, its state at the time of the fork will still be available in the subprocesses.)
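If a cache really does need to be shared across processes, one of the special multiprocessing types can hold it. Below is a minimal sketch using a Manager dict, again assuming the fork context on a POSIX system (worker, run_demo, the keys and the value 42 are hypothetical). Writes made by the children go through the manager's server process, so the parent sees them after joining.

```python
import multiprocessing as mp
import os


def worker(cache, n):
    # Runs in a child process; the write is forwarded to the manager's
    # server process, so the parent and siblings can see it as well.
    cache[f"child-{n}"] = os.getpid()


def run_demo(n_children=3):
    ctx = mp.get_context("fork")  # assumption: POSIX system
    with ctx.Manager() as manager:
        cache = manager.dict()
        cache["new"] = 42  # arbitrary value set in the parent before forking
        procs = [ctx.Process(target=worker, args=(cache, n))
                 for n in range(n_children)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Copy into a plain dict before the manager process shuts down.
        return cache.copy()


if __name__ == "__main__":
    print(run_demo())
```

Every access to a Manager dict is a round trip to the manager process, so it is slower than a plain dict. A check-then-set sequence like the question's get() is also not atomic across processes unless it is guarded by a lock.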