python, memory, multiprocessing, python-multiprocessing, copy-on-write

Multi-processing copy-on-write: 4 workers but only double the size


I'm running an experiment on the copy-on-write mechanism in Python multiprocessing.

I created a large file of 10 GB and load it into large_object in main.

    file_path = 'dummy_large_file.bin'
    try:
        large_object = load_large_object(file_path)
        print("File loaded successfully")
    except ValueError as e:
        print(e)
    except Exception as e:
        print(f"An error occurred: {e}")
        # Create a dummy file of 10GB for testing
        dummy_file_path = 'dummy_large_file.bin'
        with open(dummy_file_path, 'wb') as dummy_file:
            dummy_file.seek(10 * 1024 * 1024 * 1024 - 1)
            dummy_file.write(b"A")

        print(f"Dummy file created at {dummy_file_path}")

I intentionally pass large_object to the multiprocessing workers to see whether the workers copy the large object. To make sure a copy actually happens, each worker modifies the object's contents.

    # Multi-processing
    num_workers = 4
    chunk_size = 10
    processes = []
    for i in range(num_workers):
        start = i * chunk_size
        p = multiprocessing.Process(target=process_chunk, args=(large_object, dummy_file_path, start, chunk_size))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

    # Worker: overwrite one chunk of the large object with random bytes
    def process_chunk(object, file_path, start, size):
        random_chunk = bytes([random.randint(0, 255) for _ in range(size)])
        object[start:start + size] = random_chunk  # Modify the chunk in place
        print(f"large object [{start}:{start + size}]: {object[start:start + size]}")  # Show the modified chunk

My expectation is that the program holds the 10 GB large_object and copies it into each of the 4 workers - another 40 GB - so the total memory usage should be 50 GB.

However, I'm only seeing a total memory usage of 20 GB.

Why is there only 20 GB of memory consumption?

Does Python implement some kind of lazy loading or more granular copying for large objects?


Solution

  • Linux knows nothing about Python or how big its objects are. The kernel divides memory into pages: a page is a contiguous block of memory, 4 KB on most Linux systems, and it is the "unit" of memory that gets duplicated by copy-on-write when you write to it. If you modify a single byte, the entire 4 KB page gets duplicated (a small demo of this is sketched at the end of this answer).

    The address space is virtualized at the page level. Just because you mapped 10 GB worth of pages doesn't mean the machine allocates 10 GB of contiguous physical memory; you only get 10 GB of contiguous virtual address space, and those addresses may not be backed by physical memory at all until you write to them.

    Each of your 4 workers only writes to 2.5 GB of the object, therefore only 2.5 GB worth of pages is duplicated into each worker, which accounts for the extra 10 GB you see.

    I think the only optimization Python itself does is that it doesn't attempt to serialize/deserialize the object when workers are launched with multiprocessing.Process under the fork start method, since the memory will be COWed by Linux anyway. This is not true for other multiprocessing mechanisms like Pool, which pickles its arguments into each worker; there you would end up with roughly 50 GB used (see the Pool sketch at the end of this answer).
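
A quick way to see the page-level behaviour for yourself is the following sketch (not from the original question; Linux-only, and it assumes the fork start method, a kernel that provides /proc/self/smaps_rollup, and a scaled-down 256 MB buffer): it forks one worker that writes a single byte into each of 10,000 pages and reports how much private, COW-copied memory that created - about 10,000 x 4 KB ≈ 40 MB, not the full 256 MB.

    import multiprocessing
    import resource

    PAGE = resource.getpagesize()            # 4096 bytes on most Linux systems
    BUF_SIZE = 256 * 1024 * 1024             # 256 MB instead of 10 GB so the demo is quick

    def private_dirty_mb():
        # Memory this process has COW-copied for itself (no longer shared with the parent)
        with open("/proc/self/smaps_rollup") as f:
            for line in f:
                if line.startswith("Private_Dirty:"):
                    return int(line.split()[1]) / 1024   # value is reported in kB
        return 0.0

    def worker(buf, pages_to_touch):
        before = private_dirty_mb()
        for i in range(pages_to_touch):
            buf[i * PAGE] = 0xFF             # writing ONE byte dirties the whole page
        after = private_dirty_mb()
        print(f"touched {pages_to_touch} pages "
              f"({pages_to_touch * PAGE // (1024 * 1024)} MB worth), "
              f"private memory grew by ~{after - before:.0f} MB")

    if __name__ == "__main__":
        multiprocessing.set_start_method("fork")   # COW sharing only happens with forked children
        big = bytearray(BUF_SIZE)                  # inherited by the child, shared until written
        p = multiprocessing.Process(target=worker, args=(big, 10_000))
        p.start()
        p.join()

The reported growth should be close to 40 MB even though the buffer is 256 MB, because only the pages the worker actually dirties get copied.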
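
For comparison, here is a minimal sketch (also not from the original post) of pushing the same kind of buffer through multiprocessing.Pool: the argument is pickled and sent to each worker over a pipe, so every worker receives its own full copy, which is why the original 10 GB experiment would end up near 50 GB this way.

    import multiprocessing

    def worker(buf):
        buf[0] = 0xFF                  # modifies the worker's private copy, not the parent's buffer
        return len(buf)

    if __name__ == "__main__":
        big = bytearray(256 * 1024 * 1024)          # scaled down from 10 GB
        with multiprocessing.Pool(processes=4) as pool:
            # Each element of the list is pickled and shipped to a worker process,
            # so four full copies of the buffer are made regardless of copy-on-write.
            print(pool.map(worker, [big] * 4))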