python-3.x caching numpy-memmap

Avoiding unnecessary caching of data when using numpy memmap


I have a program that reads through very large (~100 GB to TB) binary files in chunks using numpy memmap. The program makes a single pass over the data, so there is no need to cache anything, since nothing is ever reread; np.memmap, however, caches the data it reads by default. As a result, RAM usage saturates quickly, even though there is no real need for it.

Is there a way to turn off caching of data or, barring that, to manually clear the cache? I have found a few other threads on the topic suggesting that the only way is to flush the memmap, delete all references to it, run the garbage collector, and recreate the memmap from scratch. That works, but it is obviously not ideal. Can I do better?
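
For reference, here is a minimal sketch of the flush/delete/collect workaround those threads describe (the file name and dtype mirror the MWE below):

import gc
import numpy as np

mm = np.memmap('test_memmap.bin', dtype=np.float32, mode='r')
# ... process some chunks of mm ...
mm.flush()    # a no-op for mode='r'; only matters for writable maps
del mm        # drop the last reference so the mapping can be unmapped
gc.collect()  # make sure the memmap object is actually collected
# the mapped pages no longer count against this process's RSS
mm = np.memmap('test_memmap.bin', dtype=np.float32, mode='r')  # start over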

Here is an MWE that demonstrates the issue (note that it will create 2 GB of random garbage on your HDD if you run it). As you can see, RAM usage reflects the cumulative amount of data loaded, even when chunk_size is small. Ideally, RAM usage would be limited to the amount of data contained in a single chunk.

import numpy as np
import os
import psutil
import gc
import time

# Parameters
filename = 'test_memmap.bin'
file_size_gb = 2  # Change this if needed
dtype = np.float32
element_size = np.dtype(dtype).itemsize
num_elements = (file_size_gb * 1024**3) // element_size
chunk_size = 1_000_000  # Number of elements to read at once

# Step 1: Create a large binary file
if not os.path.exists(filename):
    print("Creating file...")
    with open(filename, 'wb') as f:
        f.write(np.random.rand(num_elements).astype(dtype).tobytes())

# Step 2: Process file using memmap in chunks
print("Processing with memmap...")
mm = np.memmap(filename, dtype=dtype, mode='r')
process = psutil.Process(os.getpid())

for i in range(0, len(mm), chunk_size):
    chunk = mm[i:i+chunk_size]
    # Simulate processing
    chunk.sum()
    
    # Monitor RAM usage
    mem = process.memory_info().rss / (1024 ** 2)  # in MB
    print(f"Step {i // chunk_size + 1}, RAM usage: {mem:.2f} MB")
del mm
gc.collect()
time.sleep(5)  # the OS takes a moment to catch up
mem = process.memory_info().rss / (1024 ** 2)  # in MB
print(f"Final RAM usage after deleting memmap: {mem:.2f} MB")

Solution

  • Ideally, RAM usage would be limited to the amount of data contained in a single chunk.

    And in practice it is. Using psutil to print system-wide memory shows that available memory stays close to cached memory throughout the process: the pages the OS caches still count as available, so other processes can reclaim them at any time.

    Testing with a file size comparable to total system memory, printing only the last two passes:

    
    import numpy as np
    import os
    import psutil
    import gc
    
    # Parameters
    filename = '/home/lmc/tmp/test_memmap.bin'
    file_size_gb = 8  # Change this if needed
    dtype = np.float32
    element_size = np.dtype(dtype).itemsize
    num_elements = (1024**3) // element_size  # elements per GB; the file is written in 1 GB pieces
    chunk_size = 1_000_000  # Number of elements to read at once
    
    # Step 1: Create a large binary file
    if not os.path.exists(filename):
        print("Creating file...")
        with open(filename, 'wb') as f:
            for i in range(file_size_gb):
                f.write(np.random.rand(num_elements).astype(dtype).tobytes())
    
    # Step 2: Process file using memmap in chunks
    print("Processing with memmap...")
    mm = np.memmap(filename, dtype=dtype, mode='r')
    process = psutil.Process(os.getpid())
    
    smem0 = psutil.virtual_memory()
    print(f"\t mem0 - free: {smem0.free/(1024 ** 2):.2f}, available: {smem0.available/(1024 ** 2):.2f}, cached: {smem0.cached/(1024 ** 2):.2f}")
    
    for i in range(0, len(mm), chunk_size):
        chunk = mm[i:i+chunk_size]
        # Simulate processing
        chunk.sum()
        
        # mem = process.memory_info().rss / (1024 ** 2)  # in MB
        # print(f"Step {i // chunk_size + 1} ({i}), RAM usage: {mem:.2f} MB")
        if i >= len(mm) - chunk_size * 2:
            # Monitor RAM usage
            mem = process.memory_info().rss / (1024 ** 2)  # in MB
            shr = process.memory_info().shared / (1024 ** 2)  # in MB
            print(f"Step {i // chunk_size + 1}, RAM usage: {mem:.2f} MB, shared: {shr:.2f}")
            smem1 = psutil.virtual_memory()
            print(f"\t mem1 - free: {smem1.free/(1024 ** 2):.2f}, available: {smem1.available/(1024 ** 2):.2f}, cached: {smem1.cached/(1024 ** 2):.2f}")
    

    Result

    
    Processing with memmap...
             mem0 - free: 172.61, available: 5421.77, cached: 5867.20
    Step 2147, RAM usage: 5304.29 MB, shared: 5289.00
             mem1 - free: 172.61, available: 5450.10, cached: 5872.30
    Step 2148, RAM usage: 5303.04 MB, shared: 5287.75
             mem1 - free: 172.61, available: 5442.97, cached: 5878.12
    

    Use offset + shape to control memory usage

    Recreate the memmap for each chunk, using the offset and shape parameters so that only the current chunk is mapped. This needs careful arithmetic, since offset is given in bytes while shape is given in elements (see the numpy.memmap reference and the sketch below).
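
    For instance, a minimal sketch of the arithmetic for one chunk, reusing the MWE's test_memmap.bin (the names total_elements and start are illustrative, not part of the script below):

    import numpy as np

    dtype = np.float32
    element_size = np.dtype(dtype).itemsize      # 4 bytes per float32
    chunk_size = 1_000_000                       # elements per chunk
    total_elements = 2 * 1024**3 // element_size # e.g. a 2 GB float32 file

    start = 3 * chunk_size                       # element index of the 4th chunk
    offset_bytes = start * element_size          # np.memmap expects BYTES here
    n = min(chunk_size, total_elements - start)  # final chunk may be shorter
    chunk = np.memmap('test_memmap.bin', dtype=dtype, mode='r',
                      offset=offset_bytes, shape=(n,))

    The full loop below applies the same idea over the whole file.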

    import numpy as np
    import os
    import psutil
    import gc
    
    def monitor():
        # Monitor RAM usage
        mem = process.memory_info().rss / (1024 ** 2)  # in MB
        shr = process.memory_info().shared / (1024 ** 2)  # in MB
        print(f"Step {i // chunk_size + 1}, RAM usage: {mem:.2f} MB, shared: {shr:.2f}, offset: {offset}")
        smem1 = psutil.virtual_memory()
        print(f"\t mem1 - free: {smem0.free/(1024 ** 2):.2f}, available: {smem1.available/(1024 ** 2):.2f}, cached: {smem1.cached/(1024 ** 2):.2f}")
    
    
    # Parameters
    filename = '/home/lmc/tmp/test_memmap.bin'
    file_size_gb = 8  # Change this if needed
    dtype = np.float32
    element_size = np.dtype(dtype).itemsize
    num_elements = (1024**3) // element_size  # elements per GB; the file is written in 1 GB pieces
    chunk_size = 1_000_000  # Number of elements to read at once
    
    # Step 1: Create a large binary file
    if not os.path.exists(filename):
        print("Creating file...")
        with open(filename, 'wb') as f:
            for i in range(file_size_gb):
                f.write(np.random.rand(num_elements).astype(dtype).tobytes())
    
    # Step 2: Process file using memmap in chunks
    print("Processing with memmap...")
    process = psutil.Process(os.getpid())
    
    smem0 = psutil.virtual_memory()
    print(f"\t mem0 - free: {smem0.free/(1024 ** 2):.2f}, available: {smem0.available/(1024 ** 2):.2f}, cached: {smem0.cached/(1024 ** 2):.2f}")
    
    offset = 0
    for i in range(0, file_size_gb * num_elements, chunk_size):
        if i // chunk_size == (file_size_gb * num_elements) // chunk_size - 1:
            # last chunk: omit shape so the map covers everything remaining
            chunk = np.memmap(filename, dtype=dtype, mode='r', offset=offset)
            monitor()
            break
        else:
            chunk = np.memmap(filename, dtype=dtype, mode='r', shape=(chunk_size,), offset=offset)
        # Simulate processing
        chunk.sum()
    
        if i < chunk_size * 2 or i >= file_size_gb * num_elements - chunk_size * 3:
            monitor()
        offset = (i + chunk_size) * element_size  # byte offset of the next chunk
    
    Result

    Processing with memmap...
             mem0 - free: 128.48, available: 5284.02, cached: 5815.98
    Step 1, RAM usage: 31.30 MB, shared: 17.75, offset: 0
             mem1 - free: 258.95, available: 5397.76, cached: 5824.90
    Step 2, RAM usage: 31.33 MB, shared: 17.78, offset: 4000000
             mem1 - free: 258.95, available: 5397.76, cached: 5824.90
    ...
    Step 2145, RAM usage: 24.31 MB, shared: 10.97, offset: 8580000000
             mem1 - free: 128.48, available: 5290.66, cached: 5822.28
    Step 2146, RAM usage: 24.31 MB, shared: 10.97, offset: 8584000000
             mem1 - free: 128.48, available: 5296.34, cached: 5796.22
    Step 2147, RAM usage: 20.59 MB, shared: 7.24, offset: 8588000000
             mem1 - free: 128.48, available: 5296.43, cached: 5796.22
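
    The per-chunk remapping can also be wrapped in a small generator so the processing loop stays clean. This is only a sketch of the same technique; iter_memmap_chunks is a hypothetical helper name, not something from the answer above:

    import os
    import numpy as np

    def iter_memmap_chunks(filename, dtype, chunk_size):
        # Yield read-only memmap views of filename, one chunk at a time,
        # so only about one chunk's worth of pages is mapped at any time.
        element_size = np.dtype(dtype).itemsize
        total = os.path.getsize(filename) // element_size
        for start in range(0, total, chunk_size):
            n = min(chunk_size, total - start)
            yield np.memmap(filename, dtype=dtype, mode='r',
                            offset=start * element_size, shape=(n,))

    # Usage: a single pass over the file, one chunk mapped at a time
    for chunk in iter_memmap_chunks('/home/lmc/tmp/test_memmap.bin', np.float32, 1_000_000):
        chunk.sum()  # simulate processing

    Each iteration rebinds the only reference to the previous memmap, so CPython's reference counting should unmap it promptly; no explicit gc.collect() is needed.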