I have a program that reads through very large (~100GB-TB) binary files in chunks using a numpy memmap. The program does a single pass over the data, so there is no need to cache anything (there is never a need to go back and reread), but np.memmap caches data by default. As a result, RAM usage saturates quickly, even though there is no real need for it.
Is there a way to turn off caching of data or, barring that, to manually clear the cache? I have found a few other threads on the topic suggesting that the only way is to flush the memmap, delete all references to it, run the garbage collector, and recreate the memmap from scratch. That does work, but it is obviously not ideal. Can I do better?
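For reference, the delete-and-recreate workaround described above looks roughly like this (a minimal sketch with a small demo file standing in for the real 100GB+ data; with mode='r' there is nothing to flush, so that step is omitted):

```python
import gc
import os
import tempfile

import numpy as np

# Small demo file standing in for the real 100GB+ data.
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
np.ones(3_000_000, dtype=np.float32).tofile(path)

chunk_size = 1_000_000
n = os.path.getsize(path) // np.dtype(np.float32).itemsize

total = 0.0
for start in range(0, n, chunk_size):
    # Recreate the map for every chunk...
    mm = np.memmap(path, dtype=np.float32, mode='r')
    total += float(mm[start:start + chunk_size].sum())
    # ...then drop every reference to it and collect, so the mapping
    # (and the pages it pinned in this process) can be released.
    del mm
    gc.collect()

print(total)  # 3000000.0
```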
Here is an MWE that demonstrates the issue (note that it will create 2GB of random garbage on your HDD if you run it). As you can see, RAM usage reflects the cumulative amount of data loaded, even when chunk_size is small. Ideally, RAM usage would be limited to the amount of data contained in a single chunk.
import numpy as np
import os
import psutil
import gc
import time

# Parameters
filename = 'test_memmap.bin'
file_size_gb = 2  # Change this if needed
dtype = np.float32
element_size = np.dtype(dtype).itemsize
num_elements = (file_size_gb * 1024**3) // element_size
chunk_size = 1_000_000  # Number of elements to read at once

# Step 1: Create a large binary file
if not os.path.exists(filename):
    print("Creating file...")
    with open(filename, 'wb') as f:
        f.write(np.random.rand(num_elements).astype(dtype).tobytes())

# Step 2: Process file using memmap in chunks
print("Processing with memmap...")
mm = np.memmap(filename, dtype=dtype, mode='r')
process = psutil.Process(os.getpid())

for i in range(0, len(mm), chunk_size):
    chunk = mm[i:i+chunk_size]
    # Simulate processing
    chunk.sum()
    # Monitor RAM usage
    mem = process.memory_info().rss / (1024 ** 2)  # in MB
    print(f"Step {i // chunk_size + 1}, RAM usage: {mem:.2f} MB")

del mm
gc.collect()
time.sleep(5)  # OS takes a second to catch up
mem = process.memory_info().rss / (1024 ** 2)  # in MB
print(f"Final RAM usage after deleting memmap: {mem:.2f} MB")
Ideally, RAM usage would be limited to the amount of data contained in a single chunk.
And it seems to be so. Using psutil to print system memory shows that available memory stays close to cached memory throughout the process. That system cached memory remains available to other processes.
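That distinction can be checked directly with psutil (a Linux-specific sketch: the cached field of virtual_memory() only exists on Linux, where reclaimable page-cache pages are counted towards available):

```python
import psutil

vm = psutil.virtual_memory()
# On Linux, pages held by the page cache are reclaimable on demand,
# so they count towards 'available' and do not starve other
# processes even when 'free' looks exhausted.
print(f"free: {vm.free / 2**20:.0f} MB, "
      f"available: {vm.available / 2**20:.0f} MB, "
      f"cached: {vm.cached / 2**20:.0f} MB")
```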
Testing with a file size similar to total memory, printing the last two passes:
import numpy as np
import os
import psutil
import gc

# Parameters
filename = '/home/lmc/tmp/test_memmap.bin'
file_size_gb = 8  # Change this if needed
dtype = np.float32
element_size = np.dtype(dtype).itemsize
num_elements = (1024**3) // element_size  # elements per GB
chunk_size = 1_000_000  # Number of elements to read at once

# Step 1: Create a large binary file
if not os.path.exists(filename):
    print("Creating file...")
    with open(filename, 'wb') as f:
        for i in range(file_size_gb):
            f.write(np.random.rand(num_elements).astype(dtype).tobytes())

# Step 2: Process file using memmap in chunks
print("Processing with memmap...")
mm = np.memmap(filename, dtype=dtype, mode='r')
process = psutil.Process(os.getpid())
smem0 = psutil.virtual_memory()
print(f"\t mem0 - free: {smem0.free/(1024 ** 2):.2f}, available: {smem0.available/(1024 ** 2):.2f}, cached: {smem0.cached/(1024 ** 2):.2f}")

for i in range(0, len(mm), chunk_size):
    chunk = mm[i:i+chunk_size]
    # Simulate processing
    chunk.sum()
    if i >= len(mm) - chunk_size * 2:
        # Monitor RAM usage
        mem = process.memory_info().rss / (1024 ** 2)  # in MB
        shr = process.memory_info().shared / (1024 ** 2)  # in MB
        print(f"Step {i // chunk_size + 1}, RAM usage: {mem:.2f} MB, shared: {shr:.2f}")
        smem1 = psutil.virtual_memory()
        print(f"\t mem1 - free: {smem1.free/(1024 ** 2):.2f}, available: {smem1.available/(1024 ** 2):.2f}, cached: {smem1.cached/(1024 ** 2):.2f}")
Result
Processing with memmap...
mem0 - free: 172.61, available: 5421.77, cached: 5867.20
Step 2147, RAM usage: 5304.29 MB, shared: 5289.00
mem1 - free: 172.61, available: 5450.10, cached: 5872.30
Step 2148, RAM usage: 5303.04 MB, shared: 5287.75
mem1 - free: 172.61, available: 5442.97, cached: 5878.12
Recreate the map for each chunk, using the offset and shape parameters of np.memmap (this needs careful maths for offset and shape).
import numpy as np
import os
import psutil
import gc

def monitor():
    # Monitor RAM usage
    mem = process.memory_info().rss / (1024 ** 2)  # in MB
    shr = process.memory_info().shared / (1024 ** 2)  # in MB
    print(f"Step {i // chunk_size + 1}, RAM usage: {mem:.2f} MB, shared: {shr:.2f}, offset: {offset}")
    smem1 = psutil.virtual_memory()
    print(f"\t mem1 - free: {smem1.free/(1024 ** 2):.2f}, available: {smem1.available/(1024 ** 2):.2f}, cached: {smem1.cached/(1024 ** 2):.2f}")

# Parameters
filename = '/home/lmc/tmp/test_memmap.bin'
file_size_gb = 8  # Change this if needed
dtype = np.float32
element_size = np.dtype(dtype).itemsize
num_elements = (1024**3) // element_size  # elements per GB
chunk_size = 1_000_000  # Number of elements to read at once

# Step 1: Create a large binary file
if not os.path.exists(filename):
    print("Creating file...")
    with open(filename, 'wb') as f:
        for i in range(file_size_gb):
            f.write(np.random.rand(num_elements).astype(dtype).tobytes())

# Step 2: Process file using memmap in chunks
print("Processing with memmap...")
process = psutil.Process(os.getpid())
smem0 = psutil.virtual_memory()
print(f"\t mem0 - free: {smem0.free/(1024 ** 2):.2f}, available: {smem0.available/(1024 ** 2):.2f}, cached: {smem0.cached/(1024 ** 2):.2f}")

offset = 0
for i in range(0, file_size_gb * num_elements, chunk_size):
    if i // chunk_size == int(file_size_gb * num_elements // chunk_size) - 1:
        # Last offset: do not pass a shape, read all remaining elements
        chunk = np.memmap(filename, dtype=dtype, mode='r', offset=offset)
        monitor()
        break
    else:
        chunk = np.memmap(filename, dtype=dtype, mode='r', shape=(chunk_size,), offset=offset)
    # Simulate processing
    chunk.sum()
    if i < chunk_size * 2 or i >= file_size_gb * num_elements - chunk_size * 3:
        monitor()
    offset = (i + chunk_size) * element_size  # float32 is 32 bits = 4 bytes
Result
Processing with memmap...
mem0 - free: 128.48, available: 5284.02, cached: 5815.98
Step 1, RAM usage: 31.30 MB, shared: 17.75, offset: 0
mem1 - free: 258.95, available: 5397.76, cached: 5824.90
Step 2, RAM usage: 31.33 MB, shared: 17.78, offset: 4000000
mem1 - free: 258.95, available: 5397.76, cached: 5824.90
Step 2145, RAM usage: 24.31 MB, shared: 10.97, offset: 8580000000
mem1 - free: 128.48, available: 5290.66, cached: 5822.28
Step 2146, RAM usage: 24.31 MB, shared: 10.97, offset: 8584000000
mem1 - free: 128.48, available: 5296.34, cached: 5796.22
Step 2147, RAM usage: 20.59 MB, shared: 7.24, offset: 8588000000
mem1 - free: 128.48, available: 5296.43, cached: 5796.22
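The offset maths can be isolated in a small helper (a sketch; chunk_map is a hypothetical name, not part of the code above). The byte offset is simply the element index times np.dtype(dtype).itemsize, which is where the * 32/8 above comes from (float32 is 32 bits, i.e. 4 bytes), and the last chunk's shape has to be clamped so the map does not run past the end of the file:

```python
import os
import tempfile

import numpy as np

def chunk_map(filename, dtype, start, chunk_size, total_elements):
    # Byte offset = element index * bytes per element;
    # clamp the last chunk so we never map past end-of-file.
    itemsize = np.dtype(dtype).itemsize
    count = min(chunk_size, total_elements - start)
    return np.memmap(filename, dtype=dtype, mode='r',
                     offset=start * itemsize, shape=(count,))

# Tiny demo file (stand-in for the 8GB test file above).
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
np.ones(2_500_000, dtype=np.float32).tofile(path)

n = os.path.getsize(path) // np.dtype(np.float32).itemsize
total = 0.0
for start in range(0, n, 1_000_000):
    total += float(chunk_map(path, np.float32, start, 1_000_000, n).sum())

print(total)  # 2500000.0
```

Each chunk is an independent memmap, so once it goes out of scope its mapping is unmapped and the process never holds more than one chunk's worth of pages.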