python scipy sparse-matrix shared-memory symmetric

Easy way to store a SciPy sparse symmetric matrix in a local file, load it into shared memory (shm), and reconstruct it from shm?

Is there an easy way to store the upper triangle (including the diagonal) of a SciPy sparse symmetric matrix in a local file, load the file into shared memory (shm), and reconstruct the sparse matrix from shm?

ChatGPT 4o gave what seems like an overly complex solution, with three parts: data, indices, and indptr; is that necessary?

import numpy as np
import scipy.sparse as sp
from multiprocessing import shared_memory

# Example matrix creation (symmetrized, since only the upper triangle is kept)
matrix = sp.random(100, 100, density=0.1, format='csr')
matrix = (matrix + matrix.T).tocsr()

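# Step 1: Extract and save the upper triangular part (including the diagonal)
# sp.triu returns COO by default; request CSR so .indices/.indptr exist after loading
upper_triangular_matrix = sp.triu(matrix, format='csr')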
sp.save_npz('upper_triangular_matrix.npz', upper_triangular_matrix)

# Step 2: Load the matrix from the file
loaded_matrix = sp.load_npz('upper_triangular_matrix.npz')

# Convert the matrix to shared memory
# A CSR matrix is described by three 1-D arrays (data, indices, indptr) plus its shape
data = loaded_matrix.data
indices = loaded_matrix.indices
indptr = loaded_matrix.indptr

# Create shared memory blocks for each array
shm_data = shared_memory.SharedMemory(create=True, size=data.nbytes)
shm_indices = shared_memory.SharedMemory(create=True, size=indices.nbytes)
shm_indptr = shared_memory.SharedMemory(create=True, size=indptr.nbytes)

# Copy the arrays into the shared memory blocks
np.copyto(np.ndarray(data.shape, dtype=data.dtype, buffer=shm_data.buf), data)
np.copyto(np.ndarray(indices.shape, dtype=indices.dtype, buffer=shm_indices.buf), indices)
np.copyto(np.ndarray(indptr.shape, dtype=indptr.dtype, buffer=shm_indptr.buf), indptr)

# Access the shared memory blocks
shared_data = np.ndarray(data.shape, dtype=data.dtype, buffer=shm_data.buf)
shared_indices = np.ndarray(indices.shape, dtype=indices.dtype, buffer=shm_indices.buf)
shared_indptr = np.ndarray(indptr.shape, dtype=indptr.dtype, buffer=shm_indptr.buf)

# Create a shared memory CSR matrix
shared_csr_matrix = sp.csr_matrix((shared_data, shared_indices, shared_indptr), shape=loaded_matrix.shape)

# Accessing the shared memory matrix
print(shared_csr_matrix)

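# Clean up shared memory
# Drop every NumPy/SciPy reference to the buffers first; close() can raise
# BufferError while arrays backed by the blocks are still alive
del shared_csr_matrix, shared_data, shared_indices, shared_indptr
shm_data.close()
shm_data.unlink()
shm_indices.close()
shm_indices.unlink()
shm_indptr.close()
shm_indptr.unlink()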
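
The code above stops at the upper-triangular CSR matrix; the question also asks about reconstructing the full symmetric matrix. A minimal sketch of that last step follows, assuming the stored triangle includes the diagonal. The helper name `symmetrize_from_upper` is made up for illustration and is not part of SciPy.

```python
import numpy as np
import scipy.sparse as sp

def symmetrize_from_upper(upper):
    """Rebuild the full symmetric matrix from its upper triangle (diagonal included)."""
    upper = sp.csr_matrix(upper)
    # Mirror only the strictly upper part (k=1) so the diagonal is not counted twice.
    strict_upper = sp.triu(upper, k=1, format='csr')
    return (upper + strict_upper.T).tocsr()

# Small self-check on a random symmetric matrix.
base = sp.random(50, 50, density=0.1, format='csr')
symmetric = (base + base.T).tocsr()
upper = sp.triu(symmetric, format='csr')
rebuilt = symmetrize_from_upper(upper)
```

The same function should work on a CSR matrix whose arrays live in shared memory, although the sum allocates new, non-shared arrays for the result.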

Solution

It turned out that my matrix was not really sparse, so SciPy's sparse matrix format didn't help save memory. The main property of my matrix is that it's symmetric.

ChatGPT 4o gave these basic steps for handling a symmetric matrix stored as a dense array, with the aim of saving memory; it's basically what I was asking for:

Step 1: Save the Upper-Triangular Matrix to a File

First, generate the upper-triangular part of a symmetric matrix and save it, if it hasn't been saved yet.

import numpy as np
import os
from multiprocessing import shared_memory

filename = "upper_diag.npy"

# Function to create a symmetric matrix and save its upper-triangular part
def create_and_save_upper_diag(filename, size=5):
    A = np.random.rand(size, size)
    A = (A + A.T) / 2  # Make A symmetric
    upper_diag_A = np.triu(A)
    np.save(filename, upper_diag_A)  # Save to a file

# Check if the file exists, if not create and save the matrix
if not os.path.exists(filename):
    create_and_save_upper_diag(filename)

Step 2: Load from File into SHM

Once you have the file, you can load it into shared memory.

# Function to load a matrix from a .npy file into shared memory
def load_to_shm(filename):
    # Load matrix from file
    upper_diag_A = np.load(filename)

    # Create shared memory block
    shm = shared_memory.SharedMemory(create=True, size=upper_diag_A.nbytes)

    # Create a NumPy array backed by shared memory
    upper_diag_A_shm = np.ndarray(upper_diag_A.shape, dtype=upper_diag_A.dtype, buffer=shm.buf)
    upper_diag_A_shm[:] = upper_diag_A[:]  # Copy data to shared memory

    return upper_diag_A_shm, shm

# Load the matrix into shared memory
upper_diag_A_shm, shm = load_to_shm(filename)
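
For another process to use the block, it needs the block's name plus the array's shape and dtype, since shared memory only holds raw bytes. Here is a minimal sketch of attaching by name. The helper `attach_array` is hypothetical, and both sides run in one process here only to keep the example self-contained.

```python
import numpy as np
from multiprocessing import shared_memory

def attach_array(name, shape, dtype):
    """Attach to an existing shared memory block and view it as a NumPy array."""
    shm = shared_memory.SharedMemory(name=name)  # create defaults to False: attach only
    array = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return array, shm

# Producer side: put a small upper-triangular array in shared memory.
source = np.triu(np.arange(16, dtype=np.float64).reshape(4, 4))
owner = shared_memory.SharedMemory(create=True, size=source.nbytes)
owner_view = np.ndarray(source.shape, dtype=source.dtype, buffer=owner.buf)
owner_view[:] = source

# Consumer side (normally another process): attach via name, shape, and dtype.
view, handle = attach_array(owner.name, source.shape, source.dtype)
matches = bool(np.array_equal(view, source))

# Release the NumPy views before closing; only the creating side unlinks.
del view, owner_view
handle.close()
owner.close()
owner.unlink()
```

In a real setup, `owner.name`, the shape, and the dtype would be passed to the worker, for example as arguments to a `multiprocessing.Process` target.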

Example of Accessing the Matrix

You can read any element by exploiting symmetry: for an index pair below the diagonal, look up the mirrored entry (j, i) instead.

def get_element(matrix, i, j):
    if i <= j:
        return matrix[i, j]
    else:
        return matrix[j, i]

# Example of accessing elements
print(get_element(upper_diag_A_shm, 1, 4))  # Access an element
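
If the whole matrix is needed at once rather than element by element, the full array can be rebuilt from the stored triangle in one vectorized step. This is a small sketch, assuming the array has zeros below the diagonal as produced by `np.triu`; the name `full_from_upper` is only illustrative.

```python
import numpy as np

def full_from_upper(upper):
    """Return the full symmetric matrix given an array whose lower triangle is zero."""
    # Mirror the strictly upper part (k=1) so the diagonal is not added twice.
    return upper + np.triu(upper, k=1).T

rng = np.random.default_rng(0)
A = rng.random((5, 5))
A = (A + A.T) / 2      # symmetric test matrix
upper = np.triu(A)     # same layout as the array saved in Step 1
full = full_from_upper(upper)
```

Note that the result is a new n x n array in ordinary process memory, so it trades memory for convenience.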

Cleaning Up

When you are done with the shared memory, drop any NumPy views of it, then close and unlink the block.

# Cleanup
upper_diag_A_shm = None
shm.close()
shm.unlink()  # Remove the shared memory block

This method combines file operations with shared memory handling in Python, making it suitable for applications where matrices are reused across different runs or processes, potentially saving time and memory by avoiding recomputation.
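
One caveat on memory: `np.triu` returns an array of the same n x n shape with zeros below the diagonal, so the file and the shared memory block are as large as the full matrix. To actually halve the footprint, the upper triangle can be packed into a 1-D array of n(n+1)/2 values. The sketch below shows one way to do that; `pack_upper` and `packed_index` are hypothetical helpers, and the index formula assumes the row-by-row order produced by `np.triu_indices`.

```python
import numpy as np

def pack_upper(A):
    """Return the upper triangle of A (diagonal included) as a 1-D array, row by row."""
    rows, cols = np.triu_indices(A.shape[0])
    return A[rows, cols]

def packed_index(i, j, n):
    """Position of element (i, j) of an n x n symmetric matrix in the packed array."""
    if i > j:
        i, j = j, i  # use symmetry to map into the upper triangle
    # Row i starts after the i earlier rows, which hold n, n-1, ..., n-i+1 entries.
    return i * n - i * (i - 1) // 2 + (j - i)

n = 5
rng = np.random.default_rng(0)
A = rng.random((n, n))
A = (A + A.T) / 2
packed = pack_upper(A)  # n * (n + 1) // 2 values instead of n * n
```

The packed array is an ordinary 1-D NumPy array, so it can be saved with `np.save` and copied into a shared memory block the same way as in Step 2, with lookups going through `packed[packed_index(i, j, n)]`.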