Tags: python, multiprocessing, optuna

Memory leak for Optuna trial with multiprocessing


The Background

I have a machine learning pipeline that consists of N boosted models (LGBMRegressor), each with identical hyperparameters. Each of the N LGBMRegressors is trained on a separate chunk of data. My workstation has many cores, so I train each regressor in a separate process via multiprocessing.

The Problem

I am trying to tune the parameters that go into the LGBMRegressors with Optuna. When I use multiprocessing inside an Optuna trial, memory leaks from trial to trial until I run out. Can I use multiprocessing inside an Optuna trial without running into a memory leak?

Minimal Reproducible Example

import optuna
import pandas as pd
import numpy as np
import multiprocessing
from lightgbm import LGBMRegressor

N = 500
n_cores = 30
rows_per_N = 1000
cols_per_N = 50
data = [[np.random.normal(size=(rows_per_N, cols_per_N)), np.random.normal(size=(rows_per_N,))] for _ in range(N)]

def get_metric(args):  # renamed from "data" to avoid shadowing the global
    (X, y), params = args
    model = LGBMRegressor(**params)
    model.fit(X, y)
    return np.abs(model.predict(X) - y)


def objective(trial):
    param = {
        "n_jobs": 1,  # must be an int, not the string "1"
        "num_leaves": trial.suggest_int("num_leaves", 2, 256),
    }
    lgb_params = [param for _ in range(N)]
    p = multiprocessing.Pool(n_cores)  # note: never closed or joined
    results = p.map(get_metric, zip(data, lgb_params))
    return np.mean(results)


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)

Alternative Solutions

I have rewritten the above code as a plain for loop, and it has no memory issues. The drawback is that it is roughly 30x slower than the multiprocessed solution.


Solution

  • As suggested by @J. M. Arnold, you should use a context manager for the pool so that it is reliably closed, which avoids the potential memory leak. Additionally, per the documentation, you can mitigate out-of-memory errors by periodically running the garbage collector: set gc_after_trial=True in the study.optimize method.

    import optuna
    import pandas as pd
    import numpy as np
    import multiprocessing
    from lightgbm import LGBMRegressor
    
    N = 500
    n_cores = 30
    rows_per_N = 1000
    cols_per_N = 50
    data = [[np.random.normal(size=(rows_per_N, cols_per_N)), np.random.normal(size=(rows_per_N,))] for _ in range(N)]
    
    def get_metric(args):  # renamed from "data" to avoid shadowing the global
        (X, y), params = args
        model = LGBMRegressor(**params)
        model.fit(X, y)
        return np.abs(model.predict(X) - y)
    
    
    def objective(trial):
        param = {
            "n_jobs": 1,  # must be an int, not the string "1"
            "num_leaves": trial.suggest_int("num_leaves", 2, 256),
        }
        lgb_params = [param for _ in range(N)]
        with multiprocessing.Pool(n_cores) as p:
            results = p.map(get_metric, zip(data, lgb_params))
            return np.mean(results)
    
    
    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=100, gc_after_trial=True)
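To illustrate why the context manager helps on its own: exiting the with block calls pool.terminate(), so the worker processes are reclaimed after every trial instead of accumulating across trials. A minimal LightGBM-free sketch:

```python
import multiprocessing

def square(x):
    return x * x

def run():
    # Exiting the with-block calls pool.terminate(), so the worker
    # processes (and their memory) are released immediately.
    with multiprocessing.Pool(2) as pool:
        return pool.map(square, range(5))

if __name__ == "__main__":
    print(run())  # [0, 1, 4, 9, 16]
```

Combined with gc_after_trial=True, which forces a garbage-collection pass between trials, this keeps the per-trial memory footprint bounded.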