I'm encountering an issue with PCA in sklearn while using multiprocessing. Specifically, the PCA reconstruction error varies significantly with the number of processes set in Pool. For instance, using Pool(processes=4) yields a small error (np.abs(tmp_matrix-X_train).max() < 1e-2), but increasing to Pool(processes=5) or higher results in a substantial error, with np.abs(tmp_matrix-X_train).max() averaging around 10 for each column. This behavior is observed while using the Intel sklearnex package.
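To make the numbers concrete, the check reduces to something like this in a single process (a sketch only: the synthetic rank-15 matrix and the sizes are illustrative, not my real data):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# rank-15 data, so 20 components can reconstruct it essentially exactly
X_train = rng.standard_normal((20_000, 15)) @ rng.standard_normal((15, 50))
X_train = (X_train - X_train.mean()) / (X_train.std() + 1e-10)
pca = PCA(n_components=20, svd_solver='full').fit(X_train)
tmp_matrix = pca.inverse_transform(pca.transform(X_train))
print(np.abs(tmp_matrix - X_train).max())  # near machine precision for rank-15 data, not ~10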
I've tested various combinations and observed the following patterns:

Small error (good): 20 cpu + processes=1, 80 cpu + processes=1, 80 cpu + processes=4, 120 cpu + processes=5
Large error (bad): 80 cpu + processes=5, 100 cpu + processes=5, 120 cpu + processes=5 (yes, 120 cpu + processes=5 appears in both lists, i.e. it is unstable)

Here's the relevant portion of my code:
import numpy as np
from sklearnex import patch_sklearn
patch_sklearn()
from sklearn.decomposition import PCA
from functools import partial
from multiprocessing import Pool

def config_selection_single(df_entry: tuple, _some_arguments_including_data_object):
    # some pre-processing code
    for _ in some_iteration_condition:        # placeholder outer loop
        # some data processing and transformation that keeps the data non-NaN and within [-1e20, 1e20]
        for _ in another_iteration_condition: # placeholder inner loop
            z_mean = X[train_cond][:].mean()
            z_std = X[train_cond][:].std() + 1e-10
            X_train = (X[train_cond][:] - z_mean) / z_std  # X_train has shape ~ 2e4 x 50
            pca = PCA(n_components=20, svd_solver='full')
            p_model = pca.fit(X_train)
            Q = p_model.transform(X_train)             # project onto the 20 components
            tmp_matrix = p_model.inverse_transform(Q)  # reconstruct X_train from Q
            # reconstruction-error check
            if not np.allclose(Q, X_train.dot(p_model.components_.transpose())):
                print("reconstruction error is huge!")
                print(np.abs(tmp_matrix - X_train).max())

# bind the fixed argument by keyword so Pool.map fills df_entry
config_selection_partial = partial(config_selection_single,
                                   _some_arguments_including_data_object=_some_arguments_including_data_object)
with Pool(processes=4) as pool:  # 4 is good, 5 and 6 are bad
    pool.map(config_selection_partial, list(my_df.items()))
Unfortunately I could not find a small dataset demo that could reproduce the issue.
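For reference, the kind of small self-contained demo I have in mind looks roughly like this (synthetic low-rank data and the worker are illustrative, not my real pipeline):

import numpy as np
from sklearnex import patch_sklearn
patch_sklearn()
from sklearn.decomposition import PCA
from multiprocessing import Pool

def repro_worker(seed):
    # each worker fits PCA on its own synthetic rank-15 matrix and
    # returns the max reconstruction error
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((20_000, 15)) @ rng.standard_normal((15, 50))
    X = (X - X.mean()) / (X.std() + 1e-10)
    pca = PCA(n_components=20, svd_solver='full').fit(X)
    return np.abs(pca.inverse_transform(pca.transform(X)) - X).max()

if __name__ == '__main__':
    with Pool(processes=5) as pool:  # 5+ is where the real job misbehaves
        print(pool.map(repro_worker, range(8)))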
Any insights on why the number of processes affects PCA precision?
[To answer my own question] It turns out to be a bug in scikit-learn-intelex==2023.1.1. When I update it to scikit-learn-intelex==2024.0.1, the results look good.
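For anyone hitting the same thing, it's worth confirming which version the worker environment actually uses before and after upgrading (e.g. pip install -U scikit-learn-intelex); a quick check:

# verify the installed scikit-learn-intelex version in the active environment
from importlib.metadata import version
print(version("scikit-learn-intelex"))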