I have a total of 1000+ datasets, on which I have to train the same number of models and save them in a folder called models.
The code works well and I'm getting what I want. The only issue I'm facing is that around the 554th model, it gives me this error:
No valid model found in run history. This means smac was not able to fit a valid model.
Please check the log file for errors.
Am I doing anything wrong here?
My code:
from joblib import Parallel, delayed
import pandas as pd
import autosklearn.regression
import pickle
import timeit
import os
import warnings
warnings.filterwarnings("ignore")
def train_model(filename):
    print('Reading Dataset: ' + str(filename))
    data = pd.read_csv(filename)
    # Only rows marked as finished are used for training
    train_data = data[data['state'] == 'done']

    # memory_limit=None disables auto-sklearn's per-run memory cap
    automl = autosklearn.regression.AutoSklearnRegressor(
        time_left_for_this_task=30,
        metric=autosklearn.metrics.r2,
        memory_limit=None
    )

    X_train = train_data[['feature1', 'feature2']]
    y_train = train_data[['target_column']]

    print("Training Started: " + str(filename))
    automl.fit(X_train, y_train)

    print('Saving Model: ' + str(filename))
    model_path = 'models/' + str(filename.split('.')[0])
    os.makedirs(model_path, exist_ok=True)

    model_filename = model_path + '/finalized_model.sav'
    with open(model_filename, 'wb') as f:
        pickle.dump(automl, f)
    return True
if __name__ == "__main__":
    start = timeit.default_timer()
    result = Parallel(n_jobs=4)(
        delayed(train_model)(filename)
        for filename in ['dataset_1.csv', 'dataset_2.csv', 'dataset_3.csv', ..., 'dataset_n.csv']
    )
    stop = timeit.default_timer()
    print('Time: ', (stop - start) / 60, 'Minutes')
I found the cause of the issue: the machine was running out of RAM.
I couldn't find any documentation about this behaviour, but I monitored RAM utilisation continuously while the script was running, and when no memory was left, the script terminated with the error above.
If anyone has more information about this, their contribution would be helpful for the community.
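Since the crash appears tied to running out of memory, one possible workaround (a sketch under my assumptions, not a verified fix) is to process the datasets in fixed-size batches so that only a limited number of auto-sklearn runs hold memory at any one time, and to replace memory_limit=None with a finite cap (in MB) so auto-sklearn can abort an oversized run instead of exhausting system RAM. The batched helper and the batch size of 100 below are hypothetical names/values, not from the original code:

```python
def batched(items, batch_size):
    """Yield successive batches of at most `batch_size` items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Usage with the question's train_model (joblib as before);
# 100 is an arbitrary example batch size:
#
# from joblib import Parallel, delayed
# for batch in batched(filenames, 100):
#     Parallel(n_jobs=4)(delayed(train_model)(f) for f in batch)
#
# Inside train_model, passing a finite memory_limit (e.g. 3072,
# auto-sklearn's default, in MB) instead of memory_limit=None lets
# auto-sklearn terminate runs that exceed the cap rather than
# letting a single run consume all available RAM.
```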