I'm training a model with cross-validation and I always get this error on the second fold. Here is some of the source code:
import gc
import os
import keras
from keras.backend import clear_session
from keras.callbacks import LearningRateScheduler
from sklearn.model_selection import StratifiedShuffleSplit
# build_model, CustomEarlyStopping and BalancedDataGenerator are defined elsewhere in my script

gc.collect()
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
fold_no = 1
annealer = LearningRateScheduler(lambda x: 1e-3 * 0.9 ** x)
callback2 = CustomEarlyStopping(patience=7)
optimizer = keras.optimizers.Adam(learning_rate=1e-4)
acc_per_fold, loss_per_fold = [], []
needTrain = True
for train_index, test_index in sss.split(X, y):
    clear_session()
    gc.collect()
    model = build_model(X.shape, numClass)
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    nmModel = 'model_overlap_%d_%d_fold%d.h5' % (n_time_steps, step, fold_no)
    print('------------------------------------------------------------------------')
    print(f'Training for fold {fold_no} ...')
    training_generator = BalancedDataGenerator(X[train_index],
                                               y[train_index],
                                               batch_size=256)
    if needTrain:
        history = model.fit(
            training_generator,
            epochs=1000,
            callbacks=[callback2, annealer],
            verbose=1,
            validation_data=(X[test_index], y[test_index]),
        )
    # Remove any previous file before saving, then reload the weights
    if os.path.exists(nmModel):
        os.remove(nmModel)
    model.save(nmModel)  # <-- the error is raised here
    model.load_weights(nmModel)
    scores = model.evaluate(X[test_index], y[test_index], verbose=0)
    print(f'Score for fold {fold_no}: {model.metrics_names[0]} of {scores[0]}; '
          f'{model.metrics_names[1]} of {scores[1]*100}%')
    acc_per_fold.append(scores[1] * 100)
    loss_per_fold.append(scores[0])
    # Increase fold number
    fold_no = fold_no + 1
    del model
    gc.collect()
The error occurs when the program saves the model. Here is the error message:
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
exec(code, globals, locals)
File d:\tuh3salman\trainmodeloverlapseqbuku_all.py:298
model.save(nmModel)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\utils\traceback_utils.py:67 in error_handler
raise e.with_traceback(filtered_tb) from None
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\h5py\_hl\group.py:183 in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\h5py\_hl\dataset.py:163 in make_new_dset
dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl, dapl=dapl)
File h5py\_objects.pyx:54 in h5py._objects.with_phil.wrapper
File h5py\_objects.pyx:55 in h5py._objects.with_phil.wrapper
File h5py\h5d.pyx:138 in h5py.h5d.create
ValueError: Unable to create dataset (name already exists)
I've tried downgrading and upgrading some of the libraries, hoping it would help, but unfortunately I still get the error. I also tried deleting the previous model file and moving it to another folder. Here are some of my library versions, in case they help to diagnose the problem: h5py 3.9.0, keras 2.8.0, tensorflow 2.8.0.
I want to solve this error; I have been searching for a solution for a few days but still get it. One fold takes 12 hours to finish, so it is a big waste of time just to find out whether the run will succeed or not.
The problem is the way you save the model. Instead of using model.save(nmModel), try model.save_weights(nmModel). It worked for me.
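For example, here is a minimal sketch of that change applied to the saving step in your loop (nmModel, build_model, numClass and the other names come from your code; note that save_weights stores only the layer weights, so the architecture has to be rebuilt before load_weights in a fresh session):

# Save only the weights instead of the full model; this skips the
# HDF5 model-graph serialization where "name already exists" is raised.
if os.path.exists(nmModel):
    os.remove(nmModel)
model.save_weights(nmModel)
model.load_weights(nmModel)  # weights load straight back into the same model

# In a fresh session, rebuild the same architecture first, then load:
model = build_model(X.shape, numClass)
model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              metrics=['accuracy'])
model.load_weights(nmModel)

The trade-off is that the .h5 file no longer contains the architecture or optimizer state, only the weights, so anything that reloads it must call build_model with the same arguments.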