I am using keras-tuner in order to obtain the best set of hyperparameters for my model. I can reproduce my problem for a random dataset:
import random

import numpy as np
import tensorflow as tf

def generate_data(n_windows, n_timesteps):
    feature_vector_list = []
    label_list = []
    for i in range(10):
        x = tf.random.normal((n_windows, n_timesteps))
        feature_vector = [x]
        choices = [np.array([1, 0]), np.array([0, 1]),
                   np.array([0, 0]), np.array([1, 1])]
        labels = np.array([random.choice(choices) for _ in range(n_windows)])
        feature_vector_list.append(feature_vector)
        label_list.append(labels)
    return feature_vector_list, label_list
def custom_generator(feat_vector_list, label_list):
    assert len(feat_vector_list) == len(label_list), \
        "Number of feature vectors inconsistent with the number of labels"
    counter = 0
    while True:
        feat_vec = feat_vector_list[counter]
        list_labels = label_list[counter]
        counter = (counter + 1) % len(feat_vector_list)
        yield feat_vec, list_labels
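For reference, one yield from this generator looks like this (a quick shape check, not part of the original script):

# Each yield is the full window set of one of the 10 generated chunks:
# a list holding a single (n_windows, n_timesteps) tensor, plus an (n_windows, 2) label array.
feats, labels = next(custom_generator(*generate_data(350, 60)))
print(len(feats), feats[0].shape, labels.shape)  # 1 (350, 60) (350, 2)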
Here is the model:
from tensorflow.keras.layers import (AveragePooling1D, Conv1D, Dense, Dropout,
                                     Flatten, Input, MaxPooling1D)
from tensorflow.keras.models import Model

def model_builder(hp):
    n_timesteps, n_features, n_outputs = 60, 1, 2
    hp_units = hp.Int("units", min_value=50, max_value=500, step=50)
    hp_filters = hp.Int("filters", 4, 32, step=4, default=8)
    hp_kernel_size = hp.Int("kernel_size", 3, 50, step=1)
    hp_pool_size = hp.Int("pool_size", 2, 8, step=1)
    hp_dropout = hp.Float("dropout", 0.1, 0.5, step=0.1)

    input1 = Input(shape=(n_timesteps, n_features))
    conv1 = Conv1D(filters=hp_filters,
                   kernel_size=hp_kernel_size,
                   activation='relu')(input1)
    drop1 = Dropout(hp_dropout)(conv1)
    if hp.Choice("pooling", ["max", "avg"]) == "max":
        pool1 = MaxPooling1D(pool_size=hp_pool_size)(drop1)
    else:
        pool1 = AveragePooling1D(pool_size=hp_pool_size)(drop1)
    flatten1 = Flatten()(pool1)

    # hidden layers
    dense1 = Dense(hp_units, activation='relu')(flatten1)
    outputs = Dense(n_outputs, activation='softmax')(dense1)

    model = Model(inputs=input1, outputs=outputs)
    model.compile(loss='categorical_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(
                      learning_rate=hp.Float("learning_rate", 0.01, 0.1, step=0.2)),
                  metrics=['accuracy'])
    return model
Here is the training script:
import keras_tuner as kt

N_WINDOWS = 350  # number of windows per chunk, matching generate_data(350, 60) below

if __name__ == '__main__':
    x_train, y_train = generate_data(350, 60)
    x_val, y_val = generate_data(80, 60)

    training_generator = custom_generator(x_train, y_train)
    validation_generator = custom_generator(x_val, y_val)

    tuner = kt.Hyperband(
        model_builder,
        objective="val_accuracy",
        max_epochs=70,
        factor=3,
        directory="Results",
        project_name="cnn_tunning"
    )

    stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                  patience=5,
                                                  min_delta=0.002)

    tuner.search(
        training_generator,
        steps_per_epoch=N_WINDOWS,
        validation_data=validation_generator,
        validation_steps=75,
        callbacks=[stop_early],
    )
Now, what I have found is that once Hyperband starts running trials with a decent number of iterations, so that the callback I set up should come into play, I get this error:
W tensorflow/core/framework/op_kernel.cc:1733] INVALID_ARGUMENT: ValueError: Could not find callback with key=pyfunc_530 in the registry.
Traceback (most recent call last):
File "/home/diogomota/.cache/pypoetry/virtualenvs/WUAle-Z1-py3.7/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 259, in __call__
raise ValueError(f"Could not find callback with key={token} in the "
ValueError: Could not find callback with key=pyfunc_530 in the registry.
W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: INVALID_ARGUMENT: ValueError: Could not find callback with key=pyfunc_530 in the registry.
Traceback (most recent call last):
File "/home/diogomota/.cache/pypoetry/virtualenvs/WUAle-Z1-py3.7/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 259, in __call__
raise ValueError(f"Could not find callback with key={token} in the "
ValueError: Could not find callback with key=pyfunc_530 in the registry.
However, the search just proceeds to the next trial, so I'm not sure what is going on. Can someone explain why it can't find the callback?
I'm using TensorFlow 2.8 and keras-tuner 1.1.2.
I could only find one place online with a similar issue, but no solution was provided: https://issuemode.com/issues/tensorflow/tensorflow/72982126
EDIT: This only happens during the .search() call; I do not know the reason for this being an issue. Regular training using .fit() works without any issues.

Looking at the source code of the error, and reviewing the similar error provided, it looks like this issue is not due to the actual model callback (tf.keras.callbacks.EarlyStopping). The error occurs in the FuncRegistry class, a helper that maintains a map of unique tokens to registered Python functions, and it looks like in both cases the token (pyfunc_XXX) does not map to a function. Functions are inserted here when _internal_py_func is called, either while wrapping a Python function (to be executed as an eager TensorFlow operation) or while computing the gradient of an eager function. The global registry of tokens to functions (the FuncRegistry object) is supplied to initialize_py_trampoline, which is bound to the InitializePyTrampoline function in C++ through PyBind, so the mapping of tokens to functions is maintained in the C++ runtime as well.
At that level, tracing the error to the C++ source code from the logs, it's occurring in the destructor of the inner class Iterator, a field of GeneratorDatasetOp. The destructor is called when the object goes out of scope or is explicitly deleted, meaning it would be called when the generator has finished its task, which sounds consistent with your observation of when the error occurs.

In summary, without being able to probe much further without a dataset, it sounds like there may be a problem with the custom generator. I would recommend trying to perform the training without keras-tuner but with the same generator implementation, to identify whether the problem is consistent with the other observation linked, as they were not using keras-tuner but they were using a custom generator. If the error persists, it would also be worth evaluating whether previous releases (e.g., TensorFlow 2.7 or below) have the same problem with the generator. If it's consistently failing, it may warrant submitting an actual issue to the TensorFlow GitHub repository, as it may be a core bug which requires further exploration.
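For the first suggestion (training with the same generators but without keras-tuner), a minimal sketch could look like the following; building a single model via kt.HyperParameters() defaults is my assumption about how you'd pick one fixed configuration:

import keras_tuner as kt

# Build one model with every hp.* call falling back to its default/minimum value,
# then train it with the exact same generator objects, but without the tuner.
model = model_builder(kt.HyperParameters())
model.fit(
    training_generator,
    steps_per_epoch=N_WINDOWS,
    epochs=70,
    validation_data=validation_generator,
    validation_steps=75,
    callbacks=[stop_early],
)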
Also, if you don't need to use a generator (as in, the data can fit into memory), I would recommend supplying the dataset directly (calling fit with a list of numpy arrays or a single numpy array instead of supplying generator functions), as that path won't touch the DatasetGenerator code which is currently failing, and that should not affect your hyperparameter search.
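A sketch of that alternative, reusing generate_data from above (stacking the 10 chunks into single arrays with a trailing feature axis is my assumption about how you'd flatten the windowed data):

import numpy as np

x_train_list, y_train_list = generate_data(350, 60)
x_val_list, y_val_list = generate_data(80, 60)

# Shapes become (10 * 350, 60, 1) for features and (10 * 350, 2) for labels.
x_train = np.concatenate([np.array(fv[0])[..., np.newaxis] for fv in x_train_list])
y_train = np.concatenate(y_train_list)
x_val = np.concatenate([np.array(fv[0])[..., np.newaxis] for fv in x_val_list])
y_val = np.concatenate(y_val_list)

tuner.search(
    x_train, y_train,
    validation_data=(x_val, y_val),
    callbacks=[stop_early],
)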
Thank you for the additional information and for including code to replicate your generator functions. I was able to reproduce the issue with Python 3.7 / TensorFlow 2.8 / keras-tuner 1.1.2 on CPU. If you inspect _funcs (the field in the global registry which maintains a dictionary of tokens to weak references to functions), it's actually empty. Upon further inspection, it looks like every time a new trial is started, _funcs is cleared and repopulated, which is consistent with keras-tuner creating a new graph (model) for every trial (although the same FuncRegistry registry is used throughout).
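For reference, that inspection can be done against TensorFlow's private script_ops module (an internal, unstable API; the attribute names below are what TF 2.8 uses and may change between versions):

# Debugging sketch only: these are private TensorFlow internals, not a public API.
from tensorflow.python.ops import script_ops

registry = script_ops._py_funcs   # the global FuncRegistry instance
print(registry._funcs)            # maps tokens such as "pyfunc_530" to weak references to Python functions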
The error does not occur if the EarlyStopping callback is omitted, so you were correct to say the error is linked to the callback. The error also appears to be non-deterministic, as the trial and epoch at which it occurs vary per run.

With the cause of the error narrowed down, another person has experienced the same issue, and their observation was that the error is related to explicitly setting the min_delta parameter in the callback, as you are doing as well, which no other keras-tuner example does (e.g., in this example and this example from the documentation, only monitor and/or patience are set).

The impact of setting min_delta in the EarlyStopping callback (it defaults to 0) can be seen here. Specifically, _is_improvement can evaluate to True less frequently when min_delta is set to some non-zero value:
    if self._is_improvement(current, self.best):
      self.best = current
      self.best_epoch = epoch
      if self.restore_best_weights:
        self.best_weights = self.model.get_weights()
      # Only restart wait if we beat both the baseline and our previous best.
      if self.baseline is None or self._is_improvement(current, self.baseline):
        self.wait = 0

  def _is_improvement(self, monitor_value, reference_value):
    return self.monitor_op(monitor_value - self.min_delta, reference_value)
Note that in your case, self.monitor_op is np.less, since the metric you're monitoring is val_loss:

    if (self.monitor.endswith('acc') or self.monitor.endswith('accuracy') or
        self.monitor.endswith('auc')):
      self.monitor_op = np.greater
    else:
      self.monitor_op = np.less
When self._is_improvement evaluates to True less frequently, the patience criterion (self.wait >= self.patience) will be met more often, since self.wait resets less frequently (as self.baseline is None by default):
    if self.wait >= self.patience and epoch > 0:
      self.stopped_epoch = epoch
      self.model.stop_training = True
      if self.restore_best_weights and self.best_weights is not None:
        if self.verbose > 0:
          io_utils.print_msg(
              'Restoring model weights from the end of the best epoch: '
              f'{self.best_epoch + 1}.')
        self.model.set_weights(self.best_weights)
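To make the effect concrete, here is a small standalone illustration (my own snippet, not Keras code). Note that for a val_loss monitor Keras internally flips the sign of min_delta, so the subtraction above effectively adds |min_delta|, i.e. an epoch only counts as an improvement if the loss drops by more than min_delta:

import numpy as np

def is_improvement(monitor_value, reference_value, min_delta):
    # Mimics EarlyStopping._is_improvement for monitor_op = np.less (val_loss),
    # after Keras has flipped the sign of min_delta for "min"-style monitors.
    return np.less(monitor_value + abs(min_delta), reference_value)

val_losses = [0.700, 0.699, 0.698, 0.697]  # improves by only 0.001 each epoch
best, wait, patience = np.inf, 0, 2

for epoch, loss in enumerate(val_losses):
    if is_improvement(loss, best, min_delta=0.002):
        best, wait = loss, 0
    else:
        wait += 1                               # the 0.001 improvements don't count
    if wait >= patience and epoch > 0:
        print(f"would stop at epoch {epoch}")   # fires even though val_loss keeps improving
        break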
With this narrowed down, it appears to have something to do with the model stopping training more frequently, and with references to operations in the graph no longer existing when keras-tuner is running a trial.

In simpler terms, it seems like a bug in keras-tuner that needs to be submitted, which I did here with all the details from this response. For the purpose of proceeding in the meantime, if the min_delta criterion isn't necessary, I would suggest removing that parameter from EarlyStopping and running the script again to see whether you still hit the issue.
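That is, something along the lines of (keeping your other arguments unchanged):

# Workaround: leave min_delta at its default of 0.
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)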
Thank you for the additional information. I was able to reproduce the successful run when the generator is not used, and it also looks like the other case I referenced was likewise using a generator in conjunction with EarlyStopping with a min_delta supplied.

Upon some further inspection, the function which is not found in the registry is finalize_py_func: every token which causes the error maps to finalize_py_func before _funcs is cleared. finalize_py_func is the inner function wrapped by script_ops.numpy_function, which wraps a Python function to be used as a TensorFlow op. The function in which finalize_py_func is defined and returned as a TensorFlow op, finalize_fn, is supplied when constructing a generator, as can be seen here. Looking at the documentation of the finalize function in the generator here, it says: "A TensorFlow function that will be called on the result of init_func immediately before a C++ iterator over this dataset is destroyed."
Overall, the error is related to the generator, not the min_delta parameter. While setting min_delta expedites how quickly the error occurs, it can happen even with min_delta omitted if the patience is lowered enough to force the early stopping callback to trigger more frequently. Using your example, if you set patience to 1 and remove min_delta, the error appears pretty quickly (see the snippet below).
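That is, a repro variant of your callback along these lines (only patience changed, min_delta dropped):

# Repro variant: aggressive patience so early stopping fires frequently.
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=1)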
I have revised the GitHub issue to include that detail. It looks like the error still exists in TensorFlow 2.7, but if you downgrade to TensorFlow 2.6 (and Keras 2.6), the error does not occur. If downgrading is possible, that may be the best option for proceeding until the issue is addressed.