tensorflow, keras, keras-cv

Why am I getting a concat error at the end of one epoch of training?


I'm relatively new to Keras, and I'm trying to run some example code from the Keras documentation in a Jupyter notebook. This is the example I'm working with:

Keras Computer Vision Example

I copied the code into my notebook. When I train the model, it runs for one epoch, and at the end of that epoch I get the error shown below.

I'm not sure how to debug this, since all of my code comes from the example.

`Epoch 1/3
1463/1463 [==============================] - ETA: 0s - loss: 22.8407 - box_loss: 2.6877 - class_loss: 20.1530
---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-17-8e8737ecac83> in <cell line: 1>()
----> 1 yolo.fit(
      2     train_ds,
      3     validation_data=val_ds,
      4     epochs=3,
      5     callbacks=[EvaluateCOCOMetricsCallback(val_ds, "model.h5")],

2 frames
/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py in result_fn(self, force)
    208 
    209         def result_fn(self, force=False):
--> 210             py_func_result = tf.py_function(
    211                 self.result_on_host_cpu, inp=[force], Tout=obj.dtype
    212             )

UnknownError: {{function_node __wrapped__EagerPyFunc_Tin_1_Tout_1_device_/job:localhost/replica:0/task:0/device:CPU:0}} InvalidArgumentError: {{function_node __wrapped__ConcatV2_N_365_device_/job:localhost/replica:0/task:0/device:CPU:0}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [4,13,4] vs. shape[1] = [4,14,4] [Op:ConcatV2] name: concat
Traceback (most recent call last):

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    return func(device, token, args)

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/script_ops.py", line 146, in __call__
    outputs = self._call(device, args)

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/script_ops.py", line 153, in _call
    ret = self._func(*args)

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py", line 205, in result_on_host_cpu
    return tf.constant(obj_result(force), obj.dtype)

  File "/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py", line 256, in result
    self._cached_result = self._compute_result()

  File "/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py", line 264, in _compute_result
    _box_concat(self.ground_truths),

  File "/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py", line 44, in _box_concat
    result[key] = tf.concat([b[key] for b in boxes], axis=0)

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/ops.py", line 5883, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__ConcatV2_N_365_device_/job:localhost/replica:0/task:0/device:CPU:0}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [4,13,4] vs. shape[1] = [4,14,4] [Op:ConcatV2] name: concat`

I expected the model to train for three epochs. I tried trimming the training dataset so that its size was divisible by the batch size, but that didn't help.


Solution

  • I had the same problem. After some searching, I found that EvaluateCOCOMetricsCallback() is the cause of this particular error. As recommended in the GitHub comment linked below, I switched to keras_cv.callbacks.PyCOCOCallback(), and that fixed it for me.

    https://github.com/keras-team/keras-cv/issues/1994#issuecomment-1665896238
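
    For context on the error message itself: the traceback shows the metric concatenating ground-truth box batches along axis 0, and the two shapes it reports, [4,13,4] and [4,14,4], differ in dimension 1, which looks like the number of boxes per image in each batch. The sketch below reproduces that kind of mismatch with NumPy standing in for tf.concat. The helper names and the pad value of -1 are assumptions made for illustration, not KerasCV's actual code.

```python
import numpy as np


def concat_boxes(batches):
    """Concatenate box batches along the batch axis (axis 0)."""
    return np.concatenate(batches, axis=0)


def pad_boxes(batch, max_boxes, pad_value=-1.0):
    """Pad axis 1 of a [batch, num_boxes, 4] array up to max_boxes.

    pad_value=-1.0 is an illustrative sentinel, not a confirmed KerasCV default.
    """
    missing = max_boxes - batch.shape[1]
    return np.pad(batch, ((0, 0), (0, missing), (0, 0)), constant_values=pad_value)


# Two batches of 4 images each, with 13 and 14 boxes per image,
# mirroring the shapes reported in the error message.
batch_a = np.zeros((4, 13, 4), dtype=np.float32)
batch_b = np.zeros((4, 14, 4), dtype=np.float32)

# Concatenating directly fails because dimension 1 differs (13 vs. 14).
try:
    concat_boxes([batch_a, batch_b])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Padding every batch to the same box count makes the concat valid.
max_boxes = max(b.shape[1] for b in (batch_a, batch_b))
merged = concat_boxes([pad_boxes(b, max_boxes) for b in (batch_a, batch_b)])

print(mismatch_raised, merged.shape)
```

    This only illustrates why the concat fails; it is not a patch for the metric. Switching the callback, as described above, is what resolved the error in my case.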