Tags: python, tensorflow, object-detection, object-detection-api, cvat

Is CVAT's TFRecord export incorrect?


ML newbie here...

I've been working on an object detection model for the last half of this semester for my university's enterprise program (like a school-sponsored business, sort of). I decided to use the TF Object Detection API, as it seemed to abstract a lot of the machine-learning details away and gave me the option to cross-train robust models. While I hate not knowing the specifics of what's going on with my code, it wasn't exactly feasible to get an in-depth understanding of machine learning and TF in the timeframe I was given.

I annotated all of my images using CVAT's web version because it advertised automatic exports. I have been following along with Nicholas Renotte's object detection video on YouTube. Everything went relatively smoothly until I tried to start training and was met with the following error:

Traceback (most recent call last):
  File "/home/blake/TFOD/GM/Tensorflow/models/research/object_detection/model_main_tf2.py", line 114, in <module>
    tf.compat.v1.app.run()
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/blake/TFOD/GM/Tensorflow/models/research/object_detection/model_main_tf2.py", line 105, in main
    model_lib_v2.train_loop(
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/object_detection/model_lib_v2.py", line 605, in train_loop
    load_fine_tune_checkpoint(
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/object_detection/model_lib_v2.py", line 401, in load_fine_tune_checkpoint
    _ensure_model_is_built(model, input_dataset, unpad_groundtruth_tensors)
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/object_detection/model_lib_v2.py", line 161, in _ensure_model_is_built
    features, labels = iter(input_dataset).next()
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/tensorflow/python/distribute/input_lib.py", line 570, in next
    return self.__next__()
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/tensorflow/python/distribute/input_lib.py", line 574, in __next__
    return self.get_next()
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/tensorflow/python/distribute/input_lib.py", line 631, in get_next
    return self._get_next_no_partial_batch_handling(name)
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/tensorflow/python/distribute/input_lib.py", line 663, in _get_next_no_partial_batch_handling
    replicas.extend(self._iterators[i].get_next_as_list(new_name))
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/tensorflow/python/distribute/input_lib.py", line 1633, in get_next_as_list
    return self._format_data_list_with_options(self._iterator.get_next())
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py", line 554, in get_next
    result.append(self._device_iterators[i].get_next())
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 850, in get_next
    return self._next_internal()
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 780, in _next_internal
    ret = gen_dataset_ops.iterator_get_next(
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 3016, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "/home/blake/miniconda3/envs/tf/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 7262, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_16_device_/job:localhost/replica:0/task:0/device:GPU:0}} Input is empty.
     [[{{function_node case_cond_cond_jpeg_false_220}}{{node case/cond/cond_jpeg/decode_image/DecodeImage}}]]
     [[MultiDeviceIteratorGetNextFromShard]]
     [[RemoteCall]] [Op:IteratorGetNext]

I am running TF 2.12.0 on Ubuntu 22.04.1 on WSL 2.

From my understanding, this error indicates that something is wrong with my TFRecords. This seems likely, since the export from CVAT was only 125 KB for my Train dataset (~300 images) and 13 KB for my Test dataset (~25 images with 4 objects each). I thought a TFRecord contained both image and annotation data in a binary format for TF's use, so both of these file sizes seem ridiculously small.
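A quick way to test the file-size suspicion: 125 KB across ~300 images is roughly 0.4 KB per record, which is far less than an encoded JPEG would normally take. The sketch below counts the records in a TFRecord file and reports their payload sizes using only the standard library. It assumes the standard TFRecord framing (8-byte little-endian length, 4-byte CRC, payload, 4-byte CRC) and skips the CRCs rather than verifying them. The function names are made up for illustration.

```python
import struct


def iter_tfrecord_payloads(path):
    """Yield the raw payload bytes of each record in a TFRecord file.

    Assumes the standard framing: uint64 length, uint32 CRC of the length,
    payload, uint32 CRC of the payload (little-endian). CRCs are skipped,
    not verified.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte CRC of the length
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            footer = f.read(4)  # 4-byte CRC of the payload
            if len(payload) < length or len(footer) < 4:
                raise ValueError("truncated record body")
            yield payload


def summarize_tfrecord(path):
    """Return (record_count, smallest_payload, largest_payload) in bytes."""
    sizes = [len(p) for p in iter_tfrecord_payloads(path)]
    if not sizes:
        return 0, 0, 0
    return len(sizes), min(sizes), max(sizes)
```

Calling `summarize_tfrecord("train.tfrecord")` (the file name is a placeholder) on the CVAT export should show whether each record is only a few hundred bytes. If so, the records most likely hold annotations without the encoded image bytes, which would be consistent with the "Input is empty" failure in DecodeImage. With TensorFlow installed, each payload can be decoded further with `tf.train.Example.FromString(payload)` to look at the individual feature keys.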

Is there something wrong with CVAT's TFRecord export, or am I misunderstanding the use of the export or of TFRecords?

If there is something wrong with CVAT's export, is it feasible to export in a different format and convert to a TFRecord? Nicholas Renotte's video recommends using LabelImg, and he provides a script to convert LabelImg's output to a TFRecord. Is following that a better option? Of course, I would prefer not to re-annotate my images. Any guidance or nudge in the right direction would be greatly appreciated.


Solution

  • I went with my assumption that CVAT's TFRecord export was broken.

    I ended up exporting to PASCAL VOC from CVAT instead, as that is the .XML format used by LabelImg. I then created a train directory and copied all training images and their respective .XML files into it. The same was done for a test directory.

    I used this repo to create TFRecords from the image and .XML files. The only modification was on lines 85-91: the int() casts were changed to float() casts, because CVAT writes floating-point box coordinates whereas LabelImg writes integer coordinates.
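To illustrate why the cast matters: in Python, `int("12.5")` raises a ValueError, while `float("12.5")` parses fine. The snippet below is a minimal sketch of reading PASCAL VOC-style boxes with float casts. It is not the code from the repo mentioned above; the sample XML, the `widget` label, and the `read_boxes` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation with fractional coordinates, as CVAT may export them.
SAMPLE_XML = """<annotation>
  <filename>img_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>widget</name>
    <bndbox>
      <xmin>12.5</xmin><ymin>30.25</ymin><xmax>200.75</xmax><ymax>310.0</ymax>
    </bndbox>
  </object>
</annotation>"""


def read_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples.

    Coordinates are parsed with float() so that both integer and
    fractional values are accepted.
    """
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            float(bndbox.findtext("xmin")),
            float(bndbox.findtext("ymin")),
            float(bndbox.findtext("xmax")),
            float(bndbox.findtext("ymax")),
        ))
    return boxes
```

`read_boxes(SAMPLE_XML)` returns the box with its fractional coordinates intact. Replacing the float() calls with int() would fail on the first coordinate of this sample.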