python-3.x tensorflow batch-processing tensorflow-datasets notimplementedexception

Tensorflow load dataset: UnimplementedError: Append(absl::Cord) is not implemented [Op:TakeDataset]


I am trying to extract batches from my TensorFlow dataset using TensorFlow 2.4, and I get a very strange error:

--> 221         for batch, (input_seq, target_seq_in, target_seq_out) in enumerate(dataset.take(-1)):
    222             # Train and get the loss value
    223             loss, accuracy = train_step(input_seq, target_seq_in, target_seq_out, en_initial_states, optimizer)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py in take(self, count)
   1417       Dataset: A `Dataset`.
   1418     """
-> 1419     return TakeDataset(self, count)
   1420 
   1421   def skip(self, count):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, input_dataset, count)
   3856         input_dataset._variant_tensor,  # pylint: disable=protected-access
   3857         count=self._count,
-> 3858         **self._flat_structure)
   3859     super(TakeDataset, self).__init__(input_dataset, variant_tensor)
   3860 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_dataset_ops.py in take_dataset(input_dataset, count, output_types, output_shapes, name)
   6608       return _result
   6609     except _core._NotOkStatusException as e:
-> 6610       _ops.raise_from_not_ok_status(e, name)
   6611     except _core._FallbackException:
   6612       pass

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
   6860   message = e.message + (" name: " + name if name is not None else "")
   6861   # pylint: disable=protected-access
-> 6862   six.raise_from(core._status_to_exception(e.code, message), None)
   6863   # pylint: enable=protected-access
   6864 

/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

UnimplementedError: Append(absl::Cord) is not implemented [Op:TakeDataset]

My process is as follows:

dataset = tf.data.Dataset.from_tensor_slices((encoder_inputs, decoder_inputs, decoder_targets))
dataset = dataset.batch(batch_size, drop_remainder=True)

tf.data.experimental.save(dataset, save_path + 'dataset_' + str(index))
...

dataset = tf.data.experimental.load(folder_path +'dataset_'+str(index), (tf.TensorSpec(shape=(MAX_LEN,), dtype=tf.int64, name=None), tf.TensorSpec(shape=(MAX_LEN,), dtype=tf.int64, name=None), tf.TensorSpec(shape=(MAX_LEN,), dtype=tf.int64, name=None)))

I don't understand where this error could come from, and I wasn't able to find anything related.


Solution

  • Your stack trace seems to be missing the actual line that triggered the error, but I am going to try to guess anyway.

    The error seems related to the dataset being written to a file that already exists: the writer then tries to append to that file, but whatever it uses as a WritableFile does not override Append for absl::Cord, so the default "not implemented" version is called (see: https://github.com/tensorflow/tensorflow/blob/516ae286f6cc796e646d14671d94959b129130a4/tensorflow/core/platform/file_system.h#L783)

    To continue with the wild guess: if this line

    tf.data.experimental.save(dataset, save_path + 'dataset_' + str(index))
    

    is what triggers the error, try something simple, like changing the file name so that the dataset is saved to a path that does not exist yet.
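
If the guess about a pre-existing save location is right, one way to test it is to make sure every save goes to a path that does not exist yet. Below is a minimal sketch of that idea. The helper name unused_save_path and the "_run<N>" suffix are made up for illustration and are not part of TensorFlow. Only the standard library is used, so the helper itself does not depend on any TensorFlow version.

```python
import os


def unused_save_path(base_dir, index):
    """Return a path under base_dir that does not exist yet.

    Hypothetical helper (not a TensorFlow API): it starts from the
    'dataset_<index>' name used in the question and appends '_run<N>'
    until it finds a name that is not already taken.
    """
    candidate = os.path.join(base_dir, "dataset_" + str(index))
    suffix = 0
    while os.path.exists(candidate):
        suffix += 1
        candidate = os.path.join(
            base_dir, "dataset_{}_run{}".format(index, suffix)
        )
    return candidate


# Assumed usage with the variables from the question (dataset, save_path, index):
#   path = unused_save_path(save_path, index)
#   tf.data.experimental.save(dataset, path)
# and later load from that same path.
```

If the error goes away with a fresh path, that supports the "appending to an existing file" guess. If it does not, the cause is somewhere else and the original path can be kept.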