python · tensorflow · tensorflow-datasets

TensorFlow dataset split/sizing parameter problem: Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence


I'm pretty new to data generators and TensorFlow datasets. I'm struggling with sizing the batch, the epochs and the steps; I can't figure out the right setup to get rid of the error "Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence".

I tried using the size of one chunk of data produced by the data generator, the complete size of my dataset, and the size of the split datasets, but none of them seem to work.

Here is a simplified version of my last try:

import math
import numpy as np
import tensorflow as tf
from pyspark.sql.functions import col

def data_generator(df, chunk_size):
    total_number_sample = 10000

    # Each yield produces one whole chunk of samples.
    for start_idx in range(1, total_number_sample, chunk_size):
        end_idx = start_idx + chunk_size - 1

        df_subset = df.where(col('idx').between(start_idx, end_idx))

        feature = np.array(df_subset.select("vector_features_scaled").rdd.map(lambda row: row[0].toArray()).collect())
        label = df_subset.select("ptype_s_l_m_v").toPandas().values.flatten()

        yield feature, label

dataset = tf.data.Dataset.from_generator(
    lambda: data_generator(df, chunk_size),
    output_signature=(
        tf.TensorSpec(shape=(None, 24), dtype=tf.float32),
        tf.TensorSpec(shape=(None, 4), dtype=tf.float32)
    ))

I split and batch my data this way for training/validation:

batch_sz = 100
split_ratio = .9
split_size = math.floor((chunk_size*10) * split_ratio) 

train_dataset = dataset.take(split_size).batch(batch_sz)
train_dataset = train_dataset.prefetch(tf.data.experimental.AUTOTUNE)
test_dataset = dataset.skip(split_size).batch(batch_sz)
test_dataset = test_dataset.prefetch(tf.data.experimental.AUTOTUNE)


steps_per_epoch = math.ceil((10000 * split_ratio) / batch_sz)
validation_steps = math.ceil((10000 - split_size) / batch_sz)

model.fit(train_dataset, 
          steps_per_epoch=steps_per_epoch, 
          epochs=3, 
          validation_data=test_dataset, 
          validation_steps=validation_steps,
          verbose=2)

results = model.evaluate(dataset.batch(batch_sz))

Without batching, everything works great (both model.fit() and model.evaluate()).

But when I batch, I get this error:

W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
     [[{{node IteratorGetNext}}]]
/usr/lib/python3.11/contextlib.py:155: UserWarning: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches. You may need to use the `.repeat()` function when building your dataset.
  self.gen.throw(typ, value, traceback)

I've seen a lot of threads about steps_per_epoch, epochs and batch size, but I can't find a solution that works when applied to split data.

The warning suggests .repeat(); as far as I understand, that would look like the sketch below, but it just makes the input cycle forever so that fit() can always pull steps_per_epoch * epochs batches; it doesn't explain why my step counts are off in the first place.
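# Sketch of the warning's suggestion: .repeat() makes the dataset cycle,
# so fit() never hits "End of sequence", even if the step counts are wrong.
train_dataset = dataset.take(split_size).batch(batch_sz).repeat()
test_dataset = dataset.skip(split_size).batch(batch_sz).repeat()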


Solution

  • I finally found the problem.

    A TensorFlow dataset is itself a kind of data generator, so there is no need for the generator to chunk the data before passing it to tf.data: every value the generator yields becomes one element of the dataset. That means take(), skip() and batch() were counting chunks while steps_per_epoch was counting samples; with chunk_size = 1000, for example, the generator yields only 10 elements, so take(9000) returns all of them, batch(100) produces a single batch, and fit() runs out of data long before the 90 steps it expects.

    The fix is to yield individual samples and let .batch() generate the "chunks" that are read at each iteration, as in the sketch below.
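    Here is a minimal sketch of the reworked pipeline (same DataFrame and columns as above; sample_generator and the use of toLocalIterator() are just one way to stream rows out of Spark, and the label conversion assumes ptype_s_l_m_v already holds 4 floats per row):

    import math
    import numpy as np
    import tensorflow as tf

    def sample_generator(df):
        # Stream rows one at a time; each yield is ONE sample, so every
        # dataset element is a single (feature, label) pair.
        for row in df.select("vector_features_scaled", "ptype_s_l_m_v").toLocalIterator():
            yield row[0].toArray(), np.asarray(row[1], dtype=np.float32)

    dataset = tf.data.Dataset.from_generator(
        lambda: sample_generator(df),
        output_signature=(
            tf.TensorSpec(shape=(24,), dtype=tf.float32),  # one feature vector
            tf.TensorSpec(shape=(4,), dtype=tf.float32)    # one label vector
        ))

    total_samples = 10000
    batch_sz = 100
    split_ratio = 0.9
    split_size = math.floor(total_samples * split_ratio)  # counted in samples

    train_dataset = dataset.take(split_size).batch(batch_sz).prefetch(tf.data.experimental.AUTOTUNE)
    test_dataset = dataset.skip(split_size).batch(batch_sz).prefetch(tf.data.experimental.AUTOTUNE)

    # Steps, splits and batches now all count in the same units.
    steps_per_epoch = math.ceil(split_size / batch_sz)
    validation_steps = math.ceil((total_samples - split_size) / batch_sz)

    With one sample per element, take(), skip() and steps_per_epoch all count the same thing, so fit() no longer runs out of data. Alternatively, you can drop steps_per_epoch and validation_steps entirely and let Keras run each epoch to the end of the (finite) dataset.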