python tensorflow machine-learning keras tf.data.dataset

TensorFlow does not accept list type for dataset generator


I am building a neural network. The training data does not fit in memory, so I am using tf.data.Dataset.from_generator to load it incrementally. However, it raises an error saying that a list is not an accepted type in output_signature:

TypeError: `output_signature` must contain objects that are subclass of 
`tf.TypeSpec` but found <class 'list'> which is not.

The input to my neural network is a list of 151 separate tensors. How can I represent this in the generator? My code is below:

def generator(file_paths, batch_size, files_per_batch, tam, value):
    return tf.data.Dataset.from_generator(
        lambda: data_generator(file_paths, batch_size, files_per_batch, tam, value),
        output_signature=(
            [tf.TensorSpec(shape=(batch_size, tam), dtype=tf.float32) for _ in range(tam+1)],  # List of 151 tensors
            tf.TensorSpec(shape=(batch_size, tam), dtype=tf.float32)  # Labels
        )
    )

inputArray = [Input(shape=(tam,)) for _ in range(tam + 1)]

train_dataset = generator(file_paths, batch_size, files_per_batch, tam, False)
train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)

model.fit(train_dataset, epochs=1000, validation_split=0.2, verbose=1)

Solution

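  • I solved the problem by using a dictionary instead of a list. tf.data treats tuples and dictionaries as nested structures, but not Python lists, so a list inside output_signature is rejected.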

    def generator(file_paths, batch_size, files_per_batch, size, value):
        return tf.data.Dataset.from_generator(
            lambda: data_generator(file_paths, batch_size, files_per_batch, size, value),
            output_signature=(
                {f"input_{i}": tf.TensorSpec(shape=(batch_size, size), dtype=tf.float32) for i in range(size + 1)},  # Inputs
                tf.TensorSpec(shape=(batch_size, size), dtype=tf.float32)  # Labels
            )
        )
    

    To match, I named the model's input layers after the dictionary keys:

    inputArray = [Input(shape=(size,), name=f"input_{i}") for i in range(size + 1)]
    

    This way, the keys yielded by the generator match the names of the model's input layers.
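The answer does not show data_generator itself, so here is a minimal sketch of what the generator side might look like with the dictionary approach. The generator body, the random data, and the small values for size and batch_size are assumptions for illustration only; the real generator would read from file_paths. What matters is that each yielded element is a (dict, labels) pair whose keys are the same "input_{i}" names used in output_signature.

```python
import numpy as np
import tensorflow as tf

size = 3                 # stand-in for `tam`/`size` (150 in the question)
batch_size = 2
num_inputs = size + 1    # one tensor per model input


def data_generator(num_batches):
    """Hypothetical stand-in for the real generator: yields random batches."""
    rng = np.random.default_rng(0)
    for _ in range(num_batches):
        # One array per model input, keyed by the name used in output_signature.
        inputs = {
            f"input_{i}": rng.random((batch_size, size), dtype=np.float32)
            for i in range(num_inputs)
        }
        labels = rng.random((batch_size, size), dtype=np.float32)
        yield inputs, labels


dataset = tf.data.Dataset.from_generator(
    lambda: data_generator(num_batches=4),
    output_signature=(
        {
            f"input_{i}": tf.TensorSpec(shape=(batch_size, size), dtype=tf.float32)
            for i in range(num_inputs)
        },
        tf.TensorSpec(shape=(batch_size, size), dtype=tf.float32),
    ),
)

# Pull one element to confirm the structure matches the signature.
features, labels = next(iter(dataset))
print(sorted(features.keys()))
print(features["input_0"].shape, labels.shape)
```

If the generator yields a list for the inputs instead of a dictionary, the element no longer matches the declared structure, so the generator and output_signature need to change together.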