python tensorflow

Unknown image file format. One of JPEG, PNG, GIF, BMP required


I built a simple CNN model, and training it raised the error below:

Epoch 1/10
235/235 [==============================] - ETA: 0s - loss: 540.2643 - accuracy: 0.4358
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-14-ab88232c98aa> in <module>()
     15     train_ds,
     16     validation_data=val_ds,
---> 17     epochs=epochs
     18 )

7 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError:  Unknown image file format. One of JPEG, PNG, GIF, BMP required.
     [[{{node decode_image/DecodeImage}}]]
     [[IteratorGetNext]] [Op:__inference_test_function_2924]

Function call stack:
test_function

The code is simple and standard; most of it is copied directly from the official website. The error is raised before the first epoch finishes. I am fairly sure that all the images are PNG files, and the train folder contains nothing but images (no text or code files). I am using Colab with TensorFlow 2.5.0. Any help is appreciated.

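import tensorflow as tf
from tensorflow.keras import Sequential, layers

# batch_size, image_size, num_classes and epochs are defined earlier in the notebook.
data_dir = './train'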

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir, 
    subset='training',
    validation_split=0.2,
    batch_size=batch_size,
    seed=42
)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir, 
    subset='validation',
    validation_split=0.2,
    batch_size=batch_size,
    seed=42
)

model = Sequential([
    layers.InputLayer(input_shape=(image_size, image_size, 3)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
    ])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(
    optimizer=optimizer,
    loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)

Solution

  • Some of the files in your dataset are not in a format accepted by TensorFlow (JPEG, PNG, GIF, BMP), or may be corrupted. Since the traceback ends in test_function, the offending files are most likely in the validation split. A file's extension is only a hint; it does not guarantee anything about the file's actual content.

    You may be able to find the culprits with the imghdr module from the Python standard library and a simple loop.

    from pathlib import Path
    import imghdr
    
    data_dir = "/home/user/datasets/samples/"
    image_extensions = [".png", ".jpg"]  # add all your image file extensions here
    
    img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]
    for filepath in Path(data_dir).rglob("*"):
        if filepath.suffix.lower() in image_extensions:
            img_type = imghdr.what(filepath)
            if img_type is None:
                print(f"{filepath} is not an image")
            elif img_type not in img_type_accepted_by_tf:
                print(f"{filepath} is a {img_type}, not accepted by TensorFlow")
    

    This prints every file that is not an image, or that is not what its extension claims and is not accepted by TensorFlow. You can then either delete those files or convert them to a format that TensorFlow supports.
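Note that imghdr was deprecated in Python 3.11 and removed in Python 3.13, so the loop above will not run on recent interpreters. As a rough alternative, here is a minimal sketch that checks each file's leading bytes against the well-known signatures of the four formats named in the error message. The names SIGNATURES, sniff_format and find_suspect_files are hypothetical helpers written for this illustration, and the default extension list is an assumption about which files the dataset loader picks up. The check only looks at the header, so a truncated or otherwise corrupted file with a valid header would still pass.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of the four formats named in the error message.
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
    "bmp": (b"BM",),
}

# Assumption: these are the extensions the dataset loader treats as images.
DEFAULT_EXTENSIONS = (".bmp", ".gif", ".jpeg", ".jpg", ".png")


def sniff_format(path):
    """Return 'jpeg', 'png', 'gif' or 'bmp' based on the first bytes, else None."""
    with open(path, "rb") as f:
        header = f.read(8)
    for name, prefixes in SIGNATURES.items():
        # bytes.startswith accepts a tuple of candidate prefixes.
        if header.startswith(prefixes):
            return name
    return None


def find_suspect_files(data_dir, extensions=DEFAULT_EXTENSIONS):
    """Return image-named files under data_dir whose header matches no signature."""
    suspects = []
    for path in Path(data_dir).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        if sniff_format(path) is None:
            suspects.append(path)
    return sorted(suspects)


if __name__ == "__main__":
    # './train' is the data_dir used in the question.
    for path in find_suspect_files("./train"):
        print(f"{path} does not look like a JPEG, PNG, GIF or BMP file")
```

Files reported by this sketch are candidates for deletion or conversion. Files that pass are not guaranteed to decode, so if the error persists, decoding each file with the same library that training uses is the more reliable check.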