[SOLVED] Dogbreed classification/CNN

I am struggling to train a CNN model to identify dog breeds. I intend to train a ResNet architecture on the Stanford Dogs Dataset. I downloaded the dataset from http://vision.stanford.edu/aditya86/ImageNetDogs/ into a Google Colab notebook and extracted the images. I get a folder structure like this: [image: folder_structure]. I know I need a folder structure with train and test subfolders, each containing further subfolders that hold the dog images for the corresponding breed. How do I go about doing that?

Solution

You don't strictly need to create separate folders for train and test. You can use TensorFlow's tf.keras.utils.image_dataset_from_directory (also exposed under the older alias tf.keras.preprocessing.image_dataset_from_directory). It loads your all-in-one-folder dataset and applies the split while loading. This is how:

import tensorflow as tf

# Training subset: 80% of the images found under the data folder
train_ds = tf.keras.utils.image_dataset_from_directory(
    "/images/",  # path to your data folder (one subfolder per breed)
    validation_split=0.2,  # fraction of the data held out (here used as the test set)
    subset="training",  # return the training portion
    seed=1024,  # must be the same for both calls so the two subsets do not overlap
)
# Held-out subset: the remaining 20%
test_ds = tf.keras.utils.image_dataset_from_directory(
    "/images/",
    validation_split=0.2,
    subset="validation",  # return the held-out portion
    seed=1024,
)

Both calls return a tf.data.Dataset object. The validation_split argument sets the fraction of the data to reserve for validation (test in your case). The example above uses 80% for training and 20% for validation.

The seed argument must be the same for both train_ds and test_ds because it fixes the order in which the image files are shuffled before the split. With matching seeds the two subsets are complementary. With different seeds, some images could end up in both your train and test sets.
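To see why the seed matters, here is a minimal pure-Python sketch of a seeded shuffle-then-slice split. It is a simplified illustration of the idea, not Keras's actual implementation. The helper split_files and the file names are hypothetical and exist only for this example.

```python
import random


def split_files(paths, validation_split, subset, seed):
    """Shuffle paths with a fixed seed, then slice off the validation tail.

    Hypothetical helper for illustration only; not part of TensorFlow/Keras.
    """
    ordered = sorted(paths)        # deterministic starting order
    rng = random.Random(seed)      # same seed -> identical shuffle every call
    rng.shuffle(ordered)
    n_val = int(validation_split * len(ordered))
    if subset == "training":
        return ordered[:len(ordered) - n_val]
    if subset == "validation":
        return ordered[len(ordered) - n_val:]
    raise ValueError("subset must be 'training' or 'validation'")


# Hypothetical file names standing in for the dog images.
files = [f"dog_{i:03d}.jpg" for i in range(100)]

train = split_files(files, 0.2, "training", seed=1024)
val = split_files(files, 0.2, "validation", seed=1024)

# Same seed: the two subsets share no files and together cover everything.
overlap_same_seed = set(train) & set(val)

# Different seed for the second call: files can leak from train into validation.
val_other = split_files(files, 0.2, "validation", seed=7)
overlap_diff_seed = set(train) & set(val_other)

print(len(train), len(val), len(overlap_same_seed))
```

Because both calls shuffle the same list with the same seed, slicing the head for training and the tail for validation yields disjoint subsets. With a different seed in the second call, the tail comes from a different ordering, so overlap_diff_seed will generally be non-empty.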