tensorflow, deep-learning, neural-network, conv-neural-network, imagenet

Dog breed classification / CNN


I am struggling to train a CNN model to identify dog breeds. I intend to train a ResNet architecture on the Stanford Dogs Dataset. I downloaded the dataset from http://vision.stanford.edu/aditya86/ImageNetDogs/ into a Google Colab notebook and extracted the images. I get a folder structure like this: folder_structure. I understand that I need a folder structure with train and test subfolders, each containing further subfolders of dog images grouped by breed. How do I go about doing that?


Solution

  • You don't strictly need to create separate folders for train and test. You can use tf.keras.utils.image_dataset_from_directory from TensorFlow. It loads an all-in-one-folder dataset (one subfolder per class) and applies the split at load time. This is how:

    import tensorflow as tf

    train_ds = tf.keras.utils.image_dataset_from_directory(
        "/images/",  # path to your data folder (one subfolder per breed)
        validation_split=0.2,  # fraction of the data held out (the test set here)
        subset="training",  # return the training portion of the split
        seed=1024,  # must match the seed used for test_ds so the split is consistent
    )
    test_ds = tf.keras.utils.image_dataset_from_directory(
        "/images/",
        validation_split=0.2,
        subset="validation",  # return the held-out portion of the split
        seed=1024,
    )
    

    Both calls return a tf.data.Dataset object. The validation_split argument specifies the fraction of the data to reserve for validation (test, in your case). In the example above, 0.2 gives an 80% train / 20% validation split.

    The seed argument must be the same for both train_ds and test_ds: it makes both calls shuffle the files in the same order before splitting, so the two subsets do not overlap and no image ends up in both train and test.
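
To see why the seed matters, here is a minimal pure-Python sketch of a seeded shuffle-then-split. It is a simplified stand-in for illustration, not the Keras source; the helper split_paths and the dog_XX.jpg file names are hypothetical.

```python
import random


def split_paths(paths, validation_split, subset, seed):
    """Simplified stand-in for a seeded train/validation split (not the Keras source)."""
    shuffled = sorted(paths)               # start from a deterministic order
    random.Random(seed).shuffle(shuffled)  # same seed -> same shuffled order
    n_val = int(validation_split * len(shuffled))
    if subset == "training":
        return shuffled[:len(shuffled) - n_val]
    if subset == "validation":
        return shuffled[len(shuffled) - n_val:]
    raise ValueError("subset must be 'training' or 'validation'")


# Hypothetical file names standing in for the dog images.
paths = [f"dog_{i:02d}.jpg" for i in range(10)]

train = split_paths(paths, 0.2, "training", seed=1024)
val = split_paths(paths, 0.2, "validation", seed=1024)

# Same seed: the two subsets are disjoint and together cover every file.
same_seed_overlap = set(train) & set(val)
covers_everything = (set(train) | set(val)) == set(paths)

# Different seeds: each call shuffles differently, so files may leak between subsets.
leaky_val = split_paths(paths, 0.2, "validation", seed=7)
different_seed_overlap = set(train) & set(leaky_val)

print(len(train), len(val), same_seed_overlap, covers_everything)
print("overlap with a different seed:", different_seed_overlap)
```

With matching seeds the overlap is always empty. With different seeds nothing guarantees that, which is how test images can end up in the training set.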