I am struggling to train a CNN model to identify dog breeds. I intend to train a ResNet architecture on the Stanford Dogs Dataset. I downloaded the dataset from http://vision.stanford.edu/aditya86/ImageNetDogs/ into a Google Colab notebook and extracted the images. I get a folder structure like this: folder_structure. I know I need a folder structure with train and test subfolders, each containing further subfolders of dog images grouped by breed. How do I go about doing that?
You don't strictly need to create separate folders for train and test. You can use the tf.keras.utils.image_dataset_from_directory function from TensorFlow. It loads your all-in-one-folder dataset and performs the split at load time. This is how:
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "/images/",              # path to your data folder
    validation_split=0.2,    # fraction reserved for validation (used here as the test set)
    subset="training",       # this dataset is for training
    seed=1024,               # must be the same for both train and test so the split is consistent
)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "/images/",
    validation_split=0.2,
    subset="validation",     # this dataset is the held-out 20%
    seed=1024,
)
Both calls return a tf.data.Dataset object. The validation_split argument sets the fraction of the data to reserve for validation (test in your case). In the example above I chose 80% train and 20% validation.
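To make the 80/20 figures concrete, here is a rough count for the full Stanford Dogs Dataset (20,580 images across 120 breeds). It is only a sketch: it assumes the split is taken over the whole image list at once rather than per breed, and that the validation count is rounded down.

```python
total_images = 20580       # size of the full Stanford Dogs Dataset
validation_split = 0.2     # same fraction as in the calls above

# Assumption: the validation count is the fraction of the total, rounded down.
num_validation = int(validation_split * total_images)
num_training = total_images - num_validation

print(num_training, num_validation)  # 16464 4116
```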
The seed argument must be the same for both train_ds and test_ds because it ensures the images are shuffled in the same order before the split, so no image ends up in both your train and test sets.
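If it helps to see why the seed matters, here is a simplified pure-Python model of a seeded split. It is an illustration only, not the actual Keras implementation; split_paths and the file names are hypothetical.

```python
import random

def split_paths(paths, validation_split, subset, seed):
    """Simplified model of a seeded train/validation split (not Keras's real code)."""
    shuffled = sorted(paths)                # start from a deterministic order
    random.Random(seed).shuffle(shuffled)   # same seed -> same shuffled order
    n_val = int(validation_split * len(shuffled))
    if subset == "training":
        return shuffled[:-n_val]            # everything except the last n_val items
    return shuffled[-n_val:]                # the last n_val items

paths = [f"img_{i:03d}.jpg" for i in range(100)]  # hypothetical file names

# Same seed for both subsets: they partition the data with no overlap.
train_same = set(split_paths(paths, 0.2, "training", seed=1024))
val_same = set(split_paths(paths, 0.2, "validation", seed=1024))
overlap_same = train_same & val_same

# Different seeds: each subset is cut from a different shuffle, so images leak.
train_other = set(split_paths(paths, 0.2, "training", seed=1))
val_other = set(split_paths(paths, 0.2, "validation", seed=2))
overlap_other = train_other & val_other

print(len(overlap_same), len(overlap_other) > 0)
```

With the same seed the two subsets are cut from one shuffled list, so they cannot share images. With different seeds each subset comes from its own shuffle, and some images almost certainly land in both.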