I am working on a fairly simple fastai vision_learner that uses a ResNet as backbone. It should classify mushroom images, and the data loading seems to work until the split between training and validation datasets. Something appears to go wrong there: when I fit the model, it churns out this error message:
File /opt/conda/lib/python3.10/site-packages/fastai/data/transforms.py:263, in Categorize.encodes(self, o)
261 return TensorCategory(self.vocab.o2i[o])
262 except KeyError as e:
--> 263 raise KeyError(f"Label '{o}' was not included in the training dataset") from e
KeyError: "Label '755' was not included in the training dataset"
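If I read the traceback right, Categorize builds a vocab (a label-to-index mapping) from the training labels and fails when it meets a label it has never seen. A minimal sketch of that lookup (simplified, with made-up labels — not the actual fastai source):

```python
# Simplified sketch of what fastai's Categorize does: build a
# label -> index mapping (the "vocab") from the training labels,
# then look up each incoming label in it.
train_labels = ["Agaricus", "Amanita", "Boletus"]  # hypothetical
o2i = {label: i for i, label in enumerate(sorted(set(train_labels)))}

def encode(label):
    try:
        return o2i[label]
    except KeyError as e:
        raise KeyError(f"Label '{label}' was not included in the training dataset") from e

print(encode("Amanita"))  # a label seen during training encodes fine
try:
    encode("755")         # an unseen label reproduces the error above
except KeyError as err:
    print(err)
```

So somehow my validation set contains a label, '755', that never made it into the training vocab.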
For context, this is how I load the data initially:
bs = 64
path = Path("../input/mushrooms/Mushrooms/")
fnames = []
for fpath in class_names:
    print(path/f'{fpath}/')
    fnames += get_image_files(path/f'{fpath}/')
Which prints:
../input/mushrooms/Mushrooms/Entoloma
../input/mushrooms/Mushrooms/Suillus
../input/mushrooms/Mushrooms/Hygrocybe
../input/mushrooms/Mushrooms/Agaricus
../input/mushrooms/Mushrooms/Amanita
../input/mushrooms/Mushrooms/Lactarius
../input/mushrooms/Mushrooms/Russula
../input/mushrooms/Mushrooms/Boletus
../input/mushrooms/Mushrooms/Cortinarius
And how I prepare it for the model:
np.random.seed(2)
pat = r"(\d+)_([a-zA-Z0-9-_]+)\.jpg$"
item_tfms = Resize(224) # Resizing each image to 224x224
batch_tfms = [*aug_transforms(), Normalize.from_stats(*imagenet_stats)] # Standard augmentations + normalization
data = ImageDataLoaders.from_name_re(
    path='.',
    fnames=fnames,
    pat=pat,
    item_tfms=item_tfms,
    batch_tfms=batch_tfms,
    bs=bs,
    num_workers=0
)
train_dataset, test_dataset = data.train, data.valid
data.show_batch()
# Check classes
print(f"Number of classes: {len(data.vocab)}") # 2046
# Check datasets
print(f"Training dataset size: {len(data.train_ds)}") # 5372
print(f"Validation dataset size: {len(data.valid_ds)}") # 1342
And then I create the learner and fit it:
learn = vision_learner(data, models.resnet50, metrics=error_rate, lr=0.001)
learn.fit(n_epochs = 5, start_epoch=0)
Which ultimately leads to the issue above. Does someone have any hints as to what I'm doing wrong?
I decided to tackle this from another angle, and it worked.
I switched to fastai's DataBlock API directly, which let me spell out exactly how the items are collected, labeled, and split:
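For reference, the root cause of the first approach, as far as I can tell: from_name_re labels each file via RegexLabeller, which uses the regex's first capture group as the class. My pattern's first group was (\d+) — the numeric photo id — so nearly every image got its own "class" (hence the 2046 classes, and validation labels like '755' that never appeared in training). A quick check with plain re, using a made-up filename:

```python
import re

# The pattern I originally passed to from_name_re; fastai's
# RegexLabeller takes the first capture group as the label.
pat = re.compile(r"(\d+)_([a-zA-Z0-9-_]+)\.jpg$")

fname = "755_amanita-muscaria.jpg"  # hypothetical filename
m = pat.search(fname)
print(m.group(1))  # the label fastai saw: the photo id, not the genus
print(m.group(2))  # the part I actually wanted as the label
```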
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(224),
    batch_tfms=aug_transforms()
)
dls = dblock.dataloaders(path, bs=64)
learn = vision_learner(dls, resnet34, lr=0.005, metrics=accuracy)
With get_y=parent_label, each image is now labeled by its genus folder rather than by something extracted from its filename, so the training and validation sets share the same nine classes and the KeyError is gone.
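parent_label simply uses the name of the file's parent directory as the class, which matches the folder-per-genus layout of this dataset. A minimal re-implementation to illustrate (the path is made up):

```python
from pathlib import Path

def parent_label_sketch(fname):
    """Simplified stand-in for fastai's parent_label:
    the class is the name of the file's parent directory."""
    return Path(fname).parent.name

p = "../input/mushrooms/Mushrooms/Amanita/755_some-photo.jpg"  # hypothetical
print(parent_label_sketch(p))  # -> 'Amanita'
```

Since the label comes from the directory and not the filename, every file under a given folder gets the same class no matter what its name looks like.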