[SOLVED] How to load a .csv and image dataset in Kaggle?

I've been trying binary classification with PyTorch on the SIIM-ISIC Melanoma Classification competition (https://www.kaggle.com/competitions/siim-isic-melanoma-classification), but I'm having trouble combining the images with their labels. I wrote a class that loads the images and pairs each one with its label, but the same error appears every time I run train[5], whether with 5 or any other index:

FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/input/siim-isic-melanoma-classification/jpeg/train/ISIC_0074311'

I can assure everyone that the paths are correctly copied and that the images are in those folders.

The folder structure is:

train.csv columns: image_name, patient_id, sex, age_approx, anatom_site_general_challenge, diagnosis, benign_malignant, target (0 or 1)

train folder inside the jpeg folder: ISIC_0015719.jpg, ISIC_0052212.jpg, ISIC_0068279.jpg, ISIC_0074268.jpg, ...

My code:

import os
import pandas as pd
from PIL import Image
import torch
import torch.utils.data

class LoadDataset(torch.utils.data.Dataset):
    
    def __init__(self, csv_path, image_folder, transform = None):
        self.df           = pd.read_csv(csv_path)
        self.image_folder = image_folder
        self.transform    = transform

    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, index):
        filename = self.df.loc[index, 'image_name']
        label    = self.df.loc[index, 'target']
        image    = Image.open(os.path.join(self.image_folder, filename))

        if self.transform is not None:
            image = self.transform(image)
            
        return image, label

train = LoadDataset('/kaggle/input/images1-isic2020/train.csv', '/kaggle/input/images1-isic2020/train/train')

train[5]

Does anyone know how to fix this, or of another way (other than an image generator) to get each image together with its label?

Solution

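The path in the error ends in ISIC_0074311 with no file extension, while the files in the train folder have names like ISIC_0015719.jpg. The image_name column in train.csv stores the names without ".jpg", so the filename passed to Image.open is missing the extension. A simple fix, if every image uses the same format, is to append the extension manually: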

filename = self.df.loc[index, 'image_name'] + ".jpg"
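
If you would rather not hard-code the concatenation inside __getitem__, the same idea can be wrapped in a small helper. This is only a sketch: resolve_image_path is a hypothetical name, not part of PyTorch or PIL, and it assumes every image shares one extension (".jpg" by default). It appends the extension only when it is missing and raises an error showing the exact path it tried.

```python
import os


def resolve_image_path(image_folder, image_name, extension=".jpg"):
    """Build the full path for an image_name value read from train.csv.

    Hypothetical helper: assumes all images share a single extension.
    """
    # Append the extension only if the name does not already end with it.
    if not image_name.lower().endswith(extension.lower()):
        image_name = image_name + extension

    path = os.path.join(image_folder, image_name)

    # Fail early, reporting the exact path that was tried.
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Image not found: {path}")
    return path
```

Inside __getitem__ the image would then be opened with Image.open(resolve_image_path(self.image_folder, filename)). If the error persists, the path in the message shows whether the folder itself is wrong rather than the extension.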