How to load a .csv and image dataset in Kaggle?


I've been attempting binary classification with PyTorch on the SIIM-ISIC Melanoma Classification competition (https://www.kaggle.com/competitions/siim-isic-melanoma-classification), but I'm having trouble combining the images with their labels. I wrote a class that loads the images and pairs them with their labels, but the same error appears every time I run train[5], whichever index I use:

FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/input/siim-isic-melanoma-classification/jpeg/train/ISIC_0074311'

I can assure everyone that the paths are correctly copied and that the images are in those folders.

The folder structure is: (screenshot not included)

train.csv columns: image_name, patient_id, sex, age_approx, anatom_site_general_challenge, diagnosis, benign_malignant, target (0 or 1)

train folder (inside the jpeg folder): ISIC_0015719.jpg, ISIC_0052212.jpg, ISIC_0068279.jpg, ISIC_0074268.jpg, ...

My code:

import os

import pandas as pd
import torch
import torch.utils.data
from PIL import Image


class LoadDataset(torch.utils.data.Dataset):
    
    def __init__(self, csv_path, image_folder, transform = None):
        self.df           = pd.read_csv(csv_path)
        self.image_folder = image_folder
        self.transform    = transform

    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, index):
        filename = self.df.loc[index, 'image_name']
        label    = self.df.loc[index, 'target']
        image    = Image.open(os.path.join(self.image_folder, filename))

        if self.transform is not None:
            image = self.transform(image)
            
        return image, label


train = LoadDataset('/kaggle/input/images1-isic2020/train.csv', '/kaggle/input/images1-isic2020/train/train')
train[5]

Does anyone have an idea how to solve this, or know of another way, other than an image generator, to get each image together with its label?


Solution

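  • The path in the error ends in ISIC_0074311 with no extension, while the files in the folder are named like ISIC_0015719.jpg. The image_name column evidently stores names without the ".jpg" extension, so os.path.join builds a path to a file that does not exist. A simple fix, if all images share the same format, is to append the extension in __getitem__: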

    filename = self.df.loc[index, 'image_name'] + ".jpg"
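If it is unclear whether every entry in image_name lacks the extension, a slightly more defensive variant is to append it only when it is missing. The sketch below is one way to do that using only the standard library; resolve_image_path and its ext parameter are hypothetical names introduced here for illustration, not part of the original code, and it assumes all images share a single extension.

```python
import os


def resolve_image_path(image_folder, image_name, ext=".jpg"):
    """Join folder and file name, appending `ext` only if it is missing.

    Hypothetical helper (not from the original question); assumes every
    image in the folder uses the same extension.
    """
    # Compare case-insensitively so "X.JPG" is not turned into "X.JPG.jpg".
    if not image_name.lower().endswith(ext.lower()):
        image_name = image_name + ext
    return os.path.join(image_folder, image_name)
```

Inside __getitem__, the Image.open call would then become Image.open(resolve_image_path(self.image_folder, filename)). Printing os.listdir(image_folder)[:5] once is a quick way to confirm which extension the files on disk actually use.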