I'm working on a small image machine learning project. I found someone's Kaggle code and tried replicating it from scratch. However, before even reaching the main part, I ran into a problem.
I'm sure the cause is something local on my end, but I can't figure out what.
My code:
#Import Libraries
#Data processing modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cv2
#File directory modules
import glob as gb
import os
#Training and testing (machine learning) modules
import tensorflow as tf
import keras
#Importing the images into the code
trainDataset = 'melanoma_cancer_dataset/train'
testDataset = 'melanoma_cancer_dataset/test'
predictionDataset = 'melanoma_cancer_dataset/skinTest'
#creating empty lists for the images to fall into for processing
training_List = []
testing_list = []
#making a classification dictionary for the two keys, benign and malignant
#used for inserting into the images
diction = {'benign' : 0, 'malignant' : 1}
#Read through the folder's length contents
for folder in os.listdir(trainDataset):
    data = gb.glob(pathname=str(trainDataset + folder + '/*.jpg'))
    print(f'{len(data)} in folder {folder}')
    #read the images, resize them in a uniform order, and store them in the empty lists
    for data in data:
        image = cv2.imread(data)
        imageList = cv2.resize(image(120,120))
        training_List.append(list(imageList))
The notebook output reports 0 images in each folder, even though the folders contain images. I don't understand what's happening here and would love some answers. Thanks in advance. I'm running this locally in VS Code.
This is a screenshot of my files:
Based on your folder structure and the code you provided, the issue is that your folder paths have no trailing slash. Your code concatenates the folder name directly onto the path, so trainDataset + folder + '/*.jpg' produces 'melanoma_cancer_dataset/trainbenign/*.jpg' instead of 'melanoma_cancer_dataset/train/benign/*.jpg'. No such folder exists, so glob finds nothing and returns an empty list.
Update the paths like this:
trainDataset = 'melanoma_cancer_dataset/train/'
testDataset = 'melanoma_cancer_dataset/test/'
predictionDataset = 'melanoma_cancer_dataset/skinTest/'
Here is what this part of your code does:
for folder in os.listdir(trainDataset):
    data = gb.glob(pathname=str(trainDataset + folder + '/*.jpg'))
It goes to the trainDataset path and uses os.listdir() to list the folders there (which are named malignant and benign).
These paths are concatenated to generate the final image paths with:
data = gb.glob(pathname=str(trainDataset + folder + '/*.jpg'))
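To make the concatenation problem concrete, here is a minimal sketch that builds the pattern both ways. The names train_dir, broken_pattern and joined_pattern are only illustrative; os.path.join is shown as an alternative to adding the trailing slash by hand, since it inserts the separator when it is missing.

```python
import os

train_dir = 'melanoma_cancer_dataset/train'  # no trailing slash, as in the question
folder = 'benign'

# Plain concatenation glues the folder name straight onto "train"
broken_pattern = train_dir + folder + '/*.jpg'

# os.path.join adds the missing separator between the parts
joined_pattern = os.path.join(train_dir, folder, '*.jpg')

print(broken_pattern)  # melanoma_cancer_dataset/trainbenign/*.jpg
print(joined_pattern)
```

With os.path.join the result is the same whether or not train_dir ends with a slash, so the pattern no longer depends on how the directory variables are written.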
Also, there is a small bug in this line, where a missing comma makes Python try to call image as a function:
imageList = cv2.resize(image(120,120))
It should be:
imageList = cv2.resize(image, (120, 120))
Also, the list(imageList) call when appending to training_List is unnecessary: it turns the image array into a Python list of row arrays. Append imageList directly if you want to keep each image as a single array.
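A small sketch of the difference, again using a blank array as an assumed stand-in for a resized image. Appending the arrays directly gives a list that converts cleanly into one batch array.

```python
import numpy as np

# Stand-in for a resized 120x120 colour image
image = np.zeros((120, 120, 3), dtype=np.uint8)

# list() splits the image into 120 separate row arrays of shape (120, 3)
as_list = list(image)
print(type(as_list).__name__, len(as_list), as_list[0].shape)

# Appending the arrays themselves keeps each image whole
training_list = [image, image.copy()]
batch = np.array(training_list)
print(batch.shape)  # (2, 120, 120, 3)
```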
Full updated code:
# Data processing modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cv2
# File directory modules
import glob as gb
import os
# Training and testing (machine learning) modules
import tensorflow as tf
import keras
# Directories
trainDataset = 'melanoma_cancer_dataset/train/'
testDataset = 'melanoma_cancer_dataset/test/'
predictionDataset = 'melanoma_cancer_dataset/skinTest/'
# Empty list for the images
training_List = []
testing_list = []
# Classification dictionary
diction = {'benign': 0, 'malignant': 1}
# Read through the folder's contents
for folder in os.listdir(trainDataset):
    # trainDataset now ends with a slash, so this builds train/<folder>/*.jpg
    data = gb.glob(pathname=str(trainDataset + folder + '/*.jpg'))
    print(f'{len(data)} in folder {folder}')
    # Read the images, resize them, and store them in the list
    for file_path in data:
        image = cv2.imread(file_path)
        # Corrected the resize function call
        imageList = cv2.resize(image, (120, 120))
        # Append the image array directly
        training_List.append(imageList)
print(f'Total images in training set: {len(training_List)}')