jupyter-notebook google-colaboratory pydrive

Upload images with labels in Google Colab


I am using a Jupyter Notebook in Google Colab. My training dataset looks like this:

/data/label1/img1.jpeg
.
.
.
/data/label2/img90.jpeg

I want to import this dataset into the notebook. Here is what I have tried so far:

Step 1:

!pip install -U -q PyDrive
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
from os import walk
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

Step 2:

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

Step 3:

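# Local directory the files should end up in.
file_to_download = os.path.expanduser('./data/')
# List the children of the Drive folder; 'id_of_the_data_directory' is a placeholder for the folder ID.
file_list = drive.ListFile(
    {'q': "'id_of_the_data_directory' in parents and trashed=false"}).GetList()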

I am not sure how to proceed from here. The data folder is in my Colab notebook folder on Drive. I want to read the images along with their labels. To do that, I am using the following code:

import tensorflow as tf  # TensorFlow 1.x queue-based input API

filename_queue = tf.train.string_input_producer(
    tf.train.match_filenames_once('data/*/*.jpeg'))
image_reader = tf.WholeFileReader()
key, image_file = image_reader.read(filename_queue)
# key is the full path to the JPEG file; only the parent subfolder is needed as the label
S = tf.string_split([key], '/')
length = tf.cast(S.dense_shape[1], tf.int32)
label = S.values[length - tf.constant(2, dtype=tf.int32)]
# string_to_number only works when the subfolder names are numeric
label = tf.string_to_number(label, out_type=tf.int32)
# decode the image
image = tf.image.decode_jpeg(image_file)
# then code to place labels and folders in corresponding arrays

Solution

  • First of all, note that the Drive folder cannot be accessed directly from the notebook. You need to set a mount point, and all of the Drive contents are then accessed through it. Thanks to this answer. Follow the steps exactly as given in the linked answer, but make sure to change your path to match the newly created drive folder.

    PS: I have left the question open because you may arrive here with an image dataset whose subfolder names are the labels of the training images. The solution posted here works both for directories with subfolders and for directories that contain only files.
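For anyone who wants something concrete, here is a minimal sketch of the idea above, not the exact steps from the linked answer. It assumes Drive is mounted with `google.colab.drive.mount`, which may differ from the mounting method that answer describes. The helper names `mount_drive` and `list_labeled_images`, the mount point `/content/drive`, and the example data path are my own assumptions for illustration. Once the mount point exists, the dataset is just a local directory, so each subfolder name can be read off as the label.

```python
from pathlib import Path


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab (hypothetical helper).

    Returns the mount point, or None when google.colab is not available.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return None
    drive.mount(mount_point)  # prompts for authorization in the notebook
    return mount_point


def list_labeled_images(data_dir, extensions=(".jpeg", ".jpg", ".png")):
    """Return sorted (image_path, label) pairs (hypothetical helper).

    The label of each image is the name of the subfolder that contains it,
    e.g. data/label1/img1.jpeg -> "label1". Files that sit directly in
    data_dir, or that have another extension, are skipped.
    """
    pairs = []
    for label_dir in sorted(Path(data_dir).iterdir()):
        if not label_dir.is_dir():
            continue
        for img in sorted(label_dir.iterdir()):
            if img.is_file() and img.suffix.lower() in extensions:
                pairs.append((str(img), label_dir.name))
    return pairs


# Inside Colab this could be used as follows. The path is an assumption;
# adjust it to wherever the data folder lives under your mount point.
# mount_drive()
# pairs = list_labeled_images("/content/drive/My Drive/data")
```

The resulting list of paths and labels can then be fed to whatever loading code you prefer, including the TensorFlow pipeline from the question.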