Is there a way to convert a folder comprising .jpeg images to hdf5 in Python? I am trying to build a neural network model for classification of images. Thanks!
There are a lot of ways to process and save image data. Here are 2 variations of a method that reads all of the image files in 1 folder and loads them into an HDF5 file. Outline of this process:
1. Count the image files with glob.glob() (used to size the dataset).
2. Create the HDF5 file (prefixed: 1ds_) and an empty dataset of the appropriate shape and dtype.
3. Use glob.iglob() to loop over the images. For each one:
   - read it with cv2.imread()
   - resize it with cv2.resize()
   - copy it into the dataset with img_ds[cnt,:,:,:]
This is ONE way to do it. Additional things to consider:
1. Both HDF5 files are opened with a with/as: context manager, so they are closed automatically when the block exits.
2. A second with/as: block and loop saves the data to the 2nd HDF5 file (prefixed: nds_), one dataset per image, without resizing.
3. My test images are .ppm files, so you need to modify the glob functions to use *.jpg (a minimal sketch follows this list).
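For example, assuming your images use the .jpeg extension (as in your question) and live in a folder named ./images (a hypothetical path; substitute your own), the modified glob calls would look like this:

import glob

data_dir = './images'  # hypothetical folder; replace with your path
# count the files first (used to size the HDF5 dataset) ...
nfiles = len(glob.glob(data_dir + '/*.jpeg'))
print(f'count of image files nfiles={nfiles}')
# ... then iterate over them with the same read/resize/copy steps as below
for ifile in glob.iglob(data_dir + '/*.jpeg'):
    print(ifile)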
Simpler Version Below (added Mar 16 2021):
Assumes all files are in the current folder, AND loads all resized images into one dataset (named 'images'). See the previous code for the second method, which loads each image into a separate dataset without resizing.
import glob
import h5py
import cv2

IMG_WIDTH = 30
IMG_HEIGHT = 30

h5file = 'import_images.h5'

nfiles = len(glob.glob('./*.ppm'))
print(f'count of image files nfiles={nfiles}')

# resize all images and load into a single dataset
with h5py.File(h5file, 'w') as h5f:
    # shape is (files, height, width, channels) to match what cv2 returns;
    # images are uint8, so dtype='uint8' would be more compact than int
    img_ds = h5f.create_dataset('images', shape=(nfiles, IMG_HEIGHT, IMG_WIDTH, 3), dtype=int)
    for cnt, ifile in enumerate(glob.iglob('./*.ppm')):
        img = cv2.imread(ifile, cv2.IMREAD_COLOR)
        # or use cv2.IMREAD_GRAYSCALE, cv2.IMREAD_UNCHANGED
        # cv2.resize() takes (width, height) and returns (height, width, 3)
        img_resize = cv2.resize(img, (IMG_WIDTH, IMG_HEIGHT))
        img_ds[cnt, :, :, :] = img_resize
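Once the file is written, reading the images back for model training is a one-liner with h5py. A minimal sketch, assuming the file and dataset names from the code above:

import h5py

with h5py.File('import_images.h5', 'r') as h5f:
    images = h5f['images'][:]  # load the whole dataset as a numpy array
print(images.shape)  # (nfiles, IMG_HEIGHT, IMG_WIDTH, 3)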
Previous Code Below (from Mar 15 2021):
import sys
import glob
import h5py
import cv2
IMG_WIDTH = 30
IMG_HEIGHT = 30
# Check command-line arguments
if len(sys.argv) != 3:
    sys.exit("Usage: python load_images_to_hdf5.py data_directory output.h5")
data_dir = sys.argv[1]
print('data_dir =', data_dir)
h5file = sys.argv[2]
print('Saving images to:', h5file)
nfiles = len(glob.glob(data_dir + '/*.ppm'))
print(f'Reading dir: {data_dir}; nfiles={nfiles}')
# resize all images and load into a single dataset
with h5py.File('1ds_' + h5file, 'w') as h5f:
    # shape is (files, height, width, channels) to match what cv2 returns
    img_ds = h5f.create_dataset('images', shape=(nfiles, IMG_HEIGHT, IMG_WIDTH, 3), dtype=int)
    for cnt, ifile in enumerate(glob.iglob(data_dir + '/*.ppm')):
        img = cv2.imread(ifile, cv2.IMREAD_COLOR)
        # or use cv2.IMREAD_GRAYSCALE, cv2.IMREAD_UNCHANGED
        img_resize = cv2.resize(img, (IMG_WIDTH, IMG_HEIGHT))
        img_ds[cnt, :, :, :] = img_resize
# load each image into a separate dataset (image NOT resized)
with h5py.File('nds_' + h5file, 'w') as h5f:
    for cnt, ifile in enumerate(glob.iglob(data_dir + '/*.ppm')):
        img = cv2.imread(ifile, cv2.IMREAD_COLOR)
        # or use cv2.IMREAD_GRAYSCALE, cv2.IMREAD_UNCHANGED
        h5f.create_dataset(f'images_{cnt+1:03}', data=img)
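To read the images back from the nds_ file, iterate over the dataset names; each one holds a single image array. A minimal sketch, assuming the nds_ file created above (the output file name here is hypothetical):

import h5py

h5file = 'output.h5'  # hypothetical name; use whatever you passed on the command line
with h5py.File('nds_' + h5file, 'r') as h5f:
    for name in h5f.keys():   # 'images_001', 'images_002', ...
        img = h5f[name][:]    # one image as a numpy array
        print(name, img.shape)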