tensorflow keras amazon-sagemaker image-preprocessing

SageMaker Pipeline - Processing step for ImageClassification model


I'm trying to solve an image classification task. I have code that trains, evaluates and deploys a TensorFlow model in a SageMaker notebook. I'm new to SageMaker and to SageMaker Pipelines. I'm now trying to split that code up and build a SageMaker pipeline from it. According to the AWS documentation, pipelines have Processing steps. My code reads data from S3 and uses ImageDataGenerator to generate augmented images on the fly while the TensorFlow model is training.

I can't find anything on how to use ImageDataGenerator inside a Processing step of a SageMaker pipeline.

My ImageDataGenerator code:

import os

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# HEIGHT and WIDTH (the target image size) are defined elsewhere in the script.


def load_data(mode):
    # Augment only the training data; other modes just rescale pixel values.
    if mode == 'TRAIN':
        datagen = ImageDataGenerator(
            rescale=1. / 255,
            rotation_range=0.5,  # in degrees
            shear_range=0.2,
            zoom_range=0.2,
            width_shift_range=0.2,
            height_shift_range=0.2,
            fill_mode='nearest',
            horizontal_flip=True)
    else:
        datagen = ImageDataGenerator(rescale=1. / 255)
    return datagen


def get_flow_from_directory(datagen,
                            data_dir,
                            batch_size,
                            shuffle=True):
    # data_dir must contain one subdirectory per class.
    assert os.path.exists(data_dir), "Unable to find image resources for input"
    generator = datagen.flow_from_directory(data_dir,
                                            class_mode="categorical",
                                            target_size=(HEIGHT, WIDTH),
                                            batch_size=batch_size,
                                            shuffle=shuffle)
    print('Labels are:', generator.class_indices)
    return generator

The question: is it possible to use ImageDataGenerator inside a Processing step of a SageMaker pipeline? I'd appreciate any ideas. Thanks.


Solution

  • I kept ImageDataGenerator and flow_from_directory inside the Training step and skipped the Processing step entirely. The pipeline now consists of just the Training, Evaluation and Register Model steps.
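
For anyone taking the same route, here is a minimal sketch of how a training script might locate the image directories before handing them to the generator functions from the question. It relies on the SM_CHANNEL_<NAME> and SM_MODEL_DIR environment variables that SageMaker training containers set. The channel names "train" and "validation", the argument names and the default batch size are assumptions made for illustration, not something taken from the original code.

```python
import argparse
import os


def parse_args(argv=None):
    """Resolve input and output directories inside a SageMaker training container."""
    parser = argparse.ArgumentParser()
    # Each input channel is downloaded under /opt/ml/input/data/<channel> and
    # exposed as SM_CHANNEL_<CHANNEL>. The channel names here are assumed.
    parser.add_argument(
        "--train-dir",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    parser.add_argument(
        "--validation-dir",
        default=os.environ.get("SM_CHANNEL_VALIDATION",
                               "/opt/ml/input/data/validation"))
    # Files written here are packaged as the model artifact after training.
    parser.add_argument(
        "--model-dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--batch-size", type=int, default=32)  # assumed default
    # Ignore any extra hyperparameters that are passed on the command line.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Inside the training script the result would feed the question's functions:
#   args = parse_args()
#   train_gen = get_flow_from_directory(load_data('TRAIN'),
#                                       args.train_dir, args.batch_size)
#   val_gen = get_flow_from_directory(load_data('VALIDATION'),
#                                     args.validation_dir, args.batch_size,
#                                     shuffle=False)
```

Since the augmentation is random and applied per batch, keeping it in the training script means every epoch sees different variants of the images. A Processing step would only make sense if the augmented images had to be written out to S3 once, ahead of training.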