I am trying to run training using SageMaker training jobs and the SageMaker Python SDK. The training script relies on some custom libraries, and from my understanding that means I need to build a custom Docker image and register it in ECR (Elastic Container Registry). The environment below is a SageMaker Studio Code Editor.
The error I get is Failed to parse hyperparameter. See below for my setup and what I've tried as a solution.
Directory
working directory
|----- Dockerfile
|----- train.py
|----- requirements.txt
Dockerfile
# Use python image as base
FROM python:3.10
# Install system dependencies
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
       libpq-dev \
       gcc \
    && rm -rf /var/lib/apt/lists/*
# Copy the program code into the container
COPY code /opt/program
# Set working directory in container
WORKDIR /code
# Install Python dependencies
COPY requirements.txt /code/
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install sagemaker-training
# Copies the training code inside the container
COPY train.py /opt/ml/code/train.py
# Defines train.py as script entrypoint
ENV SAGEMAKER_PROGRAM train.py
# Set environment variables
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
requirements.txt
simpletransformers==0.70.0
pandas==2.1.1
numpy==1.26.0
torch==2.2.1
sklearn-deap==0.3.0
sklearn-genetic-opt==0.10.1
boto3==1.33.3
sagemaker
train.py
import argparse
import os
import logging
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, f1_score
from simpletransformers.classification import ClassificationModel
import torch
from sagemaker_pytorch_estimator.pytorch_estimator import PyTorchModel
from sagemaker_containers.data_instances.data_buffer import BufferDataset, BufferedShuffledDataset
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--test_size", type=float, default=0.2)
    parser.add_argument("--target_column", type=str, default="annotation")
    parser.add_argument("--vertical", type=str, default="some_category")
    parser.add_argument("--model_dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--val", type=str, default=os.environ.get("SM_CHANNEL_VAL"))
    parser.add_argument("--test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    args, _ = parser.parse_known_args()

    model_data = None
    role = None
    entry_point = None

    ....(script continues)
Launching script:
import sagemaker
from sagemaker.session import TrainingInput
from sagemaker.estimator import Estimator
vertical = 'some_category'
s3_bucket = 'some_bucket'
prefix = 'classification'
instance_type = 'ml.m4.xlarge'
print("Instance Type: {}".format(instance_type))
region = sagemaker.Session().boto_region_name
print("AWS Region: {}".format(region))
role = sagemaker.get_execution_role()
print("RoleArn: {}".format(role))
s3_output_location='s3://{}/{}/{}'.format(s3_bucket, prefix, 'classifier')
container = '############.###.###.##-####-#.amazonaws.com/some-name/ml-training:latest'
print("Image Container: {}".format(container))
estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type=instance_type,
    volume_size=10,
    output_path=s3_output_location,
    sagemaker_session=sagemaker.Session()
)
estimator.set_hyperparameters(vertical=vertical,
                              s3_bucket=s3_bucket,
                              target_column='annotation',
                              test_size=0.2)
estimator.fit()
Error
Failed to parse hyperparameter
What I've tried as a solution:
- Searched for the error in relation to the sagemaker-training library. The only suggestion there was to wrap the hyperparameters in a function, as some user suggested, but that doesn't seem to be a current working solution (got this error: TypeError: Estimator.set_hyperparameters() takes 1 positional argument but 2 were given).
- Read that argparse is not compatible with SageMaker (even though all the official AWS SageMaker documentation uses argparse). Their suggested solution is unclear to me.

There are several topics to address here. First of all, you don't need to create a custom container just to include additional dependencies.
You can add dependencies to an Estimator by providing source_dir and including a requirements.txt file in the referenced source directory.
From the Estimator API documentation:
source_dir: The absolute, relative, or S3 URI Path to a directory with any other training source code dependencies aside from the entry point file. If source_dir is an S3 URI, it must point to a tar.gz file. Structure within this directory is preserved when training on Amazon SageMaker.
The most straightforward way to include a source_dir is to have it locally next to your notebook:
|----- example-notebook.ipynb
|----- src
        |----- train.py
        |----- requirements.txt
You can then configure your estimator to use the source directory with the following configuration:
estimator = Estimator(
    [...]
    entry_point="train.py",
    source_dir="src",
    [...]
)
If source_dir is specified, then entry_point must point to a file located at the root of source_dir. The training job will automatically install the dependencies listed in the provided requirements.txt.
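When you then call fit, you can pass channels whose names match the SM_CHANNEL_* variables your train.py reads. A minimal sketch, assuming placeholder S3 prefixes:
from sagemaker.inputs import TrainingInput

# Channel names ("train", "val", "test") populate SM_CHANNEL_TRAIN,
# SM_CHANNEL_VAL and SM_CHANNEL_TEST inside the training container;
# the S3 paths below are placeholders
estimator.fit({
    "train": TrainingInput("s3://some_bucket/classification/train"),
    "val": TrainingInput("s3://some_bucket/classification/val"),
    "test": TrainingInput("s3://some_bucket/classification/test"),
})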
Since you're using Scikit-Learn, you could also use the SKLearn Estimator, which already bundles several dependencies and provides a simplified interface compared to the general Estimator.
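A rough sketch of that approach (framework_version is an assumption here; pick one of the versions supported by the SageMaker Scikit-Learn container):
from sagemaker.sklearn.estimator import SKLearn

sklearn_estimator = SKLearn(
    entry_point="train.py",
    source_dir="src",
    framework_version="1.2-1",  # assumed version; check what your region supports
    role=role,
    instance_type="ml.m4.xlarge",
    instance_count=1,
    output_path=s3_output_location,
    hyperparameters={"target_column": "annotation", "test_size": 0.2},
)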
If you'd like to use your code as is, then you could adapt it as follows:
import json

# JSON-encode hyperparameters
def json_encode_hyperparameters(hyperparameters):
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}

hyperparameters = json_encode_hyperparameters({
    "vertical": vertical,
    "s3_bucket": s3_bucket,
    "target_column": "annotation",
    "test_size": 0.2
})
estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type=instance_type,
    volume_size=10,
    output_path=s3_output_location,
    sagemaker_session=sagemaker.Session(),
    hyperparameters=hyperparameters
)
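On the container side, SageMaker materializes these values as a string-to-string map in /opt/ml/input/config/hyperparameters.json. A short sketch of what decoding them back looks like, assuming the standard container contract:
import json

# Standard path where SageMaker writes hyperparameters inside the container
HYPERPARAMETERS_PATH = "/opt/ml/input/config/hyperparameters.json"

with open(HYPERPARAMETERS_PATH) as f:
    raw = json.load(f)  # every value arrives as a string

# Because the launcher JSON-encoded each value, json.loads recovers the
# original types (floats stay floats, strings stay strings)
hyperparameters = {k: json.loads(v) for k, v in raw.items()}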
set_hyperparameters expects the input as kwargs, while the hyperparameters property accepts the input in different formats. Therefore, you can't use the JSON-encoded dict with set_hyperparameters; instead, use it with the hyperparameters property.
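To make the distinction concrete (the TypeError quoted in the question comes from passing the dict positionally):
# set_hyperparameters only accepts keyword arguments:
estimator.set_hyperparameters(vertical=vertical, test_size=0.2)

# Passing the dict positionally raises:
# TypeError: Estimator.set_hyperparameters() takes 1 positional argument but 2 were given
# estimator.set_hyperparameters(hyperparameters)  # wrong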