This is my code:
import sagemaker
from sagemaker.sklearn import SKLearnModel

role = sagemaker.get_execution_role()

model = SKLearnModel(
    model_data=f"s3://{default_bucket}/{prefix}/model.tar.gz",
    role=role,
    entry_point="inference.py",
    framework_version="1.2-1",
    py_version="py3",
)

predictor = model.deploy(
    instance_type="ml.c5.large",
    initial_instance_count=1,
    container_startup_health_check_timeout=180,
)
s3://{default_bucket}/{prefix}/model.tar.gz contains the following (I also tried putting requirements.txt in code/, as advised in the documentation on tarball structure for PyTorch models):
-rw-r--r-- sagemaker-user/users 4349839 2024-11-29 19:22:21 model.pkl
-rw-r--r-- sagemaker-user/users      24 2024-12-02 14:43:26 inference.py
-rw-r--r-- sagemaker-user/users   44212 2024-11-29 19:23:17 explainer
-rw-r--r-- sagemaker-user/users      24 2024-12-02 14:43:26 requirements.txt
requirements.txt contains:

dill
pandas
joblib

(I know pandas is installed by default anyway, from checking the AWS container code.)
When I try to deploy, the endpoint fails because the first line of inference.py is import dill: the logs say the module is not found, and in CloudWatch I can see that only inference 1.0.0 is installed (which I assume is my script, packaged as a module). I know I could probably spawn a subprocess inside inference.py and call pip there, but I want to do this properly.
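For reference, the workaround I mean would be something like the following at the top of inference.py (a sketch, not something I've tested; the package list just mirrors my requirements.txt):

import subprocess
import sys

# Hack: install the missing dependencies at container startup,
# before anything imports them. Not the proper solution.
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "dill", "pandas", "joblib"]
)

import dill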
You are including requirements.txt and inference.py inside the model.tar.gz file. Instead, try putting those files in a separate local directory (for example "code", though any name works) and passing that path as the source_dir argument of SKLearnModel. The dependencies listed in requirements.txt should then be installed automatically when the container starts.
For an example of how to use source_dir and requirements.txt, please refer to this sample.
model = SKLearnModel(
    role=role,
    model_data=model_data,
    framework_version="1.2-1",
    py_version="py3",
    source_dir="code",
    entry_point="inference.py",
)
In that sample, requirements.txt sits under the "code/" directory.
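Concretely, a layout like the following should work (the directory name "code" is arbitrary; to the best of my understanding the SDK repacks the model archive at deploy time so the contents of source_dir end up inside it, which means your model.tar.gz only needs to contain the model artifacts themselves):

code/                    <- local directory passed as source_dir
    inference.py         <- entry_point
    requirements.txt     <- dill, pandas, joblib; installed at container startup

model.tar.gz             <- uploaded to S3 and passed as model_data
    model.pkl
    explainer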