[SOLVED] How to save uncompressed outputs from a training job using the AWS SageMaker Python SDK?

I'm trying to upload training job artifacts to S3 uncompressed.

I am familiar with the output_path one can provide to a SageMaker Estimator: everything saved under /opt/ml/output is then uploaded, compressed, to that S3 output location.

I want to be able to access a specific artifact without having to decompress the output every time. Is there a clean way to go about it? If not, is there a workaround? The artifacts I'm interested in are small metadata files (.txt or .csv), while the rest of the artifacts can be ~1 GB in my case, so downloading and decompressing everything is excessive.

Any help would be appreciated.

Solution

You can pass disable_output_compression=True when constructing your Estimator (details are in the SageMaker Python SDK docs). All your outputs will then be saved to the S3 output location uncompressed.

Example:

import sagemaker
from sagemaker.estimator import Estimator

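estimator = Estimator(
    image_uri="your-own-image-uri",  # placeholder: replace with your training image
    role=sagemaker.get_execution_role(),
    sagemaker_session=sagemaker.Session(),
    instance_count=1,
    instance_type="ml.c4.xlarge",
    disable_output_compression=True,  # upload outputs to S3 as individual files, not a tarball
)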
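
Once the outputs are stored uncompressed, a single small file can be fetched on its own. Below is a minimal sketch of that, not part of the SDK: the helper names artifact_location and download_artifact, the bucket, and the paths are all hypothetical, and the exact S3 prefix under which your job writes its uncompressed files is an assumption you should confirm in the S3 console or the job description.

```python
from urllib.parse import urlparse


def artifact_location(output_prefix_uri, relative_path):
    """Return (bucket, key) for one file under an S3 prefix URI.

    Hypothetical helper: `output_prefix_uri` is whatever S3 prefix your
    training job actually wrote its uncompressed outputs to.
    """
    parsed = urlparse(output_prefix_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {output_prefix_uri!r}")
    prefix = parsed.path.strip("/")
    # Join prefix and file path, skipping an empty prefix (bucket root).
    key = "/".join(p for p in (prefix, relative_path.lstrip("/")) if p)
    return parsed.netloc, key


def download_artifact(output_prefix_uri, relative_path, local_path):
    """Download a single artifact without touching the other outputs."""
    import boto3  # imported lazily so artifact_location works without boto3

    bucket, key = artifact_location(output_prefix_uri, relative_path)
    boto3.client("s3").download_file(bucket, key, local_path)
```

For example, download_artifact("s3://my-bucket/my-prefix/", "metrics.csv", "metrics.csv") would pull only the small CSV (all three arguments here are placeholders), leaving the ~1 GB of other artifacts in S3.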