Tags: amazon-web-services, amazon-sagemaker, nvme

Can SageMaker Training have training data in NVMe volumes on compatible instances?


Can SageMaker Training have training data in NVMe volumes on compatible instances (e.g. G4dn and P3dn)? If so, is there an appropriate way to programmatically access that data?


Solution

  • Yes: on all Nitro-based instances (which include G4dn and P3dn), the EBS volumes attached to the training job are exposed as NVMe block devices (a quick way to verify this from inside the job is sketched at the end of this answer).

    In the SageMaker Python SDK, you can specify the size of the training volume (train_volume_size below); that EBS (NVMe-backed) volume backs the training channel path exposed to your job as SM_CHANNEL_TRAIN, and at run time you pass that path to your training code (here as --data_dir).

    Code example below:

    from sagemaker.tensorflow import TensorFlow

    def main(aws_region, s3_location, instance_count):
        estimator = TensorFlow(
            train_instance_type='ml.p3.16xlarge',
            train_volume_size=200,  # size (GB) of the EBS volume, exposed as NVMe on Nitro instances
            train_instance_count=int(instance_count),
            framework_version='2.2',
            py_version='py3',
            image_name="231748552833.dkr.ecr.%s.amazonaws.com/sage-py3-tf-hvd:latest" % aws_region,
            # ... entry_point, role and the remaining arguments are omitted in the original snippet
        )
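
    For SageMaker to expose the channel as SM_CHANNEL_TRAIN, the input data has to be passed under the channel name train when the job is launched. A minimal sketch of that launch call (the fit call itself is not shown in the original snippet):

        estimator.fit({'train': s3_location})  # channel name 'train' becomes SM_CHANNEL_TRAIN in the container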

    And then, in your entry script:

    import os
    import subprocess
    train_dir = os.environ.get('SM_CHANNEL_TRAIN')
    subprocess.call(['python', '-W', 'ignore',
                     'deep-learning-models/legacy/models/resnet/tensorflow2/train_tf2_resnet.py',
                     "--data_dir=%s" % train_dir])  # remaining arguments omitted in the original