
How to use FSx for Lustre as the input to Amazon SageMaker?


I am trying to set up Amazon SageMaker to read our dataset from our Amazon FSx for Lustre file system.

We are using the SageMaker Python SDK, and previously we were reading our dataset from S3, which worked fine:

from sagemaker.tensorflow import TensorFlow

# role and hyperparameters are defined elsewhere (not shown)
estimator = TensorFlow(
    entry_point='model_script.py',
    image_uri='some-repo:some-tag',
    instance_type='ml.m4.10xlarge',
    instance_count=1,
    role=role,
    framework_version='2.0.0',
    py_version='py3',
    subnets=["subnet-1"],
    security_group_ids=["sg-1", "sg-2"],
    debugger_hook_config=False,
)
estimator.fit({
    'training': f"s3://bucket_name/data/{hyperparameters['dataset']}/"
})

But now that I have switched the input data source to our FSx for Lustre file system, I get an error saying the input must be an s3:// or file:// URI. I was following these docs (FSx for Lustre):

from sagemaker.inputs import FileSystemInput
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point='model_script.py',
    # image_uri='some-docker:some-tag',
    instance_type='ml.m4.10xlarge',
    instance_count=1,
    role=role,
    framework_version='2.0.0',
    py_version='py3',
    subnets=["subnet-1"],
    security_group_ids=["sg-1", "sg-2"],
    debugger_hook_config=False,
)
fsx_data_folder = FileSystemInput(file_system_id='fs-1',
                                  file_system_type='FSxLustre',
                                  directory_path='/fsx/data',
                                  file_system_access_mode='ro')
estimator.fit(f"{fsx_data_folder}/{hyperparameters['dataset']}/")

This throws the following error:

ValueError: URI input <sagemaker.inputs.FileSystemInput object at 0x0000016A6C7F0788>/dataset_name/ must be a valid S3 or FILE URI: must start with "s3://" or "file://"

Does anyone understand what I am doing wrong? Thanks in advance!


Solution

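  • I was (quite stupidly, it was late ;)) treating the FileSystemInput object as a string instead of an object. Interpolating it into an f-string inserts its default representation, <sagemaker.inputs.FileSystemInput object at 0x...>, followed by the dataset path, so fit() received a plain string. It then tried to parse that string as a data URI and rejected it because it does not start with s3:// or file://.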

    The correct approach is to build the FileSystemInput object from the full path to the dataset and pass that object directly to fit(). Passed on its own, it becomes the default "training" channel and is mounted inside the container at /opt/ml/input/data/training.

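    from sagemaker.inputs import FileSystemInput

    # Point directory_path at the dataset itself, not at its parent folder
    fsx_data_obj = FileSystemInput(
        file_system_id='fs-1',
        file_system_type='FSxLustre',
        directory_path=f"/fsx/data/{hyperparameters['dataset']}",
        file_system_access_mode='ro',
    )
    # Pass the object itself, not a string built from it
    estimator.fit(fsx_data_obj)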
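Inside the training container the mounted FSx directory is then just a local folder. A minimal sketch of how model_script.py might locate it, assuming the standard SageMaker script-mode environment variable SM_CHANNEL_TRAINING (set for the "training" channel) and falling back to /opt/ml/input/data/training. The --data-dir flag and DEFAULT_TRAINING_DIR constant are illustrative names, not taken from the original script:

```python
import argparse
import os

# Default mount point of the "training" channel inside the container
DEFAULT_TRAINING_DIR = "/opt/ml/input/data/training"


def parse_args(argv=None):
    """Resolve the local directory of the 'training' channel.

    Assumption: SageMaker sets SM_CHANNEL_TRAINING to the channel's local
    path; if it is unset, fall back to the default mount point.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data-dir",
        default=os.environ.get("SM_CHANNEL_TRAINING", DEFAULT_TRAINING_DIR),
    )
    # parse_known_args ignores any hyperparameter flags this sketch does not define
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"Reading training data from {args.data_dir}")
    # The contents of the FileSystemInput directory_path appear here (read-only)
    print(sorted(os.listdir(args.data_dir))[:10])
```

Using parse_known_args rather than parse_args keeps the script from failing when hyperparameters are passed as extra command-line flags.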