Tags: python, deployment, amazon-sagemaker, huggingface, mlops

Unable to deploy Hugging Face model to SageMaker endpoint - C:\\.sagemaker-code-config not found


I'm trying to create a SageMaker endpoint using the sagemaker and Hugging Face libraries.

import sagemaker
sess = sagemaker.Session()
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

role = "my-IAM-role"

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

repository = "FremyCompany/BioLORD-2023-M"
model_id=repository.split("/")[-1]
s3_location=f"s3://{sess.default_bucket()}/custom_inference/{model_id}/model.tar.gz"

from sagemaker.huggingface.model import HuggingFaceModel


# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=s3_location,       # path to your model and script
   role = role,
   transformers_version="4.37.0",  # transformers version used
   pytorch_version="2.1.0",        # pytorch version used
   py_version="py310",            # python version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m4.xlarge",
   endpoint_name="bioLORD-test"
)

But when I execute the code, it runs forever. When I interrupt the execution and check the output above the KeyboardInterrupt error, I see the following:

FileNotFoundError                         Traceback (most recent call last)
File ~\AppData\Local\Programs\Python\Python312\Lib\pathlib.py:860, in Path.exists(self, follow_symlinks)
    859 try:
--> 860     self.stat(follow_symlinks=follow_symlinks)
    861 except OSError as e:

File ~\AppData\Local\Programs\Python\Python312\Lib\pathlib.py:840, in Path.stat(self, follow_symlinks)
    836 """
    837 Return the result of the stat() system call on this path, like
    838 os.stat() does.
    839 """
--> 840 return os.stat(self, follow_symlinks=follow_symlinks)

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\.sagemaker-code-config'

During handling of the above exception, another exception occurred:

KeyboardInterrupt                         Traceback (most recent call last)

I switched from macOS to Windows a short time ago, and this code used to run properly on the Mac. I've searched for 'sagemaker-code-config' but wasn't able to find anything useful. I also don't understand why the code runs forever instead of raising the FileNotFoundError. Thanks for your help!

P.S. I've tried to execute the same code in Ubuntu in WSL but got the same result.


Solution

  • The SageMaker Python SDK uses a function to locate a local .sagemaker-code-config file, which SageMaker Studio uses to add relevant project tags.

    The code walks up the directory tree from the working directory, checking each parent in turn, until it finds this file or reaches the root. You can view the implementation in the SDK source. This is the relevant section:

    STUDIO_PROJECT_CONFIG = ".sagemaker-code-config"
    
    [...]
    
    try:
        wd = Path(working_dir) if working_dir else Path.cwd()
    
        path = None
        while path is None and not wd.match("/"):
            candidate = wd / STUDIO_PROJECT_CONFIG
            if Path.exists(candidate):
                path = candidate
            wd = wd.parent
    
        return path
    except Exception as e:
        [...]
    

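    Normally, the code should simply continue if it cannot find the file. Path.exists catches the FileNotFoundError internally and returns False, which is why that error shows up in your traceback without ever being raised to you. On your system, however, the search never seems to finish. Note that the loop only stops at the top when wd.match("/") is true, and the parent of a root directory is the root itself; your traceback shows the search had already reached C:\, so if that check does not succeed for your root, the loop will keep going. Interacting with your filesystem on Windows might also cause a deadlock, possibly due to an anti-virus program or a similar issue.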
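
To illustrate the stop condition, here is a minimal sketch of the same upward search with an explicit check for the root. It is not the SDK's code: `find_project_config` is a hypothetical name, and the sketch relies only on the fact that pathlib reports the parent of a root ("/" or "C:\") as the root itself.

```python
from pathlib import Path
from typing import Optional

# File name taken from the SDK snippet quoted in this answer.
STUDIO_PROJECT_CONFIG = ".sagemaker-code-config"


def find_project_config(working_dir: Optional[str] = None) -> Optional[Path]:
    """Hypothetical helper: look for the config file in working_dir and its ancestors."""
    wd = Path(working_dir).resolve() if working_dir else Path.cwd()
    while True:
        candidate = wd / STUDIO_PROJECT_CONFIG
        if candidate.is_file():
            return candidate
        if wd.parent == wd:
            # We are at a filesystem root: its parent is itself, so stop searching.
            return None
        wd = wd.parent
```

Because the loop compares `wd.parent` with `wd` instead of matching a "/" pattern, it ends at the root on any platform, whether or not the file exists.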

    The code path that uses _append_project_tags is invoked on several occasions, for example when you deploy a model.

    I can't test it on Windows, but I'd recommend creating an empty .sagemaker-code-config file in the working directory your code runs from. The search should then find the file on its first check and never walk up the directory tree.
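
The workaround could be scripted roughly as follows. This is a sketch that follows the suggestion above and is equally untested against the SDK: `ensure_project_config` is a hypothetical helper name, and it is an assumption that the SDK tolerates an empty file.

```python
from pathlib import Path

# File name taken from the SDK snippet quoted in this answer.
STUDIO_PROJECT_CONFIG = ".sagemaker-code-config"


def ensure_project_config(directory: Path) -> Path:
    """Hypothetical helper: create an empty config file in directory if it is missing."""
    marker = directory / STUDIO_PROJECT_CONFIG
    # touch() creates an empty file; an existing file keeps its content.
    marker.touch(exist_ok=True)
    return marker
```

Calling `ensure_project_config(Path.cwd())` before `huggingface_model.deploy(...)` would place the file in the current working directory, which is where the SDK snippet starts its search when no working_dir is given.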