
How do I detach the HuggingFace SageMaker training?


I am training a HuggingFace model remotely through the SageMaker integration. My training job takes more than two hours, and I would like to shut my computer off during training.

I use the following snippet to train the model:

huggingface_estimator.fit(
  {
    'train': 's3://sagemaker-us-east-1-558105141721/samples/datasets/imdb/train',
    'test': 's3://sagemaker-us-east-1-558105141721/samples/datasets/imdb/test'
  }
)

How can I set up the trainer so that the training process runs in the background and I can shut my computer down?


Solution

  • How can I set up the trainer so that the training process runs in the background and I can shut my computer down?

    You can!

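    Calling fit on a Hugging Face Estimator starts a SageMaker training job. Once started, the job runs on SageMaker-managed instances, independently of your local machine. By default, fit() blocks and streams the job's logs (wait=True); pass wait=False if you want it to return as soon as the job has been created.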

    You can shut down your machine, and the job will continue to run on SageMaker until it finishes.
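A minimal sketch of the detached workflow, assuming the SageMaker Python SDK (for the estimator) and boto3 (for the status check). The helper names start_detached, job_status and is_finished are hypothetical. The code relies only on fit(..., wait=False), the estimator's latest_training_job.name, and the DescribeTrainingJob API's TrainingJobStatus field. The estimator and client are passed in as arguments, so the helpers themselves import neither library.

```python
from typing import Any, Mapping

# DescribeTrainingJob statuses after which the job is no longer running.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def start_detached(estimator: Any, inputs: Mapping[str, str]) -> str:
    """Submit a training job without blocking and return the job name.

    `estimator` is assumed to be a SageMaker estimator, such as the
    huggingface_estimator from the question.
    """
    # wait=False makes fit() return once the job has been created,
    # instead of blocking and streaming logs until training ends.
    estimator.fit(inputs, wait=False)
    # Keep this name: it identifies the job after you disconnect.
    return estimator.latest_training_job.name


def job_status(sm_client: Any, job_name: str) -> str:
    """Return the current status of a training job.

    `sm_client` is assumed to be a boto3 SageMaker client,
    e.g. boto3.client("sagemaker").
    """
    response = sm_client.describe_training_job(TrainingJobName=job_name)
    return response["TrainingJobStatus"]


def is_finished(status: str) -> bool:
    """True once the job has completed, failed, or been stopped."""
    return status in TERMINAL_STATUSES
```

Usage: call `job_name = start_detached(huggingface_estimator, {...})` with the same input channels as in the question, note the printed or returned name, and shut the machine down. Later, from any machine with AWS credentials, `job_status(boto3.client("sagemaker"), job_name)` reports the job's status. The job is also visible under Training jobs in the SageMaker console. The SDK's `HuggingFace.attach(job_name)` can rebuild an estimator object from an existing job name if you need one afterwards (for example, to deploy the trained model).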