amazon-web-services xgboost amazon-sagemaker distributed-training

How to use multiple instances with the SageMaker XGBoost built-in algorithm?


If we use multiple instances for training, will the built-in algorithm automatically take advantage of them? For example, what happens if we train the same customer churn example on 2 instances with the built-in XGBoost container? Will one instance be ignored?


Solution

  • Yes, SageMaker XGBoost supports distributed training. If you set the instance count to more than 1, SageMaker XGBoost distributes the files from S3 across the individual instances and performs distributed training. This requires, however, that the number of files on S3 be at least the number of instances. Otherwise, you will be charged for multiple training instances without getting the benefit of distributed training.

    You can find an example here:

    https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_abalone_dist_script_mode.ipynb
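As a rough sketch of the setup described above (not taken from the linked notebook), the following checks the "files >= instances" condition before launching a job. It assumes the SageMaker Python SDK v2. The helper names `distributed_training_settings` and `launch_training`, the instance type, the container version `1.7-1`, and the hyperparameters are all illustrative choices. It also assumes that the per-instance split of files corresponds to `distribution="ShardedByS3Key"` on the training input; the SDK default, `FullyReplicated`, copies every file to every instance.

```python
def distributed_training_settings(s3_keys, instance_count):
    """Build settings for a multi-instance job, or fail if there are too few files.

    s3_keys: object keys found under the training prefix (hypothetical input).
    """
    # Assumption: keys ending in "/" are folder placeholders, not data files.
    data_files = [key for key in s3_keys if not key.endswith("/")]
    if len(data_files) < instance_count:
        raise ValueError(
            f"{len(data_files)} file(s) for {instance_count} instance(s): "
            "split the training data into at least one file per instance"
        )
    return {
        "estimator": {"instance_count": instance_count},
        # Assumption: sharding by S3 key gives each instance its own subset of files.
        "train_input": {"distribution": "ShardedByS3Key"},
    }


def launch_training(role, train_prefix, output_path, settings):
    """Start the training job; needs AWS credentials and the sagemaker package."""
    # Imported lazily so the file-count check above runs without the SDK installed.
    import sagemaker
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    session = sagemaker.Session()
    # "1.7-1" is an illustrative built-in XGBoost container version.
    image_uri = image_uris.retrieve(
        "xgboost", session.boto_region_name, version="1.7-1"
    )
    estimator = Estimator(
        image_uri=image_uri,
        role=role,
        instance_type="ml.m5.xlarge",  # illustrative instance type
        output_path=output_path,
        sagemaker_session=session,
        **settings["estimator"],
    )
    # Illustrative hyperparameters for a binary target such as churn.
    estimator.set_hyperparameters(objective="binary:logistic", num_round=100)
    train_input = TrainingInput(
        train_prefix, content_type="text/csv", **settings["train_input"]
    )
    estimator.fit({"train": train_input})
    return estimator
```

The list of keys could come from, for example, boto3's `list_objects_v2` on the training prefix. If the check fails for the churn example, splitting the single training CSV into at least two files before uploading would satisfy it.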