google-cloud-platform google-dl-platform

Unable to create GCP Deep Learning VM instance with GPU


I'm trying to get a GCP "Deep Learning VM" instance running with a GPU, following these instructions. I'm being hit with this error: You've gone over GPUs (all regions) quota by 1 GPU. Please increase your quota in the quotas page. Learn more. However, when I look at the quotas, I do have a limit of 1 GPU for "NVIDIA V100". I have a limit of 0 for the Committed NVIDIA ***.


When you create a "Deep Learning VM" instance and select GPUs, are you selecting committed GPUs?


Solution

  • When you request GPU quota, you must request a quota for each GPU model that you want to use in each region, plus an additional global quota for the total number of GPUs of all types in all zones. You can request a GPU quota increase from here.

    Since you already have an NVIDIA_V100_GPUS quota limit of 1 in a region (for example, us-west1), all you need to do now is request an increase of the GPUs (all regions) quota through your Quotas page. The value to request depends on the number of GPUs that you want to deploy. This should get rid of the error that you are getting.

    If you want to use committed GPUs, you need to create a reservation for your GPU types when purchasing the commitment. The Deep Learning VM you create must then match your committed GPU type in order to use it. For example, if you want to reserve 4 V100 GPUs, you must also commit to 4 V100 GPUs; when you then create a Deep Learning VM using one of those V100 GPUs, you can see in the reservation section that 1 V100 GPU is being used. If you choose another GPU type, it will not be taken from the committed GPUs. Committed GPUs are only used to get discounts on GPU resources.
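To illustrate why a regional limit of 1 is not enough on its own, here is a minimal Python sketch of the two-level check described above. It does not call any Google Cloud API. The `Quota` class and `exceeded_quotas` function are hypothetical names made up for this illustration, and the global limit of 0 is an assumption inferred from the "over by 1 GPU" error message rather than a value shown in the question.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """A single quota entry (hypothetical model, not a Google Cloud type)."""
    metric: str
    limit: int
    usage: int = 0


def exceeded_quotas(requested_gpus, quotas):
    """Return (metric, overage) for every quota the request would exceed."""
    result = []
    for quota in quotas:
        overage = quota.usage + requested_gpus - quota.limit
        if overage > 0:
            result.append((quota.metric, overage))
    return result


quotas = [
    # Regional per-model quota: the question reports a limit of 1.
    Quota("NVIDIA_V100_GPUS", limit=1),
    # Global quota across all regions: assumed to be 0 here.
    Quota("GPUS_ALL_REGIONS", limit=0),
]

# Requesting one V100 passes the regional check but fails the global one.
print(exceeded_quotas(1, quotas))  # [('GPUS_ALL_REGIONS', 1)]
```

Under these assumptions, the request is blocked only by the global quota, which matches the error in the question. Raising that limit to at least the number of GPUs you plan to deploy makes the list come back empty.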