Right now, I'm working on my master's thesis, and I need to train a huge Transformer model on GCP. Since the fastest way to train deep learning models is to use a GPU, I was wondering: which GPU should I use among the ones provided by GCP? The ones available at the current moment are:
It all depends on the characteristics you're looking for.
First, let's collect some information about these GPU models and see which one suits you best. You can use this link to track GPU performance, this link to check the pricing of the older GPU cores, and this link for the accelerator-optimized ones.
I did that and created the following table:
Model | FP32 (TFLOPS) | Price/hour (USD) | TFLOPS/dollar |
---|---|---|---|
Nvidia H100 † | 67 | 11.06125 | 6.06 |
Nvidia L4 ‡ | 30.3 | 1.000416 | 30.29 |
Nvidia A100 ‡ | 19.5 | 3.673477 | 5.31 |
Nvidia Tesla T4 | 8.1 | 0.35 | 23.14 |
Nvidia Tesla P4 | 5.5 | 0.6 | 9.17 |
Nvidia Tesla V100 | 14 | 2.48 | 5.65 |
Nvidia Tesla P100 | 9.3 | 1.46 | 6.37 |
† The minimum number of GPUs to be used is 8.
‡ Price includes 1 GPU + 12 vCPUs + default memory.
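The TFLOPS/dollar column is just the FP32 throughput divided by the hourly price. A minimal Python sketch that reproduces that column and the ranking from the numbers above (model names shortened for brevity):

```python
# FP32 throughput (TFLOPS) and hourly price (USD) per GCP GPU model,
# taken from the table above.
gpus = {
    "H100": (67.0, 11.06125),
    "L4":   (30.3, 1.000416),
    "A100": (19.5, 3.673477),
    "T4":   (8.1,  0.35),
    "P4":   (5.5,  0.6),
    "V100": (14.0, 2.48),
    "P100": (9.3,  1.46),
}

# TFLOPS per dollar: how much compute one dollar buys for one hour.
value = {name: tflops / price for name, (tflops, price) in gpus.items()}

# Rank models from best to worst value for money.
for name, v in sorted(value.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:5s} {v:8.2f} TFLOPS/$")
```

Sorting by this ratio immediately surfaces the value-for-money ranking discussed below, without eyeballing the table.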
In the previous table, you can see the:

FP32
: Stands for 32-bit floating point, a measure of how fast the GPU performs single-precision floating-point operations. It's measured in TFLOPS, or *tera floating-point operations per second*. The higher, the better.

Price
: The hourly price on GCP.

TFLOPS/dollar
: Simply how many operations you get for one dollar.

From this table, you can see that:

- Nvidia H100 is the fastest.
- Nvidia Tesla P4 is the slowest.
- Nvidia A100 is the most expensive.
- Nvidia Tesla T4 is the cheapest.
- Nvidia L4 has the highest operations per dollar.
- Nvidia A100 has the lowest operations per dollar.
- Nvidia K80 went out of support as of May 1, 2024.

And you can observe that clearly in the following figure:
![GPU comparison figure]()
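If you want to regenerate a figure like that yourself, here's a rough matplotlib sketch using the TFLOPS-per-dollar values from the table (the chart style and output filename are my own choices, not necessarily what the original figure looked like):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt

# TFLOPS per dollar, rounded from the table above.
models = ["H100", "L4", "A100", "T4", "P4", "V100", "P100"]
tflops_per_dollar = [6.06, 30.29, 5.31, 23.14, 9.17, 5.65, 6.37]

plt.figure(figsize=(8, 4))
plt.bar(models, tflops_per_dollar)
plt.ylabel("FP32 TFLOPS per dollar")
plt.title("GCP GPU value for money")
plt.tight_layout()
plt.savefig("gpu_value.png")  # writes the chart to the working directory
```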
I hope that was helpful!