Tags: tensorflow, google-cloud-ml, google-cloud-ml-engine

Does Google Cloud ML only support distributed Tensorflow for Multiple GPU training jobs?


I'd like to run a TensorFlow application using multiple GPUs on Cloud ML.

My TensorFlow application is written in the non-distributed paradigm that is outlined here.

From what I understand, if I want to use Cloud ML to run this same application with multiple GPUs, then the application must use scale tier CUSTOM, and I need to set up parameter servers and worker servers, which appears to follow the distributed-TensorFlow paradigm. Link here

Is this the only way to run multiple GPU training jobs on Cloud ML?

Is there a guide that helps me scope the changes required to convert my multi-GPU (tower-based) training application into a distributed TensorFlow application?


Solution

  • You can use the CUSTOM tier with only a single master node and no workers or parameter servers; those are optional parameters.

    The complex_model_m_gpu machine type has 4 GPUs, and complex_model_l_gpu has 8.
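As a minimal sketch of the setup described above (the exact surrounding job arguments are assumptions), a config file passed via `--config` when submitting the job could request a CUSTOM tier with just a GPU master and no workers or parameter servers:

```yaml
# config.yaml -- CUSTOM scale tier with a single GPU master only.
# workerCount and parameterServerCount are omitted, so no workers
# or parameter servers are allocated.
trainingInput:
  scaleTier: CUSTOM
  masterType: complex_model_m_gpu   # 4 GPUs (use complex_model_l_gpu for 8)
```

Submitted with something like `gcloud ml-engine jobs submit training my_job --config config.yaml ...` (job name and other flags are placeholders), the non-distributed, tower-based code then runs on the master's GPUs without any cluster-spec changes.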