I'd like to run a TensorFlow application using multiple GPUs on Cloud ML.
My TensorFlow application is written in the non-distributed, multi-GPU tower paradigm outlined here (a minimal sketch is included below).
From what I understand, if I want to run this same application on Cloud ML with multiple GPUs, I must use the CUSTOM scale tier and set up parameter servers and workers, which follows the distributed TensorFlow paradigm. Link here
Is this the only way to run multi-GPU training jobs on Cloud ML?
Is there a guide that helps me scope the changes required to convert my multi-GPU (tower-based) training application into a distributed TensorFlow application?
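For reference, here's a minimal sketch of the tower pattern my application follows (TF 1.x style). The linear model, shapes, and names are illustrative stand-ins, not my real network:

```python
import tensorflow as tf

NUM_GPUS = 4

def model_fn(x):
    # Stand-in linear classifier; variables are created with tf.get_variable
    # so that all towers share a single set of weights.
    w = tf.get_variable('w', [784, 10])
    b = tf.get_variable('b', [10], initializer=tf.zeros_initializer())
    return tf.matmul(x, w) + b

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.int64, [None])
# Split each input batch into one sub-batch per GPU
# (batch size must be divisible by NUM_GPUS).
x_splits = tf.split(x, NUM_GPUS)
y_splits = tf.split(y, NUM_GPUS)

optimizer = tf.train.GradientDescentOptimizer(0.01)
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(NUM_GPUS):
        with tf.device('/gpu:%d' % i), tf.name_scope('tower_%d' % i):
            logits = model_fn(x_splits[i])
            loss = tf.losses.sparse_softmax_cross_entropy(
                labels=y_splits[i], logits=logits)
            tower_grads.append(optimizer.compute_gradients(loss))
            # The first tower creates the variables; later towers reuse them.
            tf.get_variable_scope().reuse_variables()

# Average each variable's gradient across towers, then apply the update
# once to the shared weights.
avg_grads = [(tf.reduce_mean(tf.stack([g for g, _ in gv]), axis=0), gv[0][1])
             for gv in zip(*tower_grads)]
train_op = optimizer.apply_gradients(avg_grads)
```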
You can use the CUSTOM tier with only a single master node and no workers or parameter servers; those are optional parameters.
Then, for the master machine type, complex_model_m_gpu has 4 GPUs and complex_model_l_gpu has 8.
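For example, a minimal config.yaml along these lines should request a single 4-GPU master with no workers or parameter servers:

```yaml
trainingInput:
  scaleTier: CUSTOM
  masterType: complex_model_m_gpu  # 4 GPUs; use complex_model_l_gpu for 8
```

Pass it when submitting the job, e.g. gcloud ml-engine jobs submit training my_job --config config.yaml plus your usual packaging flags (my_job and the file name are just placeholders).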