[SOLVED] Error while tensorflow training on gcloud ml engine

I am following this ml-engine guide. I have set up gcloud and also created a VM. For TensorFlow, I am using Anaconda 3 to manage my Python environment, and I created a new environment with python=3.6. But when I run this command:

gcloud ml-engine local train --module-name trainer.task --package-path trainer -- --train-files c:\Anaconda3\mytensorflowcode\cloudml-samples-master\census\estimator\data\adult.data.csv --eval-files c:\Anaconda3\mytensorflowcode\cloudml-samples-master\census\estimator\data\adult.test.csv --train-steps 1000 --job-dir c:\Anaconda3\mytensorflowcode\cloudml-samples-master\census\estimator\output --eval-steps 100

I get the following error:

Traceback (most recent call last):
  File "D:\gcsdk174\google-cloud-sdk\platform\bundledpython\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "D:\gcsdk174\google-cloud-sdk\platform\bundledpython\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Anaconda3\mytensorflowcode\cloudml-samples-master\census\estimator\trainer\task.py", line 4, in <module>
    import model
  File "trainer\model.py", line 20, in <module>
    import tensorflow as tf
ImportError: No module named tensorflow

I was able to install TensorFlow successfully with the pip install -r ../requirements.txt command, as per the guide.

Can anybody point out what I am doing wrong?

Solution

Update: this issue should now be fixed with the most recent version of gcloud. Can you give it a try and see if it works for you? First do:
gcloud components update

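What's happening is that gcloud is (silently) requiring py2.7, which is causing your import error. You can see this in the traceback: the module is being run by the Python bundled with the Cloud SDK (the bundledpython path), not by your Anaconda python=3.6 environment, so the TensorFlow you installed there is not visible to it. This is a bug that we will fix soon. (It's particularly problematic on Windows, since TF doesn't support a 2.7 install for Windows.) We'll update here when it's fixed.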
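
If you want to confirm which interpreter is actually running your code, a small check along these lines may help. It is only a sketch: the helper names can_import and describe_interpreter are made up for illustration, and it assumes you either run it on its own in each environment or paste it temporarily at the top of trainer/task.py.

```python
import sys


def can_import(module_name):
    """Return True if the running interpreter can import module_name."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


def describe_interpreter():
    """Collect the path and major.minor version of the running interpreter."""
    return {
        "executable": sys.executable,
        "version": "%d.%d" % (sys.version_info[0], sys.version_info[1]),
    }


if __name__ == "__main__":
    info = describe_interpreter()
    # Single-argument print calls behave the same on Python 2.7 and 3.x.
    print("interpreter: %s" % info["executable"])
    print("python version: %s" % info["version"])
    print("tensorflow importable: %s" % can_import("tensorflow"))
```

If the reported interpreter is the Cloud SDK's bundled Python rather than the one from your Anaconda environment, that would match the import error above.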

In the meantime, the best option is probably to test locally by just running your python script directly (unless you are trying to test distributed training locally).
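
Running the script directly could look roughly like the sketch below, which reuses the flags from the command in the question. Several things here are assumptions: build_direct_command is a hypothetical helper, the relative paths are placeholders for your own, and it assumes trainer/task.py can be started as a plain script from the estimator directory with your Anaconda environment activated.

```python
import subprocess
import sys


def build_direct_command(script, train_files, eval_files, job_dir,
                         train_steps=1000, eval_steps=100):
    """Build an argv list that runs the trainer with the current interpreter."""
    return [
        sys.executable, script,
        "--train-files", train_files,
        "--eval-files", eval_files,
        "--job-dir", job_dir,
        "--train-steps", str(train_steps),
        "--eval-steps", str(eval_steps),
    ]


if __name__ == "__main__":
    cmd = build_direct_command(
        script="trainer/task.py",          # placeholder path
        train_files="data/adult.data.csv",  # placeholder path
        eval_files="data/adult.test.csv",   # placeholder path
        job_dir="output",                   # placeholder path
    )
    print(" ".join(cmd))
    # Uncomment to launch training (assumes the current directory is the
    # estimator directory and the Anaconda environment is active):
    # subprocess.check_call(cmd)
```

Using sys.executable means the trainer runs under whichever Python launched this helper, so starting it from the activated Anaconda environment should pick up the TensorFlow installed there.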

If you are trying to test distributed training locally, then your best temporary option is probably to use Docker and the TensorFlow docker container.