google-cloud-dataflow google-cloud-dataprep

Dataflow Workers unable to connect to Dataflow Service


I am using Google Dataprep to start Dataflow jobs and am facing some difficulties.

For background, we used Dataprep for several weeks without problems until we started having authorization issues with the service account. Once we finally solved those, we restarted the jobs we used to launch, but they failed with "The Dataflow appears to be stuck.".

We tried another, very simple job but got the same error. The job fails after being stuck for one hour. Here are the full error messages:

Dataflow -

(1ff58651b9d6bab2): Workflow failed. Causes: (1ff58651b9d6b915): The Dataflow appears to be stuck.

Dataprep -

The Dataflow job (ID: 2017-11-15_00_23_23-9997011066491247322) failed. Please 
contact Support and provide the Dataprep Job ID 20825 and the Dataflow Job ID.

It seems this kind of error can have various causes, and I have no clue where to start. Thanks in advance.


Solution

  • Please check whether there have been any changes to your project's default network. This is a common reason for workers being unable to contact the service, which causes the job to time out after 1 hour.

    Update:

    After looking into this further: the <project-number>-compute@developer.gserviceaccount.com service account for Compute Engine is missing from the 'Editor' role. It is usually added automatically and was probably removed later by mistake. See the 'Compute Engine Service Account' section in https://cloud.google.com/dataflow/security-and-permissions.

    We are working on fixes to improve early detection of such missing permissions, so that the failure message points to the root cause more clearly.

    This also implies that your other Dataflow jobs would fail in the same way.
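
If you want to verify the missing binding yourself, one possible approach is to export the project's IAM policy as JSON (for example with `gcloud projects get-iam-policy <project-id> --format=json`) and check whether the Compute Engine default service account appears under `roles/editor`. The snippet below is only a minimal sketch of that check: the helper names `compute_default_sa` and `has_role`, the project number `123456789012`, and the sample policy are all made up for illustration, and it assumes the usual policy layout of a `bindings` list with `role` and `members` fields.

```python
import json


def compute_default_sa(project_number: str) -> str:
    """Build the IAM member string for the Compute Engine default service account."""
    return f"serviceAccount:{project_number}-compute@developer.gserviceaccount.com"


def has_role(policy: dict, member: str, role: str = "roles/editor") -> bool:
    """Return True if `member` is listed under `role` in an IAM policy dict."""
    for binding in policy.get("bindings", []):
        if binding.get("role") == role and member in binding.get("members", []):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical sample policy; in practice, load the JSON exported from your project.
    sample_policy = json.loads("""
    {
      "bindings": [
        {
          "role": "roles/owner",
          "members": ["user:someone@example.com"]
        }
      ]
    }
    """)

    member = compute_default_sa("123456789012")  # hypothetical project number
    if has_role(sample_policy, member):
        print(f"{member} has roles/editor")
    else:
        print(f"{member} is missing from roles/editor")
```

If the account turns out to be missing, the binding can typically be restored with `gcloud projects add-iam-policy-binding <project-id> --member=serviceAccount:<project-number>-compute@developer.gserviceaccount.com --role=roles/editor`, or from the IAM page in the Cloud Console.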