google-cloud-dataflow

DataflowRunner pipeline error - Unable to rename


My Dataflow job reads a CSV file from a GCS bucket, queries another service for extra data, writes the result to a new CSV file, and stores that file back in the bucket. However, it seems to fail before it even reads the input CSV file at the start.

This is the error I get:

DataflowRuntimeException - Dataflow pipeline failed. State: FAILED, Error: Unable to rename "gs://../../job.1582402027.233469/dax-tmp-2020-02-22_12_07_49-5033316469851820576-S04-0-1719661b275ca435/tmp-1719661b275ca2ea-shard--try-273280d77b2c5b79-endshard.avro" to "gs://../../temp/job.1582402027.233469/tmp-1719661b275ca2ea-00000-of-00001.avro".

Any ideas what is causing this error?



Solution

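  • This error is usually caused by the service account that the Dataflow job runs as lacking the required GCS (Google Cloud Storage) permissions. The failing step renames a temporary file inside the job's temp location, and GCS has no native rename: it is a copy followed by a delete. The service account therefore needs to be able to create and delete objects in that bucket, not only read them.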

    Grant the service account a role such as "roles/storage.objectAdmin" (on the bucket, or on the project) so that it can create, read, and delete objects in GCS.
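A minimal Python sketch of checking and applying that fix, assuming the google-cloud-storage client library is installed and that you substitute your own bucket and service account (every name here is a hypothetical placeholder). check_bucket reports which of a few rename-related permissions the current credentials lack, and grant_object_admin adds a roles/storage.objectAdmin binding on the bucket using the library's get_iam_policy/set_iam_policy pattern. The permission list is an illustrative assumption based on rename being copy plus delete, not an official minimum set.

```python
"""Sketch: check and fix GCS permissions for a Dataflow temp bucket.

Assumptions (hypothetical): bucket names and service account emails are
placeholders you supply; google-cloud-storage is only imported when a function
that actually talks to GCS is called.
"""
from typing import Iterable, List

OBJECT_ADMIN_ROLE = "roles/storage.objectAdmin"

# Illustrative, not an official minimum set: GCS has no native rename, so a
# rename is a copy (read source, create destination) followed by a delete.
RENAME_PERMISSIONS = (
    "storage.objects.get",
    "storage.objects.create",
    "storage.objects.delete",
)


def bucket_from_gcs_path(gcs_path: str) -> str:
    """Return the bucket name from a path like gs://bucket/some/prefix."""
    prefix = "gs://"
    if not gcs_path.startswith(prefix):
        raise ValueError(f"not a GCS path: {gcs_path!r}")
    bucket = gcs_path[len(prefix):].split("/", 1)[0]
    if not bucket:
        raise ValueError(f"missing bucket name in {gcs_path!r}")
    return bucket


def missing_permissions(granted: Iterable[str],
                        required: Iterable[str] = RENAME_PERMISSIONS) -> List[str]:
    """Return the required permissions that are absent from `granted`."""
    granted_set = set(granted)
    return [p for p in required if p not in granted_set]


def check_bucket(bucket_name: str) -> List[str]:
    """List rename-related permissions the *current credentials* lack.

    test_iam_permissions reports what the caller holds, so run this with the
    Dataflow service account's credentials for the result to be meaningful.
    """
    from google.cloud import storage  # lazy import: only needed for real calls

    bucket = storage.Client().bucket(bucket_name)
    granted = bucket.test_iam_permissions(list(RENAME_PERMISSIONS))
    return missing_permissions(granted)


def grant_object_admin(bucket_name: str, service_account_email: str) -> None:
    """Bind roles/storage.objectAdmin for the service account on the bucket.

    Must be run with credentials that are allowed to change the bucket's IAM policy.
    """
    from google.cloud import storage  # lazy import: only needed for real calls

    bucket = storage.Client().bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append({
        "role": OBJECT_ADMIN_ROLE,
        "members": {f"serviceAccount:{service_account_email}"},
    })
    bucket.set_iam_policy(policy)
```

For example, with placeholder values, grant_object_admin(bucket_from_gcs_path("gs://my-bucket/temp"), "my-sa@my-project.iam.gserviceaccount.com") would grant the role on the bucket that holds the job's temp location. Binding the role at bucket level keeps the grant narrower than a project-wide one.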