scala spotify-scio

How to run a Scio pipeline on Dataflow from SBT (local)


I am trying to run my first Scio pipeline on Dataflow.

The code in question can be found here. However, I do not think that is too important.
My first experiment was to read some local CSV files and write another local CSV file, using the DirectRunner. That worked as expected.

Now I am trying to read the files from GCS, write the output to BigQuery, and run the pipeline using the DataflowRunner. I have made all the necessary changes (or so I believe), but I am unable to make it run.

I have already run gcloud auth application-default login, and when I run

sbt run --runner=DataflowRunner --project=project-id --input-path=gs://path/to/data --output-table=dataset.table

I can see that the job is submitted to Dataflow. However, after one hour the job fails with the following error message:

Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h.

(Note: the job did nothing in all that time, and since this is an experiment, the data is simply too small to take more than a couple of minutes.)

Checking the Stackdriver logs, I can find the following error:

java.lang.ClassNotFoundException: scala.collection.Seq

It appears to be related to Jackson:

java.util.ServiceConfigurationError: com.fasterxml.jackson.databind.Module: Provider com.fasterxml.jackson.module.scala.DefaultScalaModule could not be instantiated

That is what kills each executor right at startup. I really do not understand why it cannot find the Scala standard library.

I also tried to create a template first and run it later with:

sbt run --runner=DataflowRunner --project=project-id --input-path=gs://path/to/data --output-table=dataset.table --stagingLocation=gs://path/to/staging --templateLocation=gs://path/to/templates/template-1

But, after running the template, I get the same error.
Also, I noticed that in the staging folder there are a lot of jars, but the scala-library.jar is not in there.

Am I missing something obvious?


Solution

  • It's a known issue with sbt 1.3.0, which introduced some breaking changes to how class loaders are set up for run. Try sbt 1.2.8?

    Also, the Jackson issue is probably related to Java 11 or above. Stay with Java 8 for now.
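
    As a rough sketch of how this could be applied: the sbt version is pinned in project/build.properties (sbt.version=1.2.8). If you would rather stay on sbt 1.3.x, the build.sbt fragment below shows two settings that might avoid the problem. Both are assumptions on my part, not something verified against this project: they presume the layered class loader is what keeps scala-library.jar out of the staged files.

    ```scala
    // build.sbt (sketch only, assumes sbt 1.3.x; these keys do not exist on 1.2.8)

    // Option A (assumption): use a single flat class loader for `run`,
    // so the Scala library sits on the same class path as the application jars.
    Compile / classLoaderLayeringStrategy := ClassLoaderLayeringStrategy.Flat

    // Option B (assumption): run the pipeline in a forked JVM instead of
    // inside sbt's own class loader hierarchy. Uncomment to try it.
    // run / fork := true
    ```

    After either change, check the staging folder again to see whether scala-library.jar is now uploaded with the other jars.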