I have a Beam pipeline that runs well locally with the DirectRunner. However, when I switch to the DataflowRunner, the job starts and I can see the flow chart in the Google Dataflow web UI, but the job never makes progress. It hangs until I stop it. I am using Beam 2.10. I can see autoscaling adjusting the CPU count, and there is no exception in the log.
I think this has something to do with the way I create the JAR file. I use the shadowJar task in my Gradle build to create it. The main reason for using shadowJar is mergeServiceFiles(). Without mergeServiceFiles(), the job fails with an exception like No FileSystem found for gs.
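For reference, a minimal sketch of the kind of shadowJar setup described above. The plugin version is an assumption, not taken from my actual build. The reason mergeServiceFiles() matters is that Beam discovers its filesystem implementations through META-INF/services files, and a fat JAR that keeps only one copy of each such file can lose the registration for gs.

```groovy
// build.gradle (sketch; the plugin version below is an assumption)
plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '5.0.0'
}

shadowJar {
    // Concatenate the META-INF/services files from all dependencies
    // instead of letting one copy overwrite the others.
    mergeServiceFiles()
}
```

Running `gradle shadowJar` then produces the fat JAR under build/libs.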
So I copied the word count example from the Google Dataflow templates repo and packaged it as a JAR file. It behaves the same way: the job starts but does not progress. The code was changed only minimally, for the service account credential: instead of using the original PipelineOptions, my options interface extends GcsOptions so that I can set the credential.
I tried both Beam 2.12 and 2.10.
After digging around, I found the full log by clicking the Stackdriver link in the upper right corner of the log panel. It contained the following:
Caused by: java.lang.IllegalStateException: Detected both log4j-over-slf4j.jar AND bound slf4j-log4j12.jar on the class path, preempting StackOverflowError. See also http://www.slf4j.org/codes.html#log4jDelegationLoop for more details. at org.slf4j.impl.Log4jLoggerFactory.<clinit>(Log4jLoggerFactory.java:54) ....
Then there is a
java failed with exit status 1
log entry a few rows below the log4j error. Basically, the Java program had already stopped, but the Dataflow UI still showed the job as running in the flow chart.
The fix was to use the Gradle build script to exclude slf4j-log4j12 from
compile('org.apache.hadoop:hadoop-mapreduce-client-core:3.2.0') { exclude group: 'org.slf4j', module: 'slf4j-log4j12' }
and from every other dependency that pulls in slf4j-log4j12. After that, the job started moving.
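If many dependencies bring in slf4j-log4j12, a possible alternative to repeating the exclude on each one is a single exclusion applied to every configuration. This is only a sketch of that approach, not what I actually ran:

```groovy
// build.gradle (sketch): drop slf4j-log4j12 wherever it appears,
// so it cannot end up on the classpath next to log4j-over-slf4j.
configurations.all {
    exclude group: 'org.slf4j', module: 'slf4j-log4j12'
}
```

To see which dependencies pull the module in, something like `gradle dependencyInsight --dependency slf4j-log4j12 --configuration runtimeClasspath` should list them, assuming the build has a runtimeClasspath configuration.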