I have a bare-bones Oozie coordinator with this specification:
<coordinator-app name="my-coord" frequency="${coord:days(1)}"
start="${startDate}" end="${endDate}" timezone="UTC"
xmlns="uri:oozie:coordinator:0.4">
<controls>
<timeout>${timeout}</timeout>
</controls>
<action>
<workflow>
<app-path>${workflow}</app-path>
</workflow>
</action>
</coordinator-app>
It launched the workflow job around the scheduled nominal start time. But later, logs showed the workflow job entered its fail-state. To retrieve job info, I ran:
oozie job -info 0000909-190113225141152-oozie-oozi-W
which supplied useful information, including the following exception trace:
] Launcher exception: org.apache.spark.SparkException: Application application_1547448533998_26676 finished with failed status
org.apache.oozie.action.hadoop.JavaMainException: org.apache.spark.SparkException: Application application_1547448533998_26676 finished with failed status
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:59)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:51)
at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:35)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:242)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.spark.SparkException: Application application_1547448533998_26676 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1169)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:56)
... 15 more
Unfortunately, this stack trace -- evidently produced from SparkSubmit
-- says nothing about why my workflow job (a Scala program) actually failed.
This seems like a common enough scenario -- workflow logic failing and triggering its own stack trace.
Is there some other place to look for such stack traces in the Hadoop / Oozie / Coordinator / Workflow setup?
Use yarn applications -list
to view a list of jobs running on the Hadoop cluster. Then follow these steps:
yarn logs -applicationId <application_ID>
.Resulting logs should show your Scala program logs interspersed with other logs not produced by the Scala program. It helps if your Scala program embeds a unique prefix in each logging directive so you can filter your program logs from others.