bash sqoop oozie

Sqoop works through bash, but doesn't work through oozie


I have a shell script

  sqoop import \
  -Dmapreduce.job.queuename=adhoc \
  --connect jdbc:oracle:thin:secret@//secret \
  --query "a select"  \
  --target-dir /apps/hive/warehouse/data.db/fair_usage \
  --delete-target-dir \
  -m 1 \
  --fields-terminated-by '\t' 

It works when I put it in an .sh file and run it, but it fails when I try to run it as an Oozie action. I tried both a shell action and a Sqoop action. This is the Sqoop action. I also tried passing the whole command in a single <command>import....</command> tag.

  <action name="export_table" cred="hv_cred">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${JOB_TRACKER}</job-tracker>
      <name-node>${NAME_NODE}</name-node>
      <configuration>
        <property>
          <name>mapred.task.timeout</name>
          <value>600000</value>
        </property>
      </configuration>
      <arg>import</arg>
      <arg>-Dmapreduce.job.queuename=adhoc</arg>
      <arg>--connect</arg>
      <arg>jdbc:oracle:thin:secret@//secret</arg>
      <arg>--query</arg>
      <arg>"a select"</arg>
      <arg>--target-dir</arg>
      <arg>/apps/hive/warehouse/data.db/fair_usage</arg>
      <arg>--delete-target-dir</arg>
      <arg>-m</arg>
      <arg>1</arg>
      <arg>--fields-terminated-by</arg>
      <arg>'\t'</arg>
    </sqoop>
    <ok to="END"/>
    <error to="KILL"/>
  </action>
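
One thing that may be worth checking in the Sqoop action above: Oozie hands each <arg> value to Sqoop as-is, with no shell in between, so the quote characters that bash strips from "a select" and '\t' would reach Sqoop as part of the value. Below is a minimal sketch of the same arguments without shell quoting. The query is still the placeholder from the shell script, and whether the quoting is what broke this particular workflow is an assumption, not something the logs confirm.

```xml
<!-- Sketch only: each <arg> is passed verbatim, so shell quotes are dropped -->
<arg>import</arg>
<arg>-Dmapreduce.job.queuename=adhoc</arg>
<arg>--connect</arg>
<arg>jdbc:oracle:thin:secret@//secret</arg>
<arg>--query</arg>
<arg>a select</arg>
<arg>--target-dir</arg>
<arg>/apps/hive/warehouse/data.db/fair_usage</arg>
<arg>--delete-target-dir</arg>
<arg>-m</arg>
<arg>1</arg>
<arg>--fields-terminated-by</arg>
<!-- backslash-t without the single quotes; Sqoop itself expands the escape -->
<arg>\t</arg>
```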

The error I get is Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1] for the shell action and Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1] for the Sqoop action.

That doesn't tell me anything. When I look at the logs, I can't find anything useful. Stderr has barely 30 lines and no errors. Syslog is longer, but has no errors either.

After some time, a new message appeared:

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
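
For what it is worth, exit code 143 follows the usual convention of 128 plus a signal number: 143 - 128 = 15, which is SIGTERM. So the container was asked to terminate rather than crashing by itself, and the message may be a side effect of the failure rather than its cause (that last part is an interpretation, not something the logs above show). A small shell sketch of the arithmetic:

```shell
# Exit statuses above 128 conventionally mean "terminated by signal (status - 128)".
exit_code=143
signal_number=$((exit_code - 128))

# kill -l translates a signal number into its name (without the SIG prefix).
signal_name=$(kill -l "$signal_number")

echo "exit code $exit_code -> signal $signal_number (SIG$signal_name)"
```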

Edit

I tried running the script through a shell action again, and it worked. The script hasn't changed, so I probably made a mistake in the workflow file. I didn't save the old version of it, so I can't say what that mistake was.

Shell action

  <action name='export_table'>
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${JOB_TRACKER}</job-tracker>
      <name-node>${NAME_NODE}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${QUEUE_NAME}</value>
        </property>
      </configuration>
      <exec>bash/export_table.sh</exec>
      <file>bash/export_table.sh#export_table.sh</file>
    </shell>
    <ok to="END"/>
    <error to="KILL"/>
  </action>

I still don't know why the Sqoop action doesn't work.

Edit 2

A few months have passed and I take my words back. Fecking Sqoop. Same error.


Solution

  • The problem had nothing to do with the workflow or the Sqoop script itself, but rather with some internal mechanics I'm not aware of. Missing libraries, maybe?

    When I run the script in a terminal, a Java class is generated. This class seems to describe how the Hive table is mapped to the Oracle table, or something like that. When I start Oozie, I need to add this autogenerated file to the root directory (the one with the coordinator and workflow). If I don't, Oozie fails. The damn thing won't even give any errors.

    Anyway, after adding the Java class to the root directory, I can run Sqoop from a shell action. The Sqoop action still doesn't work. Another thing I noticed: the shell action will execute if I use this in the Sqoop command:

    --export-dir /apps/hive/warehouse/db.db/table/
    

    But if I try

    --hcatalog-database db \
    --hcatalog-table table \
    

    in the shell script, it fails anyway. It probably tries to add some libraries to the classpath and can't, or something like that.
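
Since the generated Java class seems to be the key, one thing that might be worth trying (untested here, so treat it as a guess) is telling Sqoop explicitly where to put its generated code, using its --outdir and --bindir code generation options. The sketch below wraps the existing call. The function name run_sqoop_with_codegen_dir and the temporary directory are hypothetical, not part of the original script.

```shell
# Hypothetical wrapper: runs sqoop with the given tool and arguments, and
# appends flags that send the generated .java and compiled files to a
# scratch directory instead of the current working directory.
run_sqoop_with_codegen_dir() {
  local codegen_dir tool
  codegen_dir="$(mktemp -d)"   # assumption: any writable directory would do
  tool="$1"                    # e.g. import or export
  shift

  # Printed so the location shows up in the launcher's stdout log.
  echo "generated code directory: $codegen_dir"

  # Flags are appended at the end so generic -D options stay right after the tool name.
  # stderr is merged into stdout so Sqoop's messages land in the same log.
  sqoop "$tool" "$@" \
    --outdir "$codegen_dir" \
    --bindir "$codegen_dir" \
    --verbose 2>&1
}
```

It would be called with the same arguments as before, for example run_sqoop_with_codegen_dir export followed by the existing --connect and --export-dir options.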