
Executing Linux Command in Scala-Shell


I'm working on a project where I need to execute some Linux commands (a sqoop command) from my Scala application. Here is a sample command I tried against MySQL on my VM.

import sys.process._ 

"sqoop eval --connect jdbc:mysql://localhost:3306/retail_db --username root --password cloudera --query 'select * from categories'".!

I got the following error:

Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/06/24 15:25:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
20/06/24 15:25:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/06/24 15:25:27 ERROR tool.BaseSqoopTool: Error parsing arguments for eval:
20/06/24 15:25:27 ERROR tool.BaseSqoopTool: Unrecognized argument: *
20/06/24 15:25:27 ERROR tool.BaseSqoopTool: Unrecognized argument: from
20/06/24 15:25:27 ERROR tool.BaseSqoopTool: Unrecognized argument: categories

I also tried this command and got the same error message:

"sqoop eval --connect jdbc:mysql://localhost:3306/retail_db --username root --password cloudera --query 'select * from categories'".!<

Can someone help me figure out what's causing the error? I've tried both single and double quotes, to no avail. I searched all over SO but could not find a solution, which is why I'm posting here. NOTE: The same command executes successfully in pyspark, as seen below:

>>> import os
>>> import sys

>>> query = "sqoop eval --connect jdbc:mysql://localhost:3306/retail_db --username root --password cloudera --query 'select * from categories'"
>>> os.system(query)
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/06/24 15:28:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
20/06/24 15:28:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/06/24 15:28:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
----------------------------------------------------
| category_id | category_department_id | category_name        | 
----------------------------------------------------
| 1           | 2           | Football             | 
| 2           | 2           | Soccer               | 
| 3           | 2           | Baseball & Softball  | 
| 4           | 2           | Basketball           | 
| 5           | 2           | Lacrosse             | 
| 6           | 2           | Tennis & Racquet     | 




 

Solution

  • The error output lists *, from, and categories as separate unrecognized arguments, which shows that the quoted query reached sqoop split on whitespace rather than as one argument. The same command works from the command line, and from Python's os.system, because there a shell interprets the quote marks and passes select * from categories to sqoop as a single argument. In other words, the shell does some pre-processing before handing everything off to the sqoop program.

    The .! method (i.e. the Scala ProcessBuilder) launches processes directly, which means that the command elements are not passed to a shell for pre-processing. There are two ways to get around this problem.

    1. You can invoke the shell directly and pass the command-line to it as a single argument, or
    2. you can do most of the obvious pre-processing yourself.

    Here's an example of the 2nd option.

    Seq("sqoop"
       ,"eval"
       ,"--connect"
       ,"jdbc:mysql://localhost:3306/retail_db"
       ,"--username"
       ,"root"
       ,"--password"
       ,"cloudera"
       ,"--query"
       ,"select * from categories").!
    

    As you can see, each argument is passed as its own element of the Seq, including the last one: the whole query stays a single argument even though it contains spaces, and it needs no extra quote marks.
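
The first option (invoking the shell yourself) can be sketched as follows. This is a minimal sketch, assuming a POSIX shell is available at /bin/sh; viaShell is a hypothetical helper name, not part of scala.sys.process, and printf stands in for sqoop so the last part runs without sqoop or a database.

```scala
import scala.sys.process._

// Hypothetical helper: hand the whole command line to /bin/sh as one argument,
// so the shell (not Scala) splits it and interprets the quote marks.
def viaShell(commandLine: String): scala.sys.process.ProcessBuilder =
  Process(Seq("/bin/sh", "-c", commandLine))

// The command from the question, unchanged. It is only built here, not run,
// because running it needs sqoop and the retail_db database.
val sqoopEval = viaShell(
  "sqoop eval --connect jdbc:mysql://localhost:3306/retail_db " +
  "--username root --password cloudera --query 'select * from categories'")
// sqoopEval.!   // uncomment on a machine with sqoop installed

// Stand-in that runs anywhere: printf prints each argument it receives on its
// own line, so the result shows how the shell grouped the arguments.
val output = viaShell("printf '%s\\n' eval --query 'select * from categories'").!!
val received = output.split("\n").toList

received.foreach(println)
```

If the shell groups the arguments as expected, the quoted query comes back as one line rather than four. The Seq form from the second option is usually the safer choice when argument values may themselves contain quotes or other shell metacharacters, since nothing is re-parsed.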