
Sqoop: Force Sqoop to make the target directory


I'm still a newbie to the whole Hadoop system. As the title implies, is there a way to make Sqoop create the target directory during the import, or does the target dir always have to exist before the data is sent to HDFS?

Thank you.


Solution

  • If the target directory does not exist, Sqoop creates it at the path given in your sqoop command:

    --target-dir <dir>
    

    If the directory already exists, the sqoop command fails. To avoid that, add the following option, which deletes the directory (if it exists) before the import runs:

    --delete-target-dir
    

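    An example from the Cloudera VM, which ships with a default MySQL instance and sample data. Note that a free-form --query must contain the $CONDITIONS token in its WHERE clause (escaped as \$CONDITIONS inside double quotes); Sqoop replaces it with each mapper's split condition.

    sqoop import \
    --connect jdbc:mysql://localhost:3306/retail_db \
    --username root \
    --password cloudera \
    --target-dir /user/cloudera/sqoop_import/orders \
    --delete-target-dir \
    --num-mappers 2 \
    --query "select * from orders where \$CONDITIONS" \
    --split-by order_id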
    

    Alternatively, you can specify a parent directory with --warehouse-dir, and a folder named after the table is created under it. Note that --warehouse-dir and --target-dir are mutually exclusive, so use only one of them.

    sqoop import \
    --connect jdbc:mysql://localhost:3306/retail_db \
    --username root \
    --password cloudera \
    --table orders \
    --warehouse-dir /user/cloudera/sqoop_import/
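
To see where the --warehouse-dir import lands, here is a minimal shell sketch. The helper names warehouse_table_path and hdfs_dir_exists are hypothetical (they are not part of Sqoop), and the sketch assumes the table folder is simply the warehouse directory plus the table name, as described above. The existence check uses `hdfs dfs -test -d`, so it needs a configured Hadoop client.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sqoop): compute the HDFS directory that an
# import with --warehouse-dir is expected to write to: <warehouse-dir>/<table>.
warehouse_table_path() {
  local warehouse_dir="${1%/}"   # drop a single trailing slash, if present
  local table="$2"
  printf '%s/%s\n' "$warehouse_dir" "$table"
}

# Hypothetical helper: succeed if the given HDFS path is an existing directory.
# Relies on `hdfs dfs -test -d`, so it only works with a Hadoop client set up.
hdfs_dir_exists() {
  hdfs dfs -test -d "$1"
}

# Example (commented out because it needs a running cluster):
# dir="$(warehouse_table_path /user/cloudera/sqoop_import/ orders)"
# if hdfs_dir_exists "$dir"; then echo "$dir already exists"; fi
```

Running the commented-out check before an import tells you whether the command would fail on an existing directory, in which case you can remove the directory yourself or, for --target-dir imports, add --delete-target-dir.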