hadoophiveclouderacloudera-cdhsequencefile

Export file file hive to hdfs in Sequence File format


I am trying to executing a hive query, and export its output in HDFS with SEQUENCE FILE format.

beeline> show create table test_table;

+--------------------------------------------------------------------------------------+
|                                    createtab_stmt                                    |
+--------------------------------------------------------------------------------------+
| CREATE TABLE `test_table`(                                                           |
|   `XXXXXXXXXXXXXX` bigint,                                                           |
|   `XXXXXXXXXXXxx` int,                                                               |
|   `XXXXXXXXX` int,                                                                   |
|   `XXXXXX` int)                                                                      |
| PARTITIONED BY (                                                                     |
|   `XXXXXXXX` string,                                                                 |
|   `XXXX` string,                                                                     |
|   `XXXXXXXX` string)                                                                 |
| ROW FORMAT DELIMITED                                                                 |
|   FIELDS TERMINATED BY '\u00001'                                                     |
| STORED AS INPUTFORMAT                                                                |
|   'org.apache.hadoop.mapred.SequenceFileInputFormat'                                 |
| OUTPUTFORMAT                                                                         |
|   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'                        |
| LOCATION                                                                             |
|   'hdfs://localhost:8020/user/hive/warehouse/local_hive_report.db/test_table'        |
| TBLPROPERTIES (                                                                      |
|   'transient_lastDdlTime'='1437569941')                                              |
+--------------------------------------------------------------------------------------+

Here is the query which I tried to export the data,

beeline> INSERT OVERWRITE DIRECTORY '/user/nages/load/date' 
    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY ',' 
    LINES TERMINATED BY '\n' 
    STORED AS SEQUENCEFILE 
    SELECT * FROM test_table WHERE column=value;

Here is the error,

    Error: Error while compiling statement: FAILED: ParseException line 1:61 
cannot recognize input near 'ROW' 'FORMAT' 'DELIMITED' in statement (state=42000,code=40000)

Am I missing something here?

Software version: Cloudera hadoop CDH5.3.3, Apache version 0.13.1.

Edit: Updated my temprorary solution below.


Solution

  • As a temporary fix, I have created a Hive table with sequence file format, and inserted selected records into it.

    CREATE TABLE temp_table
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    STORED AS SEQUENCEFILE
    AS 
    SELECT * FROM test_table WHERE column=value;
    

    This will create sequence file in the following location in HDFS.

    /<HIVE_DATABASE_ROOT>/temp_table.db/