I am trying to execute a Hive query and export its output to HDFS as a sequence file. Here is the table definition:
beeline> show create table test_table;
CREATE TABLE `test_table`(
  `XXXXXXXXXXXXXX` bigint,
  `XXXXXXXXXXXxx` int,
  `XXXXXXXXX` int,
  `XXXXXX` int)
PARTITIONED BY (
  `XXXXXXXX` string,
  `XXXX` string,
  `XXXXXXXX` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\u00001'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://localhost:8020/user/hive/warehouse/local_hive_report.db/test_table'
TBLPROPERTIES (
  'transient_lastDdlTime'='1437569941')
Here is the query I used to try to export the data:
beeline> INSERT OVERWRITE DIRECTORY '/user/nages/load/date'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE
SELECT * FROM test_table WHERE column=value;
Here is the error:
Error: Error while compiling statement: FAILED: ParseException line 1:61
cannot recognize input near 'ROW' 'FORMAT' 'DELIMITED' in statement (state=42000,code=40000)
Am I missing something here?
Software versions: Cloudera Hadoop CDH 5.3.3, Apache Hive 0.13.1.
Edit: Added my temporary solution below.
As a temporary fix, I created a Hive table stored as a sequence file and populated it with the selected records:
CREATE TABLE temp_table
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE
AS
SELECT * FROM test_table WHERE column=value;
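To double-check that the new table really is stored as a sequence file, you can look at its storage descriptor (output trimmed to the relevant lines; the exact layout may differ on your setup):

DESCRIBE FORMATTED temp_table;
...
InputFormat:   org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat:  org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
...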
This creates the sequence files at the following location in HDFS:
/<HIVE_DATABASE_ROOT>/temp_table.db/
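From there the files can be inspected or copied with the usual HDFS commands, for example (the path below uses the same placeholder as above, so adjust it to your warehouse location, and the file name 000000_0 is just what Hive typically produces and may differ):

hdfs dfs -ls /<HIVE_DATABASE_ROOT>/temp_table.db/
hdfs dfs -text /<HIVE_DATABASE_ROOT>/temp_table.db/000000_0 | head
hdfs dfs -cp /<HIVE_DATABASE_ROOT>/temp_table.db/* /user/nages/load/date/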