Tags: hadoop, hive, compression, lz4, orc

Hive compression in ORC using LZ4


I am trying to compress RC and ORC files using LZ4. I have installed Hadoop-2.7.1 and Hive-1.2.1. With LZ4 I can compress RC files without any problem, but when I try to load data into an ORC file using LZ4, it does not work. I created the ORC table like this:

CREATE TABLE FINANCE_orc(
    PERMNO STRING,
    DATE STRING,
    CUSIP STRING,
    NCUSIP STRING,
    COMNAM STRING,
    TICKET STRING,
    PERMCO STRING,
    SHRCD STRING,
    EXCHCD STRING,
    HEXCD STRING,
    SICCD STRING,
    HSLCCD STRING,
    PRC STRING,
    VOL STRING,
    RET STRING,
    SHROUT STRING,
    DLRET STRING,
    VWRETD STRING,
    EWRETD STRING,
    SPRTRN STRING)
STORED AS ORC tblproperties ("orc.compress"="Lz4");

set mapred.output.compress=true; 
set hive.exec.compress.output=true; 
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec; 
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec; 

INSERT OVERWRITE table finance_orc select * from finance; 

But when loading the data, it fails with the following error:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"permno":"PERMNO","ndate":"DATE","cusip":"CUSIP","ncusip":"NCUSIP","comnam":"COMNAM","ticket":"TICKER","permco":"PERMCO","shrcd":"SHRCD","exchcd":"EXCHCD","hexcd":"HEXCD","siccd":"SICCD","hslccd":"HSICCD","prc":"PRC","vol":"VOL","ret":"RET","shrout":"SHROUT","dlret":"DLRET","vwretd":"VWRETD","ewretd":"EWRETD","sprtrn":"SPRTRN"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"permno":"PERMNO","ndate":"DATE","cusip":"CUSIP","ncusip":"NCUSIP","comnam":"COMNAM","ticket":"TICKER","permco":"PERMCO","shrcd":"SHRCD","exchcd":"EXCHCD","hexcd":"HEXCD","siccd":"SICCD","hslccd":"HSICCD","prc":"PRC","vol":"VOL","ret":"RET","shrout":"SHROUT","dlret":"DLRET","vwretd":"VWRETD","ewretd":"EWRETD","sprtrn":"SPRTRN"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:577)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:675)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
    ... 9 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:622)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:566)
    ... 16 more
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4
    at java.lang.Enum.valueOf(Enum.java:236)
    at org.apache.hadoop.hive.ql.io.orc.CompressionKind.valueOf(CompressionKind.java:25)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getOptions(OrcOutputFormat.java:143)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getHiveRecordWriter(OrcOutputFormat.java:203)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getHiveRecordWriter(OrcOutputFormat.java:52)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
    ... 18 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 4   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

I have used Snappy and Zlib with the same commands and they work fine. The problem occurs only with LZ4, and I do not know why.


Solution

    1. In addition to ORC's own columnar encoding, the compression codecs accepted for orc.compress in this Hive version are NONE, ZLIB, and SNAPPY.
    2. The default compression codec is ZLIB.
    3. Compression codecs other than the above are not allowed; a corrected table definition using a supported codec is sketched after this list.
    4. In general, to pin down an error like this, read the error log all the way down to the root cause. Here the decisive line is:

          "org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4"

       The value "Lz4" from the table property is passed directly to CompressionKind.valueOf(), and because the CompressionKind enum in Hive 1.2.1 has no LZ4 constant (and Enum.valueOf is case-sensitive in any case), the lookup throws IllegalArgumentException. This also explains why the same statement works with Snappy and Zlib. Note that the mapred.output.compress and GzipCodec settings from the question have no effect here: ORC compression is controlled solely by the orc.compress table property.