What exactly is the format for Hive LazySimpleSerDe?
A format like ParquetHiveSerDe tells me that Hive will read the HDFS files in parquet format.
But what is LazySimpleSerDe? Why not call it something explicit like CommaSepHiveSerDe or TabSepHiveSerDe, given that LazySimpleSerDe is for delimited files?
LazySimpleSerDe is a fast and simple SerDe for delimited text. It does not recognize quoted values, but it works with different delimiters, not only commas or tabs; when no delimiter is specified, the default field delimiter is Ctrl-A (\001). If you specify STORED AS TEXTFILE in the table DDL, LazySimpleSerDe will be used. For quoted values use OpenCSVSerde; it is not as fast as LazySimpleSerDe, but it handles quoted values correctly.
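As a rough illustration of the two options above, here is a minimal HiveQL sketch. The table and column names (demo_pipe, demo_csv, id, name) are hypothetical and chosen only for this example; the SerDe class name and property keys are the standard Hive ones.

```sql
-- Hypothetical table and column names, for illustration only.

-- Pipe-delimited text file. STORED AS TEXTFILE makes Hive use
-- LazySimpleSerDe, which splits on the delimiter and ignores quoting.
CREATE TABLE demo_pipe (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

-- CSV with quoted values, read through OpenCSVSerde.
-- Note: OpenCSVSerde exposes every column as STRING.
CREATE TABLE demo_csv (
  id   STRING,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"',
  'escapeChar'    = '\\'
)
STORED AS TEXTFILE;
```

With the first table, a value such as "a|b" inside quotes would still be split at the pipe; the second table would keep a quoted "a,b" as one field.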
LazySimpleSerDe is kept simple for the sake of performance. It also creates objects lazily, which improves performance further, so it is the preferable choice for text files whenever it can be used.
See this example with a pipe-delimited (|) file format: https://stackoverflow.com/a/68095278/2700344
The show create table command for such a table prints the serde class as org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe; STORED AS TEXTFILE is a shortcut.
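To make the shortcut concrete, the following sketch spells out roughly what STORED AS TEXTFILE with a pipe delimiter expands to. The table name demo_pipe_explicit and its columns are hypothetical; the exact properties that show create table prints may differ between Hive versions.

```sql
-- Hypothetical table name and columns, for illustration only.
-- Explicit form: the SerDe and the text input/output formats are named directly.
CREATE TABLE demo_pipe_explicit (
  id   INT,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = '|')
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- Inspect the stored definition, including the serde class.
SHOW CREATE TABLE demo_pipe_explicit;
```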