I am trying to use Spark SQL to write a parquet file.
By default Spark SQL uses gzip, but it also supports other compression formats like snappy and lzo.
What is the difference between these compression formats?
Update: Recent versions of Spark use snappy as the default compression format.
Just try them on your data.
lzo and snappy compress quickly and decompress very quickly, but achieve a lower compression ratio; gzip compresses better but is a little slower.
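If you want to try them on your own data, something like the following PySpark sketch might help. It writes the same DataFrame once per codec and records the wall-clock time and the output size. The helper names (directory_size, compare_codecs) are made up for this example, and it assumes the output goes to a local filesystem path, since the size is measured with os.walk (which also counts the small checksum and marker files Spark writes).

```python
import os
import time


def directory_size(path):
    """Total size in bytes of all files under a local directory."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def compare_codecs(df, base_dir, codecs=("snappy", "gzip")):
    """Write df as parquet once per codec.

    Returns {codec: (seconds_to_write, bytes_on_disk)}.
    df is expected to be a Spark DataFrame; base_dir is a local path.
    """
    results = {}
    for codec in codecs:
        out = os.path.join(base_dir, codec)
        start = time.perf_counter()
        # "compression" is the per-write option for the parquet codec
        df.write.mode("overwrite").option("compression", codec).parquet(out)
        elapsed = time.perf_counter() - start
        results[codec] = (elapsed, directory_size(out))
    return results
```

You would call it as compare_codecs(df, "/tmp/codec_test") with your own DataFrame. Add "lzo" to the codecs tuple only if the LZO libraries are installed on your cluster. To change the codec for a whole session instead of a single write, the spark.sql.parquet.compression.codec setting can be used.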
Update many years later: