
How do I read Snappy compressed files on HDFS without using Hadoop?


I'm storing files on HDFS in Snappy compression format. I'd like to be able to examine these files on my local Linux file system to make sure that the Hadoop process that created them has performed correctly.

When I copy them locally and try to decompress them with Google's standard Snappy library, it reports that the file is missing the Snappy identifier. When I try to work around this by inserting a Snappy identifier, the checksum check fails.
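A likely reason for that error, as far as I understand Hadoop's codec: Hadoop's SnappyCodec does not write the Snappy framing format, whose streams begin with a stream-identifier chunk (the bytes ff 06 00 00 followed by the ASCII text sNaPpY). Instead it writes its own block layout of length-prefixed raw Snappy chunks, so a framing-format decoder rejects the file before it reaches any real data, and a hand-inserted identifier only makes the decoder misread the bytes that follow. The sketch below checks whether a local copy starts with that identifier. The function name is_snappy_framed is hypothetical, and it inspects only the first 10 bytes; it does not decompress anything.

```shell
# Hypothetical helper: succeed (exit status 0) only if FILE starts with the
# Snappy framing-format stream identifier chunk:
#   ff                  chunk type 0xff (stream identifier)
#   06 00 00            chunk length 6, little-endian
#   73 4e 61 50 70 59   the ASCII bytes "sNaPpY"
is_snappy_framed() {
  # Dump the first 10 bytes as hex and strip the spaces/newlines od adds.
  magic=$(head -c 10 "$1" | od -A n -t x1 | tr -d ' \n')
  [ "$magic" = "ff060000734e61507059" ]
}
```

For example, is_snappy_framed part-00000.snappy && echo framed || echo "not framing format" run on a file produced by Hadoop's SnappyCodec would be expected to print "not framing format".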

What can I do to read these files without having to write a separate Hadoop program or pass it through something like Hive?


Solution

  • I found that the following command prints the decompressed contents of a Snappy-compressed file on HDFS:

    hadoop fs -text /path/filename
    

    With the newer hdfs command, as used on Cloudera and HDP distributions:

    hdfs dfs -text /path/filename
    

    If the goal is to download the file in text form for further examination and processing, redirect the command's output to a file on the local system. You can also pipe the output through head to view only the first few lines of the file.
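For repeated checks, the redirect-and-preview step described above could be wrapped in a small shell function. This is only a sketch: the function name hdfs_text_to_local, the default output name (the HDFS file's base name plus .txt) and the 10-line preview are assumptions, and it assumes the Hadoop client is installed, on PATH, and configured for your cluster.

```shell
# Hypothetical helper: decompress one HDFS file with `hdfs dfs -text`
# (which decodes Snappy and other codecs it recognises), save the text
# locally, then print the first few lines as a quick sanity check.
# Assumes the Hadoop client `hdfs` is on PATH and configured for the cluster.
hdfs_text_to_local() {
  src="$1"                 # HDFS path, e.g. /path/filename
  dst="$2"                 # local output file (optional)
  lines="${3:-10}"         # number of lines to preview (default 10)
  if [ -z "$dst" ]; then
    dst="$(basename "$src").txt"   # default: <base name>.txt in current dir
  fi
  # Write the decompressed text to the local file; stop if hdfs fails.
  hdfs dfs -text "$src" > "$dst" || return 1
  # Preview the beginning of the local copy.
  head -n "$lines" "$dst"
}
```

For example, hdfs_text_to_local /path/filename filename.txt 20 writes the decompressed text to filename.txt and prints its first 20 lines. For a quick look without keeping a local copy, hdfs dfs -text /path/filename | head -n 20 is enough.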