hadoop command-line-interface sequencefile

SequenceFile as text CLI with custom class


I have an HDFS file in SequenceFile format. The key is Text and the value is a custom serializable class, say MyCustomClass. I want to read this file with the hadoop fs -text command, but it fails because Hadoop does not know the definition of MyCustomClass.

I also tried the hdfs dfs -text command but got the same result. I am using Hadoop 2.

Is there a way to specify the class (through a jar, for example, with something like a -cp myjar.jar option)?


Solution

  • hadoop fs -libjars my-lib.jar -text output-dir/part-r-*
    

    This reads the key/value pairs from the sequence file and calls toString() on both objects, separating them with a tab when writing to stdout. The -libjars option tells Hadoop where to find your custom key/value classes.

    how-to-parse-customwritable-from-text-in-hadoop
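
Since the output is just toString() of the key, a tab, then toString() of the value, what you see depends on how MyCustomClass implements toString(). Here is a minimal sketch of that formatting in plain Java. It does not use the Hadoop API: MyCustomClass below is a hypothetical stand-in with made-up fields (a real one would implement Writable), and TextDumpSketch and formatRecord are names invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TextDumpSketch {

    // Hypothetical stand-in for the custom value class. The fields are
    // assumptions; a real class would also implement Hadoop's Writable.
    static final class MyCustomClass {
        private final int id;
        private final String label;

        MyCustomClass(int id, String label) {
            this.id = id;
            this.label = label;
        }

        // Without this override the output would show the default
        // Object.toString() (class name and hash code) instead of the data.
        @Override
        public String toString() {
            return id + "," + label;
        }
    }

    // Mimics the formatting described above: key.toString(), a tab,
    // then value.toString().
    static String formatRecord(Object key, Object value) {
        return key.toString() + "\t" + value.toString();
    }

    public static void main(String[] args) {
        // In-memory records standing in for the contents of a sequence file.
        Map<String, MyCustomClass> records = new LinkedHashMap<>();
        records.put("row1", new MyCustomClass(1, "alpha"));
        records.put("row2", new MyCustomClass(2, "beta"));

        for (Map.Entry<String, MyCustomClass> entry : records.entrySet()) {
            System.out.println(formatRecord(entry.getKey(), entry.getValue()));
        }
    }
}
```

If the value column shows something like a class name followed by a hash code, the class is being found but does not override toString().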