hadoop hdfs parquet

Parquet without Hadoop?


I want to use Parquet as columnar storage in one of my projects, but I don't want to depend on the Hadoop/HDFS libraries. Is it possible to use Parquet outside of HDFS? If not, what is the minimum dependency?


Solution

  • While investigating the same question, I found that this is apparently not possible at the moment. I found this Git issue, which proposes decoupling Parquet from the Hadoop API. Apparently that has not been done yet.

    In the Apache Jira I also found an issue that asks for a way to read a Parquet file outside Hadoop. It is unresolved at the time of writing.

    EDIT:

    Issues are no longer tracked on GitHub (the first link above is dead). A newer issue I found is on Apache's Jira, with the following headline:

    make it easy to read and write parquet files in java without depending on hadoop