I'm working with h2o (latest version 3.26.0.10) on a Hadoop cluster. I've read in a parquet file from HDFS and have performed some manipulation on it, built a model, etc.
I've stored some important results in an H2OFrame that I wish to export to local storage instead of HDFS. Is there a way to export this file as a parquet?
I tried using h2o.exportFile (documentation here: http://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.exportFile.html), but all the examples write .csv. I tried a file path with a .parquet extension, and that didn't work: it wrote a file, but I think it was basically a .csv, since it was identical in size to the .csv output.

Example: h2o.exportFile(iris_hf, path = "/path/on/h2o/server/filesystem/iris.parquet")
On a related note, if I were to export my H2OFrame to HDFS instead of local storage, would it be possible to write it in parquet format? I could at least then move that to local storage.
h2o added support for exporting parquet files as of version 3.38.0.1. You need to set the format argument to "parquet". Note that h2o.exportFile will ignore the parts argument if you specify "parquet"; instead, it chooses the number of parts based on the number of chunks of your data.
https://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.exportFile.html
h2o.exportFile(
  data = <your H2OFrame>,
  path = "/path/to/exported/parquet/dir",
  format = "parquet"
)
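As for the related question about writing to HDFS first: h2o.exportFile also accepts HDFS URIs for path, so on a version with parquet support you can export directly to HDFS and copy the result to local storage afterwards. A minimal sketch, assuming a running h2o cluster (>= 3.38.0.1), a hypothetical frame results_hf, and a placeholder namenode address:

```r
library(h2o)
h2o.init()

# Export the frame as parquet directly to HDFS. h2o writes a directory
# of part files, choosing the part count from the frame's chunk count.
h2o.exportFile(
  data   = results_hf,                                 # hypothetical H2OFrame
  path   = "hdfs://namenode/user/me/results_parquet",  # assumed HDFS URI
  format = "parquet"
)
```

From there, something like hdfs dfs -get /user/me/results_parquet /local/path copies the exported directory down to the local filesystem.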