pandas, http, dask, parquet, fastparquet

Dask dataframe read_parquet fails over HTTP


I have been struggling with this problem for a week. I run the following:

from dask import dataframe as ddf
ddf.read_parquet("http://IP:port/webhdfs/v1/user/...")

This fails with an "invalid parquet magic" error. However, ddf.read_parquet works fine with a "webhdfs://" URL.

I would like ddf.read_parquet to work over HTTP because I want to use it in a dask-ssh cluster whose workers have no HDFS access.


Solution

  • Although the comments already partly answer this question, I thought I would add some information as an answer.
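
For context on the error message itself: a Parquet file begins and ends with the 4-byte marker PAR1, and a reader reports an invalid magic when it does not find those bytes. The following is only a minimal sketch of that check; the helper name looks_like_parquet is hypothetical and is not part of dask or fastparquet. It may help to confirm whether the bytes returned by a plain HTTP request are the file itself or some other response body.

```python
# Hypothetical helper, not part of dask or fastparquet.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(data: bytes) -> bool:
    """Return True if the buffer starts and ends with the Parquet magic bytes.

    This is a necessary condition only; it does not validate the footer
    or any of the file's metadata.
    """
    n = len(PARQUET_MAGIC)
    return (
        len(data) >= 2 * n
        and data[:n] == PARQUET_MAGIC
        and data[-n:] == PARQUET_MAGIC
    )


if __name__ == "__main__":
    # A made-up buffer with the marker at both ends passes the check.
    fake_parquet = PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC
    print(looks_like_parquet(fake_parquet))

    # Any other body (here an invented JSON-like payload) does not.
    print(looks_like_parquet(b'{"message": "not a parquet file"}'))
```

If the body fetched from the "http://" URL fails this check, the server is returning something other than the raw file, which would be consistent with the error in the question.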