
pyspark.sql.utils.AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'


This question has a different answer from those given in the similar post referenced below.

I am getting an error that reads

pyspark.sql.utils.AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'

when I try to read a Parquet file like this, using Spark 2.1.0:

data = spark.read.parquet('/myhdfs/location/')

I have checked that the file/table is not empty by looking at the Impala table through the Hue web portal. Also, other files stored in similar directories read without any problem. For the record, the file names contain hyphens but no underscores or full stops/periods.

Hence, none of the answers in the following post apply: "Unable to infer schema when loading Parquet file".
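Why the naming detail matters: when Spark reads a directory, it skips files it treats as hidden, so a directory containing only such files leaves nothing to infer a schema from. The function below is a simplified approximation of that filter, written for illustration only (`is_spark_visible` is a hypothetical name, not a PySpark API):

```python
def is_spark_visible(file_name: str) -> bool:
    """Simplified approximation of how Spark decides whether a file in a
    data directory is read or skipped as 'hidden'. Illustrative only."""
    # Parquet summary files are kept even though they start with "_".
    if file_name.startswith(("_metadata", "_common_metadata")):
        return True
    # Dot-prefixed names (e.g. .crc checksum files) are skipped.
    if file_name.startswith("."):
        return False
    # Underscore-prefixed names (e.g. _SUCCESS) are skipped unless they
    # contain "=", as a partition directory such as _x=1 would.
    if file_name.startswith("_") and "=" not in file_name:
        return False
    return True


# Hyphens elsewhere in the name do not matter.
print(is_spark_visible("part-00000-abc"))   # True
print(is_spark_visible("_SUCCESS"))         # False
print(is_spark_visible(".part-00000.crc"))  # False
```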

Any ideas?


Solution

  • It turns out I was getting this error because there was another level to the directory structure: the data sat in a subdirectory of the path I was reading. Pointing the reader at that subdirectory was what I needed:

    data = spark.read.parquet('/myhdfs/location/anotherlevel/')
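When the data turns out to sit one level deeper than expected, it can help to list where the data files actually are before choosing the path to read. A minimal sketch, assuming a local copy of the directory tree (on HDFS itself, `hdfs dfs -ls -R /myhdfs/location/` shows the same information); `find_data_dirs` is a hypothetical helper, not part of PySpark:

```python
import os
from typing import List


def find_data_dirs(root: str) -> List[str]:
    """Return every directory under `root` (including `root` itself) that
    directly contains at least one non-hidden file."""
    data_dirs = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(("_", "."))]
        # Ignore marker and checksum files such as _SUCCESS or .crc files.
        visible = [f for f in filenames if not f.startswith(("_", "."))]
        if visible:
            data_dirs.append(dirpath)
    return sorted(data_dirs)
```

If the result is a subdirectory rather than `root`, that subdirectory is the path to pass to `spark.read.parquet`, as in the solution above. On Spark 3.0 and later (not the 2.1.0 used here), `spark.read.option("recursiveFileLookup", "true").parquet(path)` reads files from nested directories instead, at the cost of disabling partition discovery.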