
What are the mandatory options for loading an Excel file?


I have loaded an Excel file from S3 using the syntax below, but I am wondering which options actually need to be set.

Why is it mandatory to set all of the options below to load an Excel file? None of these options are mandatory when loading other file types such as csv, del, json, avro, etc.

val data = sqlContext.read.
format("com.crealytics.spark.excel").
option("location", s3path).
option("useHeader", "true").
option("treatEmptyValuesAsNulls", "true").
option("inferSchema","true").
option("addColorColumns", "true").
load(s3path)

I get the error below if any of the options above (except location) is not set:

sqlContext.read.format("com.crealytics.spark.excel").option("location", s3path).load(s3path)

Error message:

Name: java.lang.IllegalArgumentException
Message: Parameter "useHeader" is missing in options.
StackTrace:   at com.crealytics.spark.excel.DefaultSource.checkParameter(DefaultSource.scala:37)
          at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:19)
          at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:7)
          at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:345)
          at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
          at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
          at $anonfun$1.apply(<console>:47)
          at $anonfun$1.apply(<console>:47)
          at time(<console>:36)

Solution

  • Most of the spark-excel options are mandatory; the only exceptions are userSchema and sheetName.

    You can confirm this in the spark-excel DataSource source code (DefaultSource.scala, the class that appears in the stack trace above).

    Keep in mind that data source (connector) packages like this one are implemented outside the Spark project, so each comes with its own rules and parameters.
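To make the error above concrete, here is a minimal pure-Scala sketch of a mandatory-option check of the kind the stack trace points at (`DefaultSource.checkParameter`). It is a hypothetical re-creation, not the spark-excel source: the names `ExcelOptionCheck`, `requiredOptions` and `validate` are invented for illustration, the list of required keys is an assumption taken from the options the question reports as necessary, and the S3 path is a placeholder. It only shows how a missing key produces the same message seen in the question.

```scala
// Hypothetical re-creation of a mandatory-option check like the one the stack
// trace points at (DefaultSource.checkParameter). NOT the spark-excel source;
// it only mimics the error message reported in the question.
object ExcelOptionCheck {

  // Assumption: the options the question reports as required (besides "location").
  val requiredOptions: Seq[String] =
    Seq("useHeader", "treatEmptyValuesAsNulls", "inferSchema", "addColorColumns")

  // Return the value of a required option, or fail with the same kind of error.
  def checkParameter(parameters: Map[String, String], name: String): String =
    parameters.getOrElse(
      name,
      throw new IllegalArgumentException(s"""Parameter "$name" is missing in options.""")
    )

  // Check every required option in order and collect their values.
  def validate(parameters: Map[String, String]): Map[String, String] =
    requiredOptions.map(name => name -> checkParameter(parameters, name)).toMap

  def main(args: Array[String]): Unit = {
    val complete = Map(
      "useHeader" -> "true",
      "treatEmptyValuesAsNulls" -> "true",
      "inferSchema" -> "true",
      "addColorColumns" -> "true"
    )
    println(validate(complete).size) // prints 4

    // Placeholder path; only "location" is set, so the first required key fails.
    try validate(Map("location" -> "s3://example-bucket/data.xlsx"))
    catch {
      case e: IllegalArgumentException =>
        println(e.getMessage) // prints Parameter "useHeader" is missing in options.
    }
  }
}
```

In a real job, a map like `complete` can be passed to the reader in a single call with `sqlContext.read.format("com.crealytics.spark.excel").options(...)`, which keeps every required key set consistently. Which keys are actually required may differ between spark-excel versions, so check the DefaultSource of the version you use.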