apache-spark, pyspark, apache-spark-sql, spark-excel

How to write a Dataset object to an Excel file in Spark Java?


I am reading an Excel file using the com.crealytics.spark.excel package. Below is the code that reads the Excel file in Spark Java:

    Dataset<Row> SourcePropertSet = sqlContext.read()
        .format("com.crealytics.spark.excel")
        .option("location", "D:\\5Kto10K.xlsx")
        .option("useHeader", "true")
        .option("treatEmptyValuesAsNulls", "true")
        .option("inferSchema", "true")
        .option("addColorColumns", "false")
        .load();

Then I tried to use the same package (com.crealytics.spark.excel) to write the Dataset object to an Excel file in Spark Java:

    SourcePropertSet.write()
        .format("com.crealytics.spark.excel")
        .option("useHeader", "true")
        .option("treatEmptyValuesAsNulls", "true")
        .option("inferSchema", "true")
        .option("addColorColumns", "false")
        .save("D:\\resultset.xlsx");

But I am getting the following error:

    java.lang.RuntimeException: com.crealytics.spark.excel.DefaultSource does not allow create table as select.

I also tried the org.zuinnote.spark.office.excel package; below is the code for that:

    SourcePropertSet.write()
        .format("org.zuinnote.spark.office.excel")
        .option("write.locale.bcp47", "de")
        .save("D:\\result");

I have added the following dependencies to my pom.xml:

    <dependency>
        <groupId>com.github.zuinnote</groupId>
        <artifactId>hadoopoffice-fileformat</artifactId>
        <version>1.0.0</version>
    </dependency>
    <dependency>
        <groupId>com.github.zuinnote</groupId>
        <artifactId>spark-hadoopoffice-ds_2.11</artifactId>
        <version>1.0.3</version>
    </dependency>

But I am getting the following error:

    java.lang.IllegalAccessError: tried to access method org.zuinnote.hadoop.office.format.mapreduce.ExcelFileOutputFormat.getSuffix(Ljava/lang/String;)Ljava/lang/String; from class org.zuinnote.spark.office.excel.ExcelOutputWriterFactory

Please help me write a Dataset object to an Excel file in Spark Java.


Solution

  • It looks like the library you chose, com.crealytics.spark.excel, does not have any code related to writing Excel files. Under the hood it uses Apache POI for reading Excel files, and there are also a few examples. (If you need native .xlsx output anyway, see the Apache POI sketch at the end of this answer.)

    The good news is that Excel opens CSV files just fine, so you can use spark-csv to write one. You need to change your code like this:

        SourcePropertSet.write()
            .format("com.databricks.spark.csv")
            .option("header", "true")
            .save("D:\\resultset.csv");
    

    Keep in mind that Spark writes one output file per partition, so you may want to call .repartition(1) to get exactly one result file.
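
    For example, the same spark-csv write as above, just forced into a single partition (a sketch reusing the SourcePropertSet variable from your question):

        SourcePropertSet
            .repartition(1)                      // one partition -> exactly one output file
            .write()
            .format("com.databricks.spark.csv")
            .option("header", "true")
            .save("D:\\resultset.csv");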
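
    If you really need a native .xlsx file rather than CSV, another workaround is to write it yourself with Apache POI, which com.crealytics.spark.excel already uses under the hood for reading. The sketch below is my own illustration, not part of either library: the class and method names (ExcelWriterSketch, writeToXlsx) are made up, it assumes org.apache.poi:poi-ooxml is on the classpath, and it collects the Dataset to the driver, so it only suits result sets that fit in driver memory.

        import java.io.FileOutputStream;
        import java.util.List;

        import org.apache.poi.ss.usermodel.Sheet;
        import org.apache.poi.ss.usermodel.Workbook;
        import org.apache.poi.xssf.usermodel.XSSFWorkbook;
        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;

        public final class ExcelWriterSketch {

            // Hypothetical helper: writes a small Dataset<Row> to an .xlsx file via Apache POI.
            public static void writeToXlsx(Dataset<Row> ds, String path) throws Exception {
                String[] columns = ds.columns();
                List<Row> rows = ds.collectAsList();   // pulls all rows to the driver

                try (Workbook workbook = new XSSFWorkbook();
                     FileOutputStream out = new FileOutputStream(path)) {

                    Sheet sheet = workbook.createSheet("result");

                    // Header row with the Dataset's column names.
                    org.apache.poi.ss.usermodel.Row header = sheet.createRow(0);
                    for (int c = 0; c < columns.length; c++) {
                        header.createCell(c).setCellValue(columns[c]);
                    }

                    // Data rows; every value is written as a string for simplicity.
                    for (int r = 0; r < rows.size(); r++) {
                        org.apache.poi.ss.usermodel.Row poiRow = sheet.createRow(r + 1);
                        Row sparkRow = rows.get(r);
                        for (int c = 0; c < columns.length; c++) {
                            Object value = sparkRow.get(c);
                            poiRow.createCell(c).setCellValue(value == null ? "" : value.toString());
                        }
                    }

                    workbook.write(out);
                }
            }
        }

    After your transformations you would call it as writeToXlsx(SourcePropertSet, "D:\\resultset.xlsx").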