javaapache-sparkdatasetrddspark-excel

How to mention individual sheet names while writing mutiple org.apache.spark.sql.Dataset into an .xls file using crealytics / spark-excel in java?


I am trying to write different Java Datasets into an excel file which will contain multiple sheets inside it using crealytics/spark-excel library.

<dependency>
            <groupId>com.crealytics</groupId>
            <artifactId>spark-excel_2.11</artifactId>
            <version>0.13.0</version>
</dependency>

How do I provide a name for those individual excel sheets?

Here's what I am trying to do:

import org.apache.spark.api.java.JavaRDD;

SparkSession spark = SparkSession.builder().appName("LineQuery").getOrCreate();

Dataset<Row> df1 = spark.sql("SELECT * FROM my_table1");
Dataset<Row> df2 = spark.sql("SELECT * FROM my_table2");

df1.write().format("com.crealytics.spark.excel").option("sheetName","My Sheet 1").option("header", "true").save("hdfs://127.0.0.1:9000/var/www/" + outFile + ".xls");

df2.write().format("com.crealytics.spark.excel").option("sheetName","My Sheet 2").option("header", "true").mode(SaveMode.Append).save("hdfs://127.0.0.1:9000/var/www/" + outFile + ".xls");

Solution

  • Use dataAddress option instead

    Example:

    >>> df = spark.createDataFrame([(11, 12), (21, 22)])
    >>> df.show()
    +---+---+
    | _1| _2|
    +---+---+
    | 11| 12|
    | 21| 22|
    +---+---+
    >>> df.where("_1 == 11").write.format("com.crealytics.spark.excel").option("dataAddress", "my sheet 1[#All]").option("header", "true").mode("append").save("/tmp/excel-df.xlsx")
    >>> df.where("_1 == 21").write.format("com.crealytics.spark.excel").option("dataAddress", "my sheet 2[#All]").option("header", "true").mode("append").save("/tmp/excel-df.xlsx")