Currently I am using com.crealytics.spark.excel to read an Excel file, but using this library I can't write the dataset to an Excel file.
This link says that using hadoop office library (org.zuinnote.spark.office.excel
) we can read and write to Excel files
Please help me to write dataset object to an excel file in spark java.
You can use org.zuinnote.spark.office.excel
for both reading and writing excel file using Dataset. Examples are given at https://github.com/ZuInnoTe/spark-hadoopoffice-ds/. However, there is one issue if you read the Excel in Dataset and try to write it in another Excel file. Please see the issue and workaround in scala at https://github.com/ZuInnoTe/hadoopoffice/issues/12.
I have written a sample program in Java using org.zuinnote.spark.office.excel
and workaround given at that link. Please see if this helps you.
public class SparkExcel {
public static void main(String[] args) {
//spark session
SparkSession spark = SparkSession
.builder()
.appName("SparkExcel")
.master("local[*]")
.getOrCreate();
//Read
Dataset<Row> df = spark
.read()
.format("org.zuinnote.spark.office.excel")
.option("read.locale.bcp47", "de")
.load("c:\\temp\\test1.xlsx");
//Print
df.show();
df.printSchema();
//Flatmap function
FlatMapFunction<Row, String[]> flatMapFunc = new FlatMapFunction<Row, String[]>() {
@Override
public Iterator<String[]> call(Row row) throws Exception {
ArrayList<String[]> rowList = new ArrayList<String[]>();
List<Row> spreadSheetRows = row.getList(0);
for (Row srow : spreadSheetRows) {
ArrayList<String> arr = new ArrayList<String>();
arr.add(srow.getString(0));
arr.add(srow.getString(1));
arr.add(srow.getString(2));
arr.add(srow.getString(3));
arr.add(srow.getString(4));
rowList.add(arr.toArray(new String[] {}));
}
return rowList.iterator();
}
};
//Apply flatMap function
Dataset<String[]> df2 = df.flatMap(flatMapFunc, spark.implicits().newStringArrayEncoder());
//Write
df2.write()
.mode(SaveMode.Overwrite)
.format("org.zuinnote.spark.office.excel")
.option("write.locale.bcp47", "de")
.save("c:\\temp\\test2.xlsx");
}
}
I have tested this code with Java 8 and Spark 2.1.0. I am using maven and added dependency for org.zuinnote.spark.office.excel
from https://mvnrepository.com/artifact/com.github.zuinnote/spark-hadoopoffice-ds_2.11/1.0.3