Maybe "writing" wasn't the correct word since in this function, I am just setting the cells and then writing afterwards.
I have a function that I have pin pointed to be the cause of it bogging down. When it gets to this function, it spends over 10 minutes here before I just terminate it.
This is the function that I am passing an output_wb
to:
private static void buildRowsByListOfRows(int sheetNumber, ArrayList<Row> sheet, Workbook wb) {
Sheet worksheet = wb.getSheetAt(sheetNumber);
int lastRow;
Row row;
String cell_value;
Cell cell;
int x = 0;
System.out.println("Size of array list: " + sheet.size());
for (Row my_row : sheet) {
try {
lastRow = worksheet.getLastRowNum();
row = worksheet.createRow(++lastRow);
for (int i = 0; i < my_row.getLastCellNum(); i++) {
cell_value = getCellContentAsString(my_row.getCell(i, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK));
cell = row.createCell(i);
cell.setCellValue(cell_value);
System.out.println("setting row #: " + x + "with value =>" + cell_value);
}
x++;
} catch (Exception e) {
System.out.println("SOMETHING WENT WRONG");
System.out.println(e);
}
}
}
The size of the ArrayList
is 73,835
. It starts off running pretty fast then it gets to around row 20,000
and it then you can see the print statements in the loop getting spread out further and further apart. Each row has 70
columns.
Is this function really written that poorly or is something else going on? What can I do to optimize this?
I create the output workbook like this if this matters:
// Create output file with the required sheets
createOutputXLSFile(output_filename_path);
XSSFWorkbook output_wb = new XSSFWorkbook(new FileInputStream(output_filename_path));
And the createOutputXLSFile()
looks like this:
private static void createOutputXLSFile(String output_filename_path) throws FileNotFoundException {
try {
// Directory path where the xls file will be created
// Create object of FileOutputStream
FileOutputStream fout = new FileOutputStream(output_filename_path);
XSSFWorkbook wb = new XSSFWorkbook();
wb.createSheet("Removed records");
wb.createSheet("Added records");
wb.createSheet("Updated records");
// Build the Excel File
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
wb.write(outputStream);
outputStream.writeTo(fout);
outputStream.close();
fout.close();
wb.close();
} catch (IOException e) {
e.printStackTrace();
}
}
private static String getCellContentAsString(Cell cell) {
DataFormatter fmt = new DataFormatter();
String data = null;
if (cell.getCellType() == CellType.STRING) {
data = String.valueOf(cell.getStringCellValue());
} else if (cell.getCellType() == CellType.NUMERIC) {
data = String.valueOf(fmt.formatCellValue(cell));
} else if (cell.getCellType() == CellType.BOOLEAN) {
data = String.valueOf(fmt.formatCellValue(cell));
} else if (cell.getCellType() == CellType.ERROR) {
data = String.valueOf(cell.getErrorCellValue());
} else if (cell.getCellType() == CellType.BLANK) {
data = String.valueOf(cell.getStringCellValue());
} else if (cell.getCellType() == CellType._NONE) {
data = String.valueOf(cell.getStringCellValue());
}
return data;
}
Update #1- Seems to be happening here. If I comment out all 3 lines then it finishes:
cell_value = getCellContentAsString(my_row.getCell(i, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK));
cell = row.createCell(i);
cell.setCellValue(cell_value);
Update #2 - If I comment out these two lines, then the loop finishes as expected:
cell = row.createCell(i); // The problem
cell.setCellValue(cell_value);
So now I know the problem is the row.createCell(i)
but why? How can I optimize this?
I finally managed to resolve this issue. Turns out that using XSSF
to write is just too slow if the files are large. So I converted the XSSF
output workbook to an SXSSFWorkbook
. To do that I just passed in my already existing XSSFWorkbook
into SXSSFWorkbook
like this :
// Create output file with the required sheets
createOutputXLSFile(output_filename_path);
XSSFWorkbook output_wb_temp = new XSSFWorkbook(new FileInputStream(output_filename_path));
SXSSFWorkbook output_wb = new SXSSFWorkbook(output_wb_temp);
The rest of the code works as is.