
Force read only first sheet in Apache POI


I am using Apache POI to read data from only the first sheet of an Excel file. The xlsx files that are submitted usually have a single sheet and are around 2.5MB (a little more than 130k rows of data), and processing is slow but completes without errors. However, if the submitted xlsx has more than one sheet, and the other sheet(s) also contain a lot of data, execution fails with OutOfMemoryError: Java heap space. Is it possible to always read only the data on the first sheet, so that the other sheets cannot cause memory errors? (I am running this with the -Xmx1024m -Xms512m arguments.)

EDIT: here is my code

InputStream inputStream = new FileInputStream(new File(excelfile));
// XSSFWorkbook parses the whole workbook (every sheet) into memory here
XSSFWorkbook workbook = new XSSFWorkbook(inputStream);

if (workbook.getNumberOfSheets() != 1) {
    throw new Exception("Make sure excel only has 1 sheet");
}

The program throws the error on the second line (the XSSFWorkbook constructor) when the Excel file also has a lot of data on the second sheet.


Solution

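  • Apache POI's XSSFWorkbook loads the entire workbook, every sheet included, into memory, which is why it often runs into memory problems with large files. I strongly recommend using monitorjbl's excel-streaming-reader instead: https://github.com/monitorjbl/excel-streaming-reader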

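     // StreamingReader is in com.monitorjbl.xlsx; Workbook and Sheet are in org.apache.poi.ss.usermodel
     try (InputStream is = new FileInputStream(new File(filePath));
          Workbook workbook = StreamingReader.builder()
                  .rowCacheSize(100)   // number of rows to keep in memory (defaults to 10)
                  .bufferSize(2048)    // buffer size to use when reading InputStream to file (defaults to 1024)
                  .open(is)) {

         Sheet sheet = workbook.getSheetAt(0);   // rows are streamed rather than loaded all at once
         // iterate over the rows of the first sheet here
     }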
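
To illustrate why streaming keeps the other sheets out of memory, here is a minimal JDK-only sketch, not the library's implementation. An xlsx file is a zip archive, so a single sheet's XML can be opened and streamed without touching the other entries. The class name FirstSheetRowCounter is hypothetical, and the sketch assumes the first sheet is stored as xl/worksheets/sheet1.xml, which is typical but not guaranteed (the real mapping is in xl/workbook.xml and its relationships file). It only counts rows; it does not decode cell values.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class FirstSheetRowCounter {

    // Assumption: the first sheet lives under this entry name. This is the
    // common layout, but the authoritative mapping is in xl/workbook.xml
    // and xl/_rels/workbook.xml.rels.
    static final String FIRST_SHEET_ENTRY = "xl/worksheets/sheet1.xml";

    /** Counts row elements in the first sheet without opening any other sheet. */
    static long countRows(String xlsxPath) throws Exception {
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            ZipEntry entry = zip.getEntry(FIRST_SHEET_ENTRY);
            if (entry == null) {
                throw new IllegalStateException("No entry " + FIRST_SHEET_ENTRY);
            }
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // The file is user-submitted, so disable DTDs and external entities.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
            try (InputStream in = zip.getInputStream(entry)) {
                XMLStreamReader xml = factory.createXMLStreamReader(in);
                long rows = 0;
                // Pull events one at a time; only the current event is held in memory.
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT
                            && "row".equals(xml.getLocalName())) {
                        rows++;
                    }
                }
                xml.close();
                return rows;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: FirstSheetRowCounter <file.xlsx>");
            return;
        }
        System.out.println(countRows(args[0]));
    }
}
```

For real use, a library that resolves sheet order, shared strings and cell types is the safer choice; Apache POI's own event API (XSSFReader in org.apache.poi.xssf.eventusermodel) works along the same lines.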