javaexceldateapache-poipoi-hssf

Apache POI - Parsing Dates with HSSFListener


First off, I'm working on using Apache POI's Event API using HSSFListener and XSSFSheetXMLHandler.SheetContentsHandler to read large files. So I'm not using a workbook where I can use some of the handy methods. I've already implemented it with that approach and got OOM errors.

For XLS files, I'm having trouble figuring out how to determine from a NumberRecord if the format is a Date format or Number format. I've looked at this old link (Using Apache POI HSSFListener how to identify date type) but only the Tika link works as the other no longer exists. It seems some modifications would work with the FormatTrackingHSSFListener but I'm missing something there.

I saw some people mentioning using the DateUtil to check if the value is in an InternalDateFormat like below, but that didn't work.

case NumberRecord.sid:
    NumberRecord numrec = (NumberRecord) record;
    if (DateUtil.isInternalDateFormat(numrec.getXFIndex())) {
        System.out.println("Cell found with date value " + DateUtil.getJavaDate(numrec.getValue())
                + " at row " + numrec.getRow() + " and column " + numrec.getColumn());
    } else {
        System.out.println("Cell found with value " + numrec.getValue()
                + " at row " + numrec.getRow() + " and column " + numrec.getColumn());
    }

I wonder if the isADateFormat method would work (https://poi.apache.org/apidocs/dev/org/apache/poi/ss/usermodel/DateUtil.html#isADateFormat-int-java.lang.String-) but I can't figure out what the formatString would be.

Thoughts? Much appreciated. The documentation is somewhat there but it's "fun" to get thru.


Solution

  • In case anyone else has issues with this, I've worked through this with the help from the TIKA extractor link in the other question. I would suggest implementing an HSSFListener which itself has a FormatTrackingHSSFListener as a member variable. This will your initial listener and will capture all the formats (if I understand correctly), and it itself will have a child listener of your implementation.

    public class YourListener implements HSSFListener {
    
        private SSTRecord sstRecord;
        private FormatTrackingHSSFListener formatListener;
        private List<String> sheetNames = new ArrayList<>();
        private Integer currentSheetIndex = -1;
        private Integer rowCount = 0;
    
        public YourListener() {
            this.formatListener = new FormatTrackingHSSFListener(this);
        }
    }
    

    When you override processRecord, your switch for the NumberRecord.sid can call the formatListener's formatNumberDateCell. So something like:

    case NumberRecord.sid: // Contains a numeric cell value
         NumberRecord number = (NumberRecord) record;
         System.out.println(record, formatListener.formatNumberDateCell(number));
         break;