Tags: php, memory-management, out-of-memory, phpexcel, phpexcel-reader

Memory problem while reading a large Excel 2007 (.xlsx) file


I'm using PHPExcel, which I've used many times before. The problem I have now is with reading Excel 2007 files (.xlsx format). What I'm doing is simply looping over the .xlsx file and building an array by row/column, then print_r()-ing the results after the read operation to make sure the data output is good before importing it into a MySQL database.

Now when reading the Excel 2007 .xlsx file (6MB), the output fails. What's interesting is that if I save the file in the older .xls format (Excel5, 1992-2004), the file becomes larger (16MB) but outputs correctly. This originally made me think it wasn't a memory problem, since the older, larger .xls file (16MB) ran with no problems and was almost 3x the size of the .xlsx file (6MB).

For test purposes I then copied 25 of the 30,000 rows in the .xlsx (6MB) file into a new Excel 2007 .xlsx file and ran the import against the smaller 25-row data set, and it output correctly. This then led me to think that it is a memory problem after all, but one related specifically to the .xlsx format...

I'm running the server on Amazon Web Services on a C4.Xlarge instance (16-core, 30GB RAM), so I should have plenty of resources to run this operation.


Question: Why does my output fail when reading the smaller 6MB .xlsx file but succeed with the larger 16MB .xls file, and then also succeed with the tiny 25-row .xlsx file?


//PHP Function

function parse_xls($file){
    ini_set('memory_limit','-1');
    // Identify the file type and create the matching reader
    $type = PHPExcel_IOFactory::identify($file);
    $reader = PHPExcel_IOFactory::createReader($type);
    $reader->setReadDataOnly(true); // skip styles/formatting to save memory
    $xls = $reader->load($file);
    $sheet = $xls->getActiveSheet();
    $highestRow = $sheet->getHighestRow();
    $highestColumn = $sheet->getHighestColumn();
    $highestColumnIndex = PHPExcel_Cell::columnIndexFromString($highestColumn);
    $import = [];
    // Row 1 is the heading row, so data starts at row 2.
    // Note: the original upper bound ($highestRow + 2) read two rows past
    // the end of the sheet; <= $highestRow is the correct limit.
    for($row = 2; $row <= $highestRow; $row++){
        $import[$row] = [];
        for($col = 0; $col < $highestColumnIndex; $col++){
            $import[$row][] = $sheet->getCellByColumnAndRow($col, $row)->getValue();
        }
    }
    print_r($import);
    die();
}
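Independent of PHPExcel's own overhead, note that the function above accumulates every row into $import before print_r()-ing it, so the array itself competes for memory with the loaded workbook. A stdlib-only sketch (function name hypothetical, figures illustrative) for gauging that array overhead with memory_get_usage():

```php
<?php
// Stdlib-only sketch: estimate how much memory an accumulated row/column
// array costs on its own, independent of PHPExcel.
function estimate_rows_overhead($rows, $cols)
{
    $before = memory_get_usage();
    $import = [];
    for ($row = 0; $row < $rows; $row++) {
        $import[$row] = [];
        for ($col = 0; $col < $cols; $col++) {
            // Distinct string per cell, mimicking real cell values
            $import[$row][] = "cell-$row-$col";
        }
    }
    $after = memory_get_usage();
    unset($import);
    return $after - $before;
}

// e.g. 30,000 rows, roughly the size of the data set in the question
echo estimate_rows_overhead(30000, 20), " bytes\n";
```

On a large sheet this can easily run to hundreds of megabytes before printing even starts, which is why chunked reading (below in the accepted approach) frees each batch before loading the next.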

Solution

  • For big files I use a chunkReadFilter:

    $iChunkSize = 1000;
    for ($iStartRow = $row_start; $iStartRow <= $totalRows; $iStartRow += $iChunkSize) {
        $objReader = $oExcel->SetCreateReader();
        $oChunkFilter = new chunkReadFilter();
        $oChunkFilter->setRows($iStartRow, $iChunkSize);
        $objReader->setReadFilter($oChunkFilter);
        $objReader->setReadDataOnly(true);
        $objPHPExcel = $objReader->load($files['path']);
        $objPHPExcel->setActiveSheetIndex($iList);
        $sFromCell = 'A' . $iStartRow;
        $aData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, false, $sFromCell);
        // free memory
        unset($objPHPExcel);
        unset($objReader);
        unset($oChunkFilter);
        // parse data
        foreach ($aData as $sKey => $aValue) {
            ...
        }
        // real data rows
        if (count($aData) < $iChunkSize) {
            unset($aData);
            break;
        }
        unset($aData);
    }
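The chunkReadFilter class used above isn't shown; a sketch along the lines of the read-filter example in the PHPExcel documentation (the interface stub only exists so the snippet parses stand-alone when PHPExcel isn't loaded):

```php
<?php
// Stub so this sketch is self-contained; with PHPExcel loaded, the real
// PHPExcel_Reader_IReadFilter interface is used instead.
if (!interface_exists('PHPExcel_Reader_IReadFilter')) {
    interface PHPExcel_Reader_IReadFilter {
        public function readCell($column, $row, $worksheetName = '');
    }
}

class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;
    private $_endRow   = 0;

    // Set the start row and the number of rows to read for this chunk.
    public function setRows($startRow, $chunkSize)
    {
        $this->_startRow = $startRow;
        $this->_endRow   = $startRow + $chunkSize;
    }

    // Only read the heading row (row 1) and rows inside the current chunk;
    // every other cell is skipped by the reader, keeping memory bounded.
    public function readCell($column, $row, $worksheetName = '')
    {
        return ($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow);
    }
}
```

The reader calls readCell() for every cell it encounters and only instantiates the ones for which it returns true, so each load() pass holds at most one chunk of rows in memory.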