perlgnumericparseexcel

Spreadsheet::ParseExcel row_range differs from gnumeric


I'm having trouble getting Spreadsheet::ParseExcel to parse an .xls spreadsheet generated by gnumeric, and I think the problem is with gnumeric. ParseExcel's row_range() method is returning (0, 500) when only 285 rows are in use.

I tried selecting cells A1 - I285, and selecting File > Print Area > Set Print Area before saving and processing the .xls file with my Perl script, with no effect.

The file actually contains two worksheets. One computes the number of rows correctly, and the other doesn't. I created both worksheets, but don't know what I might have done differently.


Solution

  • ParseExcel's row_range() method is returning (0, 500) when only 285 rows are in use.

    Spreadsheet::ParseExcel reads the row_range() data from a binary record in the Excel file that stores the max and min row and column values. It doesn't iterate over the worksheet to calculate these values.

    Therefore, if row_range() is reporting (0, 500) as the range of rows in use then these are the values that are stored in that record.

    The source of the difference may be due to rows containing formatting but no data. Excel differentiates between an "Empty" cell and a "Blank" cell. An "Empty" cell is a cell which doesn't contain data whilst a "Blank" cell is a cell which doesn't contain data but does contain formatting. Excel stores "Blank" cells but ignores "Empty" cells.

    I tried selecting cells A1 - I285, and selecting File > Print Area > Set Print Area before saving and processing the .xls file with my Perl script, with no effect.

    The print area doesn't have an effect on the range reported by row_range().

    If you really want to reduce the file down to 285 data containing rows then select the rows from 286-501, delete them and then save the file.

    Alternatively, set up your program to ignore rows containing only cells with blank data.