I am using the PowerShell module ImportExcel
from here: https://github.com/dfinke/ImportExcel
To eliminate my complex code I am just running the following: (xxx indicates folders I dont want to show)
$ExcelFilePath = "C:\Users\<xxx>\code\<xxx>\<xxx>\AllSqlServersV2.xlsx"
$Excel = Open-ExcelPackage -Path $ExcelFilePath
$Excel.Recommended_for_instances.Dimension
And here is the output for $Excel.Recommended_for_instances.Dimension
:
Start : OfficeOpenXml.ExcelCellAddress
End : OfficeOpenXml.ExcelCellAddress
Table :
Address : A1:AF2 <=== 👀🛑
IsName : False
Rows : 2
Columns : 32 <=== 👀🛑
And here is a screen shot of the data. You will see a vertical red line after column AF
to show that I can get any data after this point. (sorry for the very wide picture)
A snapshot of the data point that I am having issues getting past:
As you can see, I am not doing anything out of the ordinary.
The interesting thing is, if I open the Excel file, press ctrl s
only (save the file without making any changes) then the output of the above code becomes correct:
Start : OfficeOpenXml.ExcelCellAddress
End : OfficeOpenXml.ExcelCellAddress
Table :
Address : A1:BA2 <=== 👀✅
IsName : False
Rows : 2
Columns : 53 <=== 👀✅
I have looked for a reason for this all day but can't find anything useful.
Any help would be great.
cheers
Russ
I've compared your original spreadsheet file in the GitHub issue versus the one re-saved by Excel and it looks like there's a couple of parts that Excel saves differently to whatever tool you used to generate the original file.
These differences alter how EPPlus (the underlying library used by ImportExcel
) processes the file, which at least explains the different results you're seeing in the Dimension
property in ImportExcel.
Here's some screenshots from BeyondCompare comparing the unzipped original file and the one saved in Excel:
Some things to note:
Your original "broken" file is on the left, the one saved by Excel is on the right
The Excel file has a "dimension" element, although as far as I can tell, EPPlus doesnt actually read this, and calculates the Dimension
dynamically from the column and cell data in the rest of the worksheet xml.
The original file has multiple <cols><col>...</col></cols>
with just one <col>
per <cols>
, whereas the Excel version has all of the <col>
elements in a single <cols>
- it turns out EPPlus only reads the first <cols>
element (see ExcelWorksheet.cs#L1439) which it uses to calculate used range / Dimension. (I'm not sure if the OpenXml spec allows multiple <cols>
elements, so this could be a bug in whatever you used to generate the file, or a compatibility issue in EPPlus.
There's some additional differences further down the file:
For columns A1
-AF1
, the original file serialises them as, e.g. <c r="AF1" s="1" t="n">
, but from AG1
onwards it serialises as r="nnn"
- e.g. <c r="322" s="1" t="n">
- EPPlus fails to parse these here and so doesn't expand the Dimension
property to include the columns. By contrast, Excel saves them as <c r="AG2" s="1">
which EPPlus reads find and expands the range to include these columns..
I'm not sure if this is a bug in your original file, or again a compatiblity issue in EPPlus.
There's a couple of things Excel saves differently compared to your original file that mean EPPlus can process the saved file correctly, and this surfaces up in the Dimension
property ImportExcel
module.
Maybe these differences are allowed in older / newer versions of the OpenXML spec, or maybe Excel has some additional error handling for reading and correcting some common errors in files, but as it stands, EPPlus (and vicariously, ImportExcel) doesn't like the files generated by your application :-(