I created a .xlsx file in Excel for Widows desktop app. The file uses named ranges, conditional formatting and some basic formulas (SUM, IF, NOT, ISBLANK, *, +). Most of the worksheet and all the workbook are locked with users only able to select and add data to certain cells.
The file was sent to someone who uses a MacBook Air (macOS Mojave v10.14.6) with Excel for Macintosh v12.3.6. He added his data, saved the changes (the file extension is still .xlsx) and sent the file back to me.
The file is destined to be opened by the OpenXML SDK using SpreadsheetDocument.Open in a .Net Core 7.0 Azure Function. The first step that the SDK takes is to unzip the file, but the code was returning a corrupt file message.
using SpreadsheetDocument document = SpreadsheetDocument.Open(stream, false);
OpenXML SDK error:
System.IO.Packaging: File contains corrupted data.
- at System.IO.Packaging.ZipPackage..ctor(Stream s, FileMode packageFileMode, FileAccess packageFileAccess)
- at System.IO.Packaging.Package.Open(Stream stream, FileMode packageMode, FileAccess packageAccess)
- at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.OpenCore(Stream stream, Boolean readWriteMode)
- at DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open(Stream stream, Boolean isEditable, OpenSettings openSettings)
- at ZipFiles.Program.Main(String[] args) in D:\Users\mark.cherry\Source\Repos\ZipFiles\ZipFiles\Program.cs:line 22
It was this that lead me to test opening the file using .Net Core System.IO.Compression.ZipArchive in a bare console application. This returns a similar error but with obvious differences.
using FileStream zipToOpen = new (@"Invoice - Fixed Fee.xlsx.zip", FileMode.Open);
using ZipArchive zip = new (zipToOpen, ZipArchiveMode.Read);
ZipArchive error:
System.IO.Compression: End of Central Directory record could not be found.
- at System.IO.Compression.ZipArchive.ReadEndOfCentralDirectory()
- at System.IO.Compression.ZipArchive..ctor(Stream stream, ZipArchiveMode mode, Boolean leaveOpen, Encoding entryNameEncoding)
- at System.IO.Compression.ZipArchive..ctor(Stream stream, ZipArchiveMode mode)
- at ZipFiles.Program.Main(String[] args) in D:\Users\mark.cherry\Source\Repos\ZipFiles\ZipFiles\Program.cs:line 12
As a further test of the file I changed the extension from .xlsx to .xlsx.zip and I get the same error 'The Compressed (zipped) Folder ... is invalid.':
The strange thing is, the original .xlsx file will open in Excel for Windows desktop app (v2308 (Build 16731.20234 Click-to-Run)):
If I open then save the corrupt file in Excel for Windows desktop app, then change the file extension to .xlsx.zip, the file will open as an archive. I did not change anything in the file, I just opened and saved it.
My understanding is that all .xlsx files are Microsoft Excel Open XML based on the OpenXML format and are archives containing several sub-directories and files.
I asked the same person to create a new blank workbook in Excel for Macintosh and send it to me. When I renamed the file extension of that file to .xlsx.zip, the file opens as an archive.
This is the only user who has returned the file with this problem. All other users are not experiencing this issue, but they may not be using MacBook Airs. I have tried to reproduce the error using iPads, but I could not create a file with the same error. I do not have a Mac so I am unable to diagnose the issue since opening and saving the file in Windows removes the fault.
Why is the locked file returning as something other than an archive when edited in Excel for Macintosh, and why does a blank workbook from the same Excel for Macintosh not show the same problem?
The password for the locked file does contain at least one !. Could this be a cryptography error while saving the file?
Any help to stop this file from becoming corrupt on this users Mac or to allow me to open the file in OpenXML on Windows would be much appreciated.
Test files can be accessed here: Excel for Mac Corruption
On the Excel community forum it was suggested that I look at the differences in the file's binary data between the Mac version and a version that has been opened and saved in Excel for Windows. The idea was to see if I could spot a difference that could cause the error. Unfortunately, there were too many differences for this method to be useful.
As detailed in the body of the question above, I have tried:
Regards,
Mark
The protection in the Excel file is done at two levels:
By default, all the cells in spreadsheet are set to be locked if the spreadsheet is protected. To only allow the users to enter data in some cells I:
Format Cell Protection Option Dialog in Excel
Review > Protect Sheet
Review > Protect Workbook
Protect Workbook Dialog in Excel
The instructions to the Mac user were to open the file then save it without entering any data.
I don’t think I am encrypting anything; I am only protecting the structure of the workbook and spreadsheet. This appears to be true when I look at the files in the .xlsx bundle (.zip archive). All of the .xml files for the protected document are in plain text and the named ranges and default data can be read with notepad.
I've had a chance to scan through the file specification documents and I understand what @tomjebo is driving at now. The Excel file that I protected is not encrypted. Once the Mac user opens it, adds data or doesn't add data, then saves the file again, something in his MS Office environment is adding encryption to the file, probably with a blank password. I assume this could be a default setting in Excel for Mac (I have no way of testing this), or it could be one of the items listed in the Introduction section of the MS-OFFCRYPTO specification file linked in the answer:
"The Office Document Cryptography Structure is relevant to documents that have Information Rights Management (IRM) policies, document encryption, or signing and write protection applied." (Page 8/119: [MS-OFFCRYPTO] - v20230815 Office Document Cryptography Structure Copyright © 2023 Microsoft Corporation Release: August 15, 2023)
As a test, I navigated to OneDrive for Business in a browser then opted to open the file the customer returned to me in Excel online ('Open in browser' option from the context menu). Excel Online asked me to provide a password. The customer did not provide me with a password, and Excel Online will not accept a blank password, so I couldn't open the file. Since the file opens in Excel for Windows on my desktop without asking for a password, I assume that the password is blank. The file is definitely encrypted.
As a side note, I tried to open the file in an Azure Logic App then transfer it back to the Azure Storage Account as a new blob to see if I could use Excel Online to remove the encryption. This does not work because the ‘Get worksheets' action for Excel for Business in the Logic App throws the following HTTP 403 error:
"The request is forbidden by Graph API. Error code is 'PasswordProtectedWorkbook'. Error message is 'Forbidden'."
Since I do not have the time or the knowledge to be able to write code to discover the file's encryption type then unencrypt the file, I am left with two alternatives.
When Excel (and Office in general) encrypts or protects a document, it converts it to a Compound Binary File format. This is no longer a Zip archive. You can see the signature in the first several bytes of the file as 0xD0CF11E0A1B11AE1 (side note, 0xD0CF11E is supposed to read like "doc file" and is the signature for CFB). The Open XML SDK does not open/read encrypted or protected Office documents as do not Zip archive utilities. This is why your document won't open.
You can read about the encryption/protection process in MS-OFFCRYTPO and you can read about the CFB format (the file format used before Office Open XML was adopted) in MS-CFB
Hope this helps.