I've mostly used xlwings to open (read-write) workbooks, since the workbooks I read have complicated macros. But I've recently begun using openpyxl to open (read-only) workbooks when I've needed to read thousands of workbooks to scrape some data.
I've noticed a considerable difference between how xlwings and openpyxl read workbooks. I believe xlwings relies on pywin32 to read workbooks. When you read a workbook with xlwings.Book(<filename>), the workbook actually opens in an Excel window. I have a feeling this is a result of pywin32.
However, when using openpyxl.load_workbook(<filename>), no workbook window appears. I have a feeling this is a result of not using pywin32.
Beyond this, I have no further understanding of how the backends of each library work. Could someone shed some light on this? Is there a benefit or cost to relying on xlwings and pywin32 for reading workbooks, as opposed to openpyxl, which does not seem to use pywin32?
You are correct that xlwings relies on pywin32, whereas openpyxl does not.
An ".xlsx" Excel file is essentially a zip archive containing multiple XML files formatted according to Microsoft's Office Open XML (OOXML) specification. Using this specification, it's possible to write a program that reads and writes Excel files directly in just about any programming language. This is the approach openpyxl takes: it uses Python code to read and write Excel files directly, without Excel itself.
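Because an .xlsx file is just a zip archive of XML parts, you can peek inside one with nothing but the standard library. The sketch below is illustrative only (the helper names list_xml_parts and read_part are hypothetical); it lists the XML parts and returns the raw XML of one of them. Files saved by Excel typically contain parts such as "[Content_Types].xml", "xl/workbook.xml" and "xl/worksheets/sheet1.xml"; openpyxl parses these same parts, just far more thoroughly.

```python
import zipfile


def list_xml_parts(xlsx_path):
    """Return the sorted names of the XML parts inside an .xlsx container."""
    # An .xlsx file is an ordinary zip archive, so the standard-library zipfile
    # module can open it without Excel or any third-party package.
    with zipfile.ZipFile(xlsx_path) as zf:
        # Relationship files (*.rels) are XML too, just with a different suffix.
        return sorted(n for n in zf.namelist() if n.endswith((".xml", ".rels")))


def read_part(xlsx_path, part_name):
    """Return the raw XML text of one part, e.g. 'xl/workbook.xml'."""
    with zipfile.ZipFile(xlsx_path) as zf:
        return zf.read(part_name).decode("utf-8")
```

Usage: list_xml_parts("report.xlsx") followed by read_part("report.xlsx", "xl/worksheets/sheet1.xml") shows the cell XML that a library like openpyxl has to interpret (shared strings, styles, formulas) to give you cell values.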
A Microsoft Excel application can be started and controlled by an external program through the Win32 COM API. The pywin32 package provides an interface between Win32 COM and Python. Through a Python script with the right pywin32 commands, you can fully control an Excel application (open Excel files, read data from cells, write data to cells, save Excel files, etc.). The pywin32 commands you can use mirror Excel's VBA object model, albeit with Python syntax.
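To make the VBA mirroring concrete, here is a minimal sketch of reading a range through pywin32. It assumes Windows with Excel and pywin32 installed, and the function name and default arguments are hypothetical. Workbooks.Open, Worksheets(...).Range(...).Value, Close and Quit are the same calls you would write in VBA. A multi-cell range's Value comes back as a tuple of row tuples; a single cell comes back as a scalar.

```python
import os


def read_range_via_com(path, sheet_index=1, address="A1:B2"):
    """Read a cell range by driving a hidden Excel instance through COM (Windows only)."""
    # Imported lazily: win32com is part of pywin32 and only exists on Windows.
    import win32com.client

    # DispatchEx starts a separate Excel process, so Quit() below will not close
    # an Excel window the user already has open.
    excel = win32com.client.DispatchEx("Excel.Application")
    try:
        excel.Visible = False        # keep this Excel instance hidden
        excel.DisplayAlerts = False  # suppress dialogs that would block the script
        # Excel runs in its own process with its own current directory,
        # so hand it an absolute path.
        wb = excel.Workbooks.Open(os.path.abspath(path), ReadOnly=True)
        try:
            # Same object model as VBA: Worksheets(1).Range("A1:B2").Value
            return wb.Worksheets(sheet_index).Range(address).Value
        finally:
            wb.Close(SaveChanges=False)
    finally:
        excel.Quit()
```

Every call here is executed by Excel itself, which is why workbook macros, recalculation and linked data behave exactly as they would for a user. The cost is that Excel must be installed, and it has to start up and open each file.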
xlwings is (among other things) a user-friendly wrapper around pywin32. It introduces several concise yet powerful methods, for example direct conversion of an Excel cell range to a NumPy array or pandas DataFrame (and vice versa).
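As a rough comparison with raw pywin32, the sketch below reads a contiguous table straight into a pandas DataFrame via xlwings. It assumes xlwings, pandas and Excel are installed, and the function name and parameters are hypothetical. App(visible=False, add_book=False), books.open, range(...).options(pd.DataFrame, expand="table").value, close and quit are xlwings calls.

```python
import os


def read_table_via_xlwings(path, sheet_name, top_left="A1"):
    """Read a contiguous table into a pandas DataFrame through xlwings (needs Excel)."""
    # Imported lazily so this module can be loaded where these packages are absent.
    import pandas as pd
    import xlwings as xw

    # A separate, hidden Excel instance; add_book=False skips the blank
    # workbook Excel would otherwise create on startup.
    app = xw.App(visible=False, add_book=False)
    try:
        book = app.books.open(os.path.abspath(path))
        try:
            # expand="table" grows the range from top_left over the surrounding
            # contiguous block; by default the DataFrame converter uses the first
            # row as the header and the first column as the index.
            rng = book.sheets[sheet_name].range(top_left)
            return rng.options(pd.DataFrame, expand="table").value
        finally:
            book.close()
    finally:
        app.quit()
```

The reverse direction is a plain assignment, e.g. sheet.range("A1").value = df, which is the kind of convenience xlwings layers on top of the underlying COM calls.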
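A fundamental difference between xlwings and openpyxl is that the former requires MS Excel to be installed on your machine, whereas the latter does not. This is also why a window appears with xlwings.Book(<filename>): the file is opened by a real, running Excel application, whereas openpyxl.load_workbook(<filename>) only parses the XML inside the file and never starts Excel.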