I am trying to work with the 'openpyxl' library and pandas but receive the following error when I run my code.
ImportError Traceback (most recent call last)
Cell In[27], line 1
----> 1 message_data = (pd.read_excel('message_types.xlsx',
2 sheet_name='messages')
3 .sort_values('id')
4 .drop('id', axis=1))
File c:\PATH.py:495, in read_excel(io, sheet_name, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, date_format, thousands, decimal, comment, skipfooter, storage_options, dtype_backend, engine_kwargs)
493 if not isinstance(io, ExcelFile):
494 should_close = True
--> 495 io = ExcelFile(
496 io,
497 storage_options=storage_options,
498 engine=engine,
499 engine_kwargs=engine_kwargs,
500 )
501 elif engine and engine != io.engine:
502 raise ValueError(
503 "Engine should not be specified when passing "
504 "an ExcelFile - ExcelFile already has the engine set"
505 )
File c:\PATH.py:1567, in ExcelFile.__init__(self, path_or_buffer, engine, storage_options, engine_kwargs)
1564 self.engine = engine
...
--> 164 raise ImportError(msg)
165 else:
166 return None
ImportError: Pandas requires version '3.1.0' or newer of 'openpyxl' (version '3.0.10' currently installed).
I have attempted to update the version of 'openpyxl' and confirmed that 'openpyxl' is installed correctly but the Anaconda navigator lists the latest version available as version '3.0.10' while the 'openpyxl' documentation lists the latest version as '3.1.2'.
Any tips on how to reconcile this?
I ran into a similar problem. I'll share both a minimal working installation of pandas that replicates your problem, and a solution.
To illustrate: install miniconda from homebrew, use conda to install the anaconda stack into an environment called demo, then activate 'demo'
brew install miniconda
conda create -n demo python=3.11 anaconda
enter the env and check versions:
conda activate demo
conda list pandas
yields:
# packages in environment at /opt/homebrew/Caskroom/miniconda/base/envs/demo:
#
# Name Version Build Channel
pandas 2.1.4 py311h7aedaa7_0
and
conda list openpyxl
yields:
# packages in environment at /opt/homebrew/Caskroom/miniconda/base/envs/demo:
#
# Name Version Build Channel
openpyxl 3.0.10 py311h80987f9_0
now launch ipython
and replicate the error (using a working .xlsx
file that has loaded in the past with no errors).
ipython
import pandas as pd
df = pd.read_xlsx("demo.xlsx")
yields the same error you reported (assuming that you don't have the correct openpyxl
somewhere else in your python path, in which case this error will be hard to replicate).
The solution is to upgrade openpyxl
manually using conda. To do this I needed to tell conda to looking the condaforge channel, and then upgrade (note that this is all within the demo env: conda activate demo
):
conda config --add channels conda-forge
conda install openpyxl=3.1.0
now you can confirm the version of openpyxl
(conda list openpyxl
) and confirm that pd.read_excel()
works.
Two notes: first, if you don't change the channels you may get errors when you try to upgrade via the default channels, and second this seems related to an earlier issue raised on the pandas GitHub.
This solution isn't perfect, as it would be ideal to only need to use conda create -n demo python=3.11 anaconda
to get a working anaconda install.