I want to analyze near-infrared (NIR) spectra in Python. My spectra are stored in the spc file format. So I need a tool that lets me import such files. "Pyspectra" seems to be a good module for this. However, I am unable to install it in a fresh virtual environment with Python 3.12.5 and pip 24.2 on a Windows 10 machine.
pip install pyspectra
fails with an error message:
"Getting requirements to build wheel did not run successfully".
The last line of the traceback states:
ModuleNotFoundError: No module named 'numpy'.
I installed numpy with pip install numpy and verified that it works with import numpy as np. No problem here. I also made sure that I am in the same virtual environment in which I wish to install pyspectra.
But I still cannot install pyspectra. Pip continues to claim that it cannot find numpy.
Could this be a dependency issue between 'pyspectra 0.0.1.2' and 'numpy 2.1.2'?
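The environment check mentioned above can be sketched as a short script run with the interpreter of the activated environment. This is a minimal sketch, not the exact commands I used; the helper names in_virtualenv and is_installed are made up for illustration.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    # Inside an environment created with "python -m venv", sys.prefix points
    # at the environment while sys.base_prefix points at the base installation.
    return sys.prefix != sys.base_prefix


def is_installed(module_name: str) -> bool:
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


print("interpreter:", sys.executable)
print("virtual environment active:", in_virtualenv())
print("numpy importable:", is_installed("numpy"))
```

Note that this only reports on the interpreter that runs the script.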
For reference, these are the commands I ran on the Windows command line:
# Create and activate a virtual environment
C:\user\...\Desktop>python -m venv venv
C:\user\...\Desktop>venv\scripts\activate
# Install numpy (to make sure it is available)
(venv) C:\user\...\Desktop>py -m pip install numpy
# Install pyspectra
(venv) C:\user\...\Desktop>py -m pip install pyspectra
Collecting pyspectra
Using cached pyspectra-0.0.1.2-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: numpy in c:\users\...\venv\lib\site-packages (from pyspectra) (2.1.2)
Collecting pandas (from pyspectra)
Using cached pandas-2.2.3-cp312-cp312-win_amd64.whl.metadata (19 kB)
Collecting spc-spectra (from pyspectra)
Using cached spc_spectra-0.4.0.tar.gz (8.6 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [24 lines of output]
Traceback (most recent call last):
[...]
File "C:\Users\...\AppData\Local\Temp\pip-install-v_qa0rxl\spc-spectra_a351371d52ef45edbd585ef2843e5c1e\spc_spectra\spc.py", line 10, in <module>
import numpy as np
ModuleNotFoundError: No module named 'numpy'
[end of output]
Edit: Rewrote question to make it more understandable.
I figured it out myself. I am answering my own question in case anybody else tries to import near-infrared spectra from SPC files in Python. This might also serve as an instructive example of what can happen when someone moves from doing data science in R to Python without understanding the difference between R's CRAN and Python's PyPI software repositories.
TL;DR: "PySpectra" is outdated. Use "pyfasma-spc" instead.
The Problem
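It really is an incompatibility issue, although not one between the versions of pyspectra and numpy.
PySpectra requires the installation of 'spc-spectra'. The setup.py file of spc-spectra is not compatible with Pip version 24.2. Judging from the traceback, running setup.py imports spc_spectra/spc.py, which in turn imports numpy. Pip builds source distributions in an isolated build environment by default, so the numpy installed in the virtual environment is not visible at that point. I managed to install a local copy of this package by replacing the content of setup.py with the following code: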
import setuptools

if __name__ == "__main__":
    setuptools.setup()
I was then able to install the package via pip install -e /.../spc_spectra-0.4.0. Afterwards I could also run pip install pyspectra without an error.
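For anyone who would rather repair such a setup.py than empty it, here is a minimal sketch of a common pattern: reading the version string from the source file as plain text instead of importing the package, so that no runtime dependency such as numpy is needed at build time. I am assuming that the original setup.py imports the package to obtain its metadata (the traceback suggests this, but I have not verified it). The helper name read_version, the __version__ attribute, and the path in the usage comment are hypothetical.

```python
import re
from pathlib import Path


def read_version(init_file: str) -> str:
    """Extract __version__ from a Python source file without importing it."""
    text = Path(init_file).read_text(encoding="utf-8")
    # Match a line such as: __version__ = "0.4.0"
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", text, re.MULTILINE
    )
    if match is None:
        raise RuntimeError(f"no __version__ found in {init_file}")
    return match.group(1)


# Hypothetical use inside setup.py (path and attribute name are assumptions):
# setuptools.setup(version=read_version("spc_spectra/__init__.py"), ...)
```

Because the file is only parsed as text, it does not matter whether the modules it imports are installed in pip's build environment.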
The Underlying Issue
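It's important to understand that R's CRAN repository and Python's PyPI work differently: CRAN checks packages on submission and re-checks them regularly, archiving those that no longer pass, whereas PyPI performs no comparable review and keeps old releases available.
If you have a new task (e.g. importing SPC files) and simply pip install the first package you come across on PyPI, you risk installing obsolete code or (in rare cases) even outright malicious code.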
PySpectra was released in 2020 and hasn't received any updates since. The required spc-spectra package was released in 2018 and has not been updated either. It's safe to assume that both packages are no longer actively maintained.
The good news is: There is a more recent package for dealing with SPC files in Python: pyfasma-spc
This package is well documented and was last updated in 2024. Installation with Pip version 24.2 works. Importing data from SPC files worked fine on my computer.
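To check how recently a package was updated before installing it, one option is PyPI's JSON endpoint (https://pypi.org/pypi/<name>/json). The following is a minimal sketch, assuming the response contains a "releases" mapping whose file entries carry an "upload_time_iso_8601" field; the function names latest_upload and fetch_metadata are my own.

```python
import json
from datetime import datetime
from typing import Optional
from urllib.request import urlopen


def latest_upload(metadata: dict) -> Optional[datetime]:
    """Return the most recent file upload time across all releases."""
    times = [
        # Replace the trailing "Z" so older Python versions can parse it.
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in metadata.get("releases", {}).values()
        for f in files
    ]
    return max(times, default=None)


def fetch_metadata(package: str) -> dict:
    """Download the JSON metadata for a package from PyPI (needs network)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json", timeout=10) as resp:
        return json.load(resp)


# Example use (requires internet access):
# for name in ("pyspectra", "spc-spectra", "pyfasma-spc"):
#     print(name, latest_upload(fetch_metadata(name)))
```

A release date alone does not prove that a package is maintained or safe, but a last upload several years old is a good reason to look for alternatives first.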