I just got my new MacBook Pro with an M1 Max chip and am setting up Python. I've tried several combinations of settings to test speed, and now I'm quite confused. First, my questions:

- Why is Python running natively on the M1 Max greatly slower than on my old MacBook Pro 2016 with an Intel i5?
- On the M1 Max, why is there no significant speed difference between the native run (Miniforge) and the run via Rosetta (Anaconda), which is supposed to be slower?
- On the M1 Max with a native run, why is there no significant speed difference between the conda-installed Numpy and the Apple-TensorFlow-installed Numpy, which is supposed to be faster?
- On the M1 Max, why are runs from the PyCharm IDE consistently slower than runs from the terminal?
Evidence supporting my questions is as follows:
Here are the settings I've tried:
1. Python installed by:
   - Miniforge, so that Python runs natively on the M1 Max (the Kind of the python process shown in Activity Monitor is Apple). A quick programmatic check for this is sketched after this list.
   - Anaconda, so that Python runs via Rosetta (the Kind of the python process is Intel).
2. Numpy installed by:
   - conda install numpy: numpy from the original conda-forge channel, or pre-installed with Anaconda.
   - Apple TensorFlow: numpy installed together with TensorFlow via
     conda install -c apple tensorflow-deps
     python -m pip install tensorflow-macos
     python -m pip install tensorflow-metal
3. Run from:
   - Terminal
   - PyCharm
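For reference, a minimal way to check the same thing from inside Python itself (a small sketch using only the standard library) is:

import platform
# 'arm64' means the interpreter is a native Apple Silicon build;
# 'x86_64' means it is an Intel build running through Rosetta.
print(platform.machine())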
Here is the test code:
# Time 10 runs; each run does 100 iterations of perturbing a 300x300 matrix and taking its SVD.
import time
import numpy as np
np.random.seed(42)
a = np.random.uniform(size=(300, 300))
runtimes = 10
timecosts = []
for _ in range(runtimes):
s_time = time.time()
for i in range(100):
a += 1
np.linalg.svd(a)
timecosts.append(time.time() - s_time)
print(f'mean of {runtimes} runs: {np.mean(timecosts):.5f}s')
and here are the results (mean time over 10 runs, in seconds):
+-----------------------------------+-----------------------+-----------------------+
| Python installed by (run on) →    | Miniforge (native M1) | Anaconda (Rosetta)    |
+----------------------+------------+-----------+-----------+-----------+-----------+
| Numpy installed by ↓ | Run from → | Terminal  | PyCharm   | Terminal  | PyCharm   |
+----------------------+------------+-----------+-----------+-----------+-----------+
| Apple Tensorflow     |            | 4.19151   | 4.86248   | /         | /         |
+----------------------+------------+-----------+-----------+-----------+-----------+
| conda install numpy  |            | 4.29386   | 4.98370   | 4.10029   | 4.99271   |
+----------------------+------------+-----------+-----------+-----------+-----------+
This is quite slow. For comparison:

- Running the same code on my old MacBook Pro 2016 with the i5 chip costs 2.39917s.
- Another post reports that on a plain M1 chip, miniforge + conda-installed numpy takes 2.53214s, and miniforge + apple_tensorflow_numpy takes 1.00613s.

Here are the CPU details.

My old i5:
$ sysctl -a | grep -e brand_string -e cpu.core_count
machdep.cpu.brand_string: Intel(R) Core(TM) i5-6360U CPU @ 2.00GHz
machdep.cpu.core_count: 2
My new M1 Max:
% sysctl -a | grep -e brand_string -e cpu.core_count
machdep.cpu.brand_string: Apple M1 Max
machdep.cpu.core_count: 10
I followed the instructions from the tutorials strictly, so why is all this happening? Is it because of flaws in my installation, or because of the M1 Max chip itself? Since my work relies heavily on local runs, local speed is very important to me. Any suggestions for a possible solution, or any data points from your own device, would be greatly appreciated :)
Update Mar 28 2022: Please see @AndrejHribernik's comment below.
How to install numpy on M1 Max, with the most accelerated performance (Apple's vecLib)? Here's the answer as of Dec 6 2021.
I. Install Miniforge, so that your Python runs natively on arm64 rather than being translated via Rosetta.

Download the installer script and run it, then open another shell:
$ bash Miniforge3-MacOSX-arm64.sh

Create and activate an environment (here named np_veclib):
$ conda create -n np_veclib python=3.9
$ conda activate np_veclib
II. Install Numpy with its BLAS interface specified as vecLib.

To compile numpy, first install cython and pybind11:
$ conda install cython pybind11

Then compile numpy (thanks to @Marijn's answer) - don't use conda install!
$ pip install --no-binary :all: --no-use-pep517 numpy
Alternatively, instead of the pip command above, build numpy from source:
$ git clone https://github.com/numpy/numpy
$ cd numpy
$ cp site.cfg.example site.cfg
$ nano site.cfg
Edit the copied site.cfg and add the following lines:
[accelerate]
libraries = Accelerate, vecLib
Then build and install:
$ NPY_LAPACK_ORDER=accelerate python setup.py build
$ python setup.py install
Afterwards, check whether numpy is indeed using vecLib:
>>> import numpy
>>> numpy.show_config()
Then, info like /System/Library/Frameworks/vecLib.framework/Headers
should be printed.
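If you prefer to check this from a script rather than the REPL, here is a minimal sketch; it only assumes that the vecLib header path above (or an accelerate section) shows up somewhere in the printed config, which can vary slightly across numpy versions:

import io
import contextlib
import numpy as np

# numpy.show_config() prints the build configuration to stdout, so capture it.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    np.show_config()
config = buf.getvalue()
print("vecLib/Accelerate detected:", "vecLib" in config or "accelerate" in config.lower())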
III. For further installing other packages with conda, make conda recognize packages installed by pip:
conda config --set pip_interop_enabled true

This must be done, otherwise if you later run e.g. conda install pandas, numpy will show up in the "The following packages will be installed" list and be installed again - but that newly installed numpy comes from the conda-forge channel and is slow.
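After any later conda install, a quick way to confirm that the vecLib-built numpy is still the one being imported (rather than a conda-forge replacement) is to look at where it is loaded from - a small sketch, with the exact path depending on your environment:

import numpy
# If conda replaced the pip-built package, this path will point at a different
# (conda-forge) installation than the one compiled in step II above.
print(numpy.__version__)
print(numpy.__file__)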
Comparison to other installations

Besides the optimal installation above, I also tried several others:

A. np_default: conda create -n np_default python=3.9 numpy
B. np_openblas: conda create -n np_openblas python=3.9 numpy blas=*=*openblas*
C. np_netlib: conda create -n np_netlib python=3.9 numpy blas=*=*netlib*
Options A, B, and C above are installed directly from the conda-forge channel, and numpy.show_config() shows identical results for all of them. To see the difference, examine conda list - e.g. the openblas packages are installed in B (see also the runtime check sketched below). Note that mkl and blis are not supported on arm64.
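Another way to see at runtime which BLAS library a given environment actually loads is the third-party threadpoolctl package (an extra dependency, not part of the setups above; it reports OpenBLAS/MKL/BLIS but may not report Apple's Accelerate):

import numpy as np
from threadpoolctl import threadpool_info  # pip install threadpoolctl

# Force the BLAS library to load, then list what was detected.
np.dot(np.ones((2, 2)), np.ones((2, 2)))
for lib in threadpool_info():
    print(lib["internal_api"], "->", lib["filepath"])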
D. np_openblas_source: first install openblas with brew install openblas, then add the [openblas] path /opt/homebrew/opt/openblas to site.cfg and build Numpy from source.
E. The M1 and i9-9880H results reported in Dario Radečić's post (the source of the dario.py benchmark below).
F. My old i5-6360U (2 cores) on a MacBook Pro 2016 13in.

Here I use two benchmarks:
1. mysvd.py: my SVD decomposition test (the same code as in the question):
import time
import numpy as np
np.random.seed(42)
a = np.random.uniform(size=(300, 300))
runtimes = 10
timecosts = []
for _ in range(runtimes):
s_time = time.time()
for i in range(100):
a += 1
np.linalg.svd(a)
timecosts.append(time.time() - s_time)
print(f'mean of {runtimes} runs: {np.mean(timecosts):.5f}s')
2. dario.py: the benchmark script by Dario Radečić from the post mentioned above.

Results (all values in seconds):

+-------+-----------+------------+-------------+-----------+--------------------+----+----------+----------+
| sec   | np_veclib | np_default | np_openblas | np_netlib | np_openblas_source | M1 | i9-9880H | i5-6360U |
+-------+-----------+------------+-------------+-----------+--------------------+----+----------+----------+
| mysvd | 1.02300   | 4.29386    | 4.13854     | 4.75812   | 12.57879           | /  | /        | 2.39917  |
+-------+-----------+------------+-------------+-----------+--------------------+----+----------+----------+
| dario | 21        | 41         | 39          | 323       | 40                 | 33 | 23       | 78       |
+-------+-----------+------------+-------------+-----------+--------------------+----+----------+----------+