I followed the instructions on Apple's website (https://developer.apple.com/metal/pytorch/), but when I verified MPS support with the Python script given there, it returned an error I do not understand (it is too long, so only part of it is listed below). I would like to use GPU acceleration for Stable Diffusion. My MacBook has a Radeon Pro 555 and runs macOS Ventura. Help please :(
Python 3.11.1 (v3.11.1:a7a450f84a, Dec 6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> if torch.backends.mps.is_available():
...     mps_device = torch.device("mps")
...     x = torch.ones(1, device=mps_device)
...     print (x)
... else:
...     print ("MPS device not found.")
...
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor.py", line 461, in __repr__
return torch._tensor_str._str(self, tensor_contents=tensor_contents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor_str.py", line 677, in _str
return _str_intern(self, tensor_contents=tensor_contents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor_str.py", line 597, in _str_intern
tensor_str = _tensor_str(self, indent)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor_str.py", line 349, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor_str.py", line 137, in __init__
nonzero_finite_vals = torch.masked_select(
^^^^^^^^^^^^^^^^^^^^
RuntimeError: Failed to create indexing library, error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:168:1: error: type 'const constant ulong3 *' is not valid for attribute 'buffer'
REGISTER_INDEX_OP_ALL_DTYPES(select);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:160:5: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
REGISTER_INDEX_OP(8bit, idx64, char, INDEX_OP_TYPE, ulong3); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:138:5: note: expanded from macro 'REGISTER_INDEX_OP'
constant IDX_DTYPE * offsets [[buffer(3)]], \
^ ~~~~~~~~~
program_source:168:1: note: type 'ulong3' (vector of 3 'unsigned long' values) cannot be used in buffer pointee type
program_source:160:59: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
REGISTER_INDEX_OP(8bit, idx64, char, INDEX_OP_TYPE, ulong3); \
^
program_source:168:1: error: explicit instantiation of 'index_select' does not refer to a function template, variable template, member function, member class, or static data member
REGISTER_INDEX_OP_ALL_DTYPES(select);
^
program_source:160:5: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
REGISTER_INDEX_OP(8bit, idx64, char, INDEX_OP_TYPE, ulong3); \
^
program_source:134:13: note: expanded from macro 'REGISTER_INDEX_OP'
kernel void index_ ## INDEX_OP_TYPE<DTYPE, IDX_DTYPE>( \
^
<scratch space>:9:1: note: expanded from here
index_select
^
program_source:20:13: note: candidate template ignored: substitution failure [with T = char, OffsetsT = unsigned long __attribute__((ext_vector_type(3)))]: type 'unsigned long const constant * __attribute__((ext_vector_type(3)))' is not valid for attribute 'buffer'
kernel void index_select(
^
program_source:168:1: error: type 'const constant ulong3 *' is not valid for attribute 'buffer'
REGISTER_INDEX_OP_ALL_DTYPES(select);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:162:5: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
REGISTER_INDEX_OP(16bit, idx64, short, INDEX_OP_TYPE, ulong3); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:138:5: note: expanded from macro 'REGISTER_INDEX_OP'
constant IDX_DTYPE * offsets [[buffer(3)]], \
^ ~~~~~~~~~
program_source:168:1: note: type 'ulong3' (vector of 3 'unsigned long' values) cannot be used in buffer pointee type
program_source:162:59: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
REGISTER_INDEX_OP(16bit, idx64, short, INDEX_OP_TYPE, ulong3); \
^
program_source:168:1: error: explicit instantiation of 'index_select' does not refer to a function template, variable template, member function, member class, or static data member
REGISTER_INDEX_OP_ALL_DTYPES(select);
^
program_source:162:5: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
REGISTER_INDEX_OP(16bit, idx64, short, INDEX_OP_TYPE, ulong3); \
^
program_source:134:13: note: expanded from macro 'REGISTER_INDEX_OP'
kernel void index_ ## INDEX_OP_TYPE<DTYPE, IDX_DTYPE>( \
^
<scratch space>:17:1: note: expanded from here
index_select
^
....
...
program_source:248:13: note: candidate template ignored: substitution failure [with T = metal::_atomic<int, void>, E = int, OffsetsT = unsigned long __attribute__((ext_vector_type(3)))]: type 'unsigned long const constant * __attribute__((ext_vector_type(3)))' is not valid for attribute 'buffer'
kernel void index_put_accumulate_native_dtypes(
^
}
>>>
>>>
I downgraded Python from 3.12.1 to 3.11.1 and reinstalled the latest PyTorch nightly build, but still got the same result.
I can replicate this on recent nightly builds (notably 2.3.0.dev20240114). However, the latest stable release (Torch 2.1.2) works well.
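To tell which kind of build is installed, the version string can be inspected. The sketch below assumes nightly builds carry a `.dev` segment, as in the version quoted above; `is_nightly` is a hypothetical helper name, not part of PyTorch. In a real session you would pass it `torch.__version__`.

```python
def is_nightly(version: str) -> bool:
    """Return True if a PyTorch version string looks like a nightly (dev) build.

    Assumption: nightly builds contain a ".dev" segment, e.g. "2.3.0.dev20240114".
    """
    return ".dev" in version


# The two versions mentioned in this thread:
print(is_nightly("2.3.0.dev20240114"))  # True
print(is_nightly("2.1.2"))              # False
```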
Try creating a new environment with the stable release of Torch. The Apple documentation for MPS acceleration with PyTorch recommends the nightly build because MPS support used to be more experimental.
conda create -n torchstable python=3.8
conda activate torchstable
pip3 install torch torchvision torchaudio
Next, try running your code again.
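If you want a script that keeps working even on a build where MPS is broken, a defensive check along these lines may help. It is only a sketch: `pick_device` is a hypothetical helper, and it assumes the failure surfaces as a `RuntimeError` when the tensor is formatted for printing, as in the traceback in the question.

```python
import importlib.util


def pick_device():
    """Return "mps" if a small MPS tensor can be created and formatted,
    "cpu" otherwise, or None if PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    # torch.backends.mps may be missing on very old PyTorch versions.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        try:
            x = torch.ones(1, device="mps")
            repr(x)  # formatting the tensor is what failed in the question
            return "mps"
        except RuntimeError:
            pass  # fall through to CPU if the MPS kernels fail to compile
    return "cpu"


print(pick_device())
```

Falling back to the CPU is much slower for Stable Diffusion, so this is mainly useful for confirming whether the installed build works on the GPU at all.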
Update: This is confirmed as an issue on recent PyTorch nightly builds. See here and here.