Hi, here is what I did:
cmake -G Ninja -S llvm -B build -DCMAKE_INSTALL_PREFIX=../bin -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_PROJECTS="mlir;llvm" -DLLVM_TARGETS_TO_BUILD="host;NVPTX;AMDGPU" -DLLVM_PARALLEL_COMPILE_JOBS=32 -DLLVM_PARALLEL_LINK_JOBS=4
ninja
python -m venv .venv --prompt triton
source .venv/bin/activate
pip install ninja cmake wheel
LLVM_INCLUDE_DIRS=../llvm-project/build/include LLVM_LIBRARY_DIR=../llvm-project/build/lib LLVM_SYSPATH=../llvm-project/build CMAKE_BUILD_TYPE=Debug pip install -e python
I cannot finish the build because it uses an extremely large amount of memory; even the 48 GB of RAM in my computer is not enough.
Thus I would like to ask:
I hit this issue as well. Memory usage really does go out of control with 24 threads (Ryzen 5900X). Looking at Triton's setup.py, I saw that it uses MAX_JOBS on Linux, so, in the Triton source directory:
MAX_JOBS="6" pip3 install .
That caps the build at 6 threads. Each thread seems to use 1-2.5 GB of RAM, so for the 32 threads I see in your question you could need up to about 80 GB of RAM (32 x 2.5 GB).
I was following the instructions here when I hit this issue: https://docs.vllm.ai/en/latest/getting_started/amd-installation.html
Modify your step #6 to include MAX_JOBS="[maximum number of parallel jobs]". With 48 GB of RAM, I would not go past 16. Maybe try 8 to be conservative.
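If it helps, here is a rough way to turn the per-thread estimate above into a MAX_JOBS value. This is only a sketch: suggest_max_jobs is a hypothetical helper I made up for illustration (it is not part of Triton or pip), the 2560 MB per job is the upper end of my 1-2.5 GB guess, and the 8 GB held back for the OS and everything else is an assumption you should adjust.

```shell
# Hypothetical helper (not part of Triton): estimate a safe MAX_JOBS value.
# Usage: suggest_max_jobs TOTAL_RAM_GB PER_JOB_MB RESERVED_GB CPU_THREADS
suggest_max_jobs() {
  smj_ram_gb=$1       # total RAM in GB
  smj_per_job_mb=$2   # assumed peak memory per compile job, in MB
  smj_reserved_gb=$3  # assumed headroom for the OS and other programs, in GB
  smj_cpus=$4         # number of hardware threads

  # Jobs that fit into the RAM left after the reserved headroom.
  smj_jobs=$(( (smj_ram_gb - smj_reserved_gb) * 1024 / smj_per_job_mb ))

  # Never exceed the number of hardware threads.
  if [ "$smj_jobs" -gt "$smj_cpus" ]; then
    smj_jobs=$smj_cpus
  fi

  # Always allow at least one job.
  if [ "$smj_jobs" -lt 1 ]; then
    smj_jobs=1
  fi

  echo "$smj_jobs"
}

# 48 GB of RAM, 2.5 GB per job, 8 GB reserved, 24 hardware threads
suggest_max_jobs 48 2560 8 24   # prints 16
```

You would then put the result in front of the command from step #6, for example MAX_JOBS=16 followed by the rest of that line unchanged. If the build still runs out of memory, lower the number; the per-job figure is an observation from my machine, not a guarantee.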