python, huggingface-transformers, large-language-model, fine-tuning, llamacpp

Unsloth doesn't find Llama.cpp to convert fine-tuned LLM to GGUF


I am executing this notebook from the Unsloth docs on an Azure VM:

https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb

At the end of the notebook, the fine-tuned model is saved to GGUF format like this:

model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m") # or any other quantization

I get the following logs

...long list of layer quantization logs like
INFO:hf-to-gguf:blk.0.ffn_down.weight,       torch.bfloat16 --> BF16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,       torch.bfloat16 --> BF16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_up.weight,         torch.bfloat16 --> BF16, shape = {4096, 14336}
...
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
...

and then this error:

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/unsloth/save.py:1212, in save_to_gguf(model_type, model_dtype, is_sentencepiece, model_directory, quantization_method, first_conversion, _run_installer)

RuntimeError: Unsloth: Quantization failed for /afh/projects/test_project-5477c8e6-ac7d-4117-9d2b-0bbd54c12c6a/shared/Users/Riccardo.Rorato/model/unsloth.BF16.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.

Needless to say, I have already cloned and built llama.cpp (also following the updated guide here). I have also tried older commits that still used the previous build options, and I have moved the llama-quantize binary from llama.cpp/build/bin into the notebook's folder, but nothing changed.
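For reference, this is roughly the CMake-based build I used (per the updated guide; exact steps may vary by llama.cpp version):

git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
# llama-quantize ends up in llama.cpp/build/bin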

I am able to get a GGUF file regardless by running llama.cpp's conversion script manually like this:

python3 llama.cpp/convert_lora_to_gguf.py my_model

However, I cannot figure out why Unsloth does not recognize it.
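(For completeness, the converted file can then be quantized to q4_k_m with the llama-quantize binary from the build above; the file names below are illustrative:

./llama.cpp/build/bin/llama-quantize model.BF16.gguf model.Q4_K_M.gguf Q4_K_M
)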


Solution

  • It turns out this is a problem they are currently fixing, addressed in these issues:

    For now, the workaround is to build llama.cpp manually and run

    python3 llama.cpp/convert_lora_to_gguf.py my_model
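
    Note that convert_lora_to_gguf.py converts only the LoRA adapter to GGUF. To get a quantized GGUF of the full fine-tuned model, a hedged sketch is to convert the merged checkpoint with convert_hf_to_gguf.py and then quantize it. The merged_model directory below assumes the adapters were first merged into the base model (e.g. with Unsloth's save_pretrained_merged); the file names are illustrative:

    python3 llama.cpp/convert_hf_to_gguf.py merged_model --outfile model.BF16.gguf --outtype bf16
    ./llama.cpp/build/bin/llama-quantize model.BF16.gguf model.Q4_K_M.gguf Q4_K_M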