I am running this notebook from the Unsloth docs on an Azure VM:
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb
At the end, after fine-tuning, the model is saved to GGUF format like this:
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m") # or any other quantization
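For context, this call sits near the end of the notebook; stripped down, the relevant part looks roughly like this (a sketch using the notebook's default names, nothing I changed):
from unsloth import FastLanguageModel

# Load the base model as in the notebook (4-bit, 2048-token context)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# ... LoRA setup and SFTTrainer fine-tuning happen here ...

# Export the fine-tuned model to GGUF with q4_k_m quantization
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")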
I get the following output:
...a long list of per-layer conversion logs like
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> BF16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> BF16, shape = {4096, 14336}
...
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
...
and then this error:
File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/unsloth/save.py:1212, in save_to_gguf(model_type, model_dtype, is_sentencepiece, model_directory, quantization_method, first_conversion, _run_installer)
RuntimeError: Unsloth: Quantization failed for /afh/projects/test_project-5477c8e6-ac7d-4117-9d2b-0bbd54c12c6a/shared/Users/Riccardo.Rorato/model/unsloth.BF16.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.
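As far as I can tell from the logs and the path in the error, what Unsloth is doing under the hood is roughly equivalent to these two llama.cpp steps, and the failure seems to happen around the second one (the exact flags are my guess; the paths come from the error message):
# step 1: convert the saved HF checkpoint to an unquantized BF16 GGUF
python3 llama.cpp/convert_hf_to_gguf.py model --outfile model/unsloth.BF16.gguf --outtype bf16
# step 2: quantize that GGUF down to q4_k_m
llama.cpp/build/bin/llama-quantize model/unsloth.BF16.gguf model/unsloth.Q4_K_M.gguf q4_k_m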
Needless to say, I have already cloned and built llama.cpp (also following the updated guide here). I have also tried earlier commits that still used the old build options, and moved the llama-quantize binary from llama.cpp/build/bin to the notebook's folder, but nothing changed.
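For reference, the build I did follows the current llama.cpp README, which uses CMake instead of the make commands from the error message (which is why the binary ends up under build/bin):
git clone --recursive https://github.com/ggerganov/llama.cpp
cmake -S llama.cpp -B llama.cpp/build                 # configure
cmake --build llama.cpp/build --config Release -j     # build everything, including llama-quantize
ls llama.cpp/build/bin/llama-quantize                 # binary ends up here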
Regardless, I am able to get a GGUF file by running llama.cpp's conversion script manually like this:
python3 llama.cpp/convert_lora_to_gguf.py my_model
However, I cannot figure out why Unsloth does not recognize it.
Details:
It turns out this is a known problem that the Unsloth team is currently fixing, addressed in these issues:
For now, the workaround is to build llama.cpp manually and run:
python3 llama.cpp/convert_lora_to_gguf.py my_model
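In case this helps anyone following the same Ollama notebook: as far as I understand, convert_lora_to_gguf.py produces a LoRA adapter GGUF rather than a merged model, so in Ollama it gets applied on top of the base model via ADAPTER. A minimal Modelfile sketch (the adapter filename is a placeholder for whatever the script wrote):
FROM llama3:8b
ADAPTER ./my_model/adapter.gguf
Then create and run it with:
ollama create my-finetuned-llama3 -f Modelfile
ollama run my-finetuned-llama3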