I'm trying to use llama-cpp-python (a Python wrapper around llama.cpp) to run inference with a Llama model in Google Colab. My code looks like this:
!pip install llama-cpp-python

from llama_cpp import ChatCompletionMessage, Llama

model = Llama(
    model_path="/content/drive/MyDrive/<weights-file>.bin",
)
However, when running it, I get this error:
AssertionError Traceback (most recent call last)
<ipython-input-13-652eb650093d> in <cell line: 9>()
7 }
8
----> 9 model = Llama(
10 model_path="/content/drive/MyDrive/careo/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin",
11 )
/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py in __init__(self, model_path, n_ctx, n_parts, n_gpu_layers, seed, f16_kv, logits_all, vocab_only, use_mmap, use_mlock, embedding, n_threads, n_batch, last_n_tokens_size, lora_base, lora_path, low_vram, tensor_split, rope_freq_base, rope_freq_scale, n_gqa, rms_norm_eps, mul_mat_q, verbose)
321 self.model_path.encode("utf-8"), self.params
322 )
--> 323 assert self.model is not None
324
325 if verbose:
AssertionError:
I have tried running this code on my local machine and it works without any problems. Do you have any idea what might be causing the error in Google Colab?
From the model path, model_path="/content/drive/MyDrive/careo/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin", I can see you are using the ggmlv3 model format. As per a recent commit to the llama-cpp-python repo, the supported model format has changed from ggmlv3 to gguf.
The author also mentioned that ggmlv3 weights will still work with versions before 0.1.79 (the new version).
So one option is to pin an older version while installing the package: pip install llama-cpp-python==0.1.78.
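In Colab that would look something like this (a minimal sketch, reusing the model path from your question):

!pip install llama-cpp-python==0.1.78

from llama_cpp import Llama

# 0.1.78 still understands the old ggmlv3 weight format,
# so the original .bin file loads as-is.
model = Llama(
    model_path="/content/drive/MyDrive/careo/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin",
)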
The other option is to convert the model to the new gguf format so it works with the current version.
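llama.cpp ships a script for this conversion; its exact name has varied across versions, so the sketch below assumes convert-llama-ggmlv3-to-gguf.py and its --input/--output flags; check the llama.cpp repo for the script your checkout actually contains:

# Clone llama.cpp to get the conversion script.
!git clone https://github.com/ggerganov/llama.cpp
# Convert the old ggmlv3 .bin into a .gguf file (the script name and
# flags are assumptions; verify them against your llama.cpp checkout).
!python llama.cpp/convert-llama-ggmlv3-to-gguf.py \
    --input /content/drive/MyDrive/careo/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin \
    --output /content/drive/MyDrive/careo/Wizard-Vicuna-13B-Uncensored.q4_1.gguf

from llama_cpp import Llama

# The converted file then loads with the latest llama-cpp-python.
model = Llama(
    model_path="/content/drive/MyDrive/careo/Wizard-Vicuna-13B-Uncensored.q4_1.gguf",
)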
If CodeLlama model weights would also work for you, there are plenty of ready-made GGUF weights published on Hugging Face, such as TheBloke/CodeLlama-13B-GGUF.
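You can download one of those GGUF files straight from Colab (a sketch; the exact filename below is assumed to be one of the quantizations listed on that model page, so double-check it in the repo's file list):

!pip install llama-cpp-python huggingface_hub

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch one quantization of the GGUF weights from Hugging Face.
# The filename is an assumption; take it from the model page's "Files" tab.
model_file = hf_hub_download(
    repo_id="TheBloke/CodeLlama-13B-GGUF",
    filename="codellama-13b.Q4_K_M.gguf",
)

# A GGUF file works with current llama-cpp-python versions (>= 0.1.79).
model = Llama(model_path=model_file)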