I'm trying to use llama-cpp-python (a Python wrapper around llama.cpp) to run inference with a Llama model in Google Colab. My code looks like this:
!pip install llama-cpp-python

from llama_cpp import ChatCompletionMessage, Llama

model = Llama(
    model_path="/content/drive/MyDrive/<weights-file>.bin",
)
However, when running it, I get this error:
AssertionError Traceback (most recent call last)
<ipython-input-13-652eb650093d> in <cell line: 9>()
7 }
8
----> 9 model = Llama(
10 model_path="/content/drive/MyDrive/careo/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin",
11 )
/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py in __init__(self, model_path, n_ctx, n_parts, n_gpu_layers, seed, f16_kv, logits_all, vocab_only, use_mmap, use_mlock, embedding, n_threads, n_batch, last_n_tokens_size, lora_base, lora_path, low_vram, tensor_split, rope_freq_base, rope_freq_scale, n_gqa, rms_norm_eps, mul_mat_q, verbose)
321 self.model_path.encode("utf-8"), self.params
322 )
--> 323 assert self.model is not None
324
325 if verbose:
AssertionError:
I have tried running this code on my local machine and it works without any problems. Do you have any idea what might be causing the error in Google Colab?
From the model path, model_path="/content/drive/MyDrive/careo/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin", I can see you are using the ggmlv3 model format. As per a recent commit to the llama-cpp-python repo, the supported model format has changed from ggmlv3 to gguf.
The author also mentioned that ggmlv3 weights will still work with versions before 0.1.79 (the new version).
So one option is to pin an older version while installing the package: pip install llama-cpp-python==0.1.78.
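In Colab that would look something like this (a minimal sketch, reusing the model path from your question):

!pip install llama-cpp-python==0.1.78

from llama_cpp import Llama

# 0.1.78 still understands the old ggmlv3 weight format,
# so the original .bin file loads as-is.
model = Llama(
    model_path="/content/drive/MyDrive/careo/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin",
)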
The other option is to convert the model to the new gguf format so it works with the current version.
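llama.cpp ships a script for this conversion; its exact name has varied across versions, so the sketch below assumes convert-llama-ggmlv3-to-gguf.py and its --input/--output flags; check the llama.cpp repo for the script your checkout actually contains:

# Clone llama.cpp to get the conversion script.
!git clone https://github.com/ggerganov/llama.cpp
# Convert the old ggmlv3 .bin into a .gguf file (the script name and
# flags are assumptions; verify them against your llama.cpp checkout).
!python llama.cpp/convert-llama-ggmlv3-to-gguf.py \
    --input /content/drive/MyDrive/careo/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin \
    --output /content/drive/MyDrive/careo/Wizard-Vicuna-13B-Uncensored.q4_1.gguf

from llama_cpp import Llama

# The converted file then loads with the latest llama-cpp-python.
model = Llama(
    model_path="/content/drive/MyDrive/careo/Wizard-Vicuna-13B-Uncensored.q4_1.gguf",
)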
If CodeLlama model weights would also work for you, there are plenty of ready-made GGUF weights published on Hugging Face, such as TheBloke/CodeLlama-13B-GGUF.
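You can download one of those GGUF files straight from Colab (a sketch; the exact filename below is assumed to be one of the quantizations listed on that model page, so double-check it in the repo's file list):

!pip install llama-cpp-python huggingface_hub

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch one quantization of the GGUF weights from Hugging Face.
# The filename is an assumption; take it from the model page's "Files" tab.
model_file = hf_hub_download(
    repo_id="TheBloke/CodeLlama-13B-GGUF",
    filename="codellama-13b.Q4_K_M.gguf",
)

# A GGUF file works with current llama-cpp-python versions (>= 0.1.79).
model = Llama(model_path=model_file)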