Preface
I am new to implementing NLP models. I have successfully fine-tuned LLaMA 3-8B variants with QLoRA and uploaded them to HuggingFace.
Each uploaded repository contains these files:
- .gitattributes
- adapter_config.json
- adapter_model.safetensors
- special_tokens_map.json
- tokenizer.json
- tokenizer_config.json
- training_args.bin
Implementation
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id_1 = "ferguso/llama-8b-pcl-v3"

tokenizer_1 = AutoTokenizer.from_pretrained(model_id_1)

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
)

model_1 = AutoModelForCausalLM.from_pretrained(
    model_id_1,
    quantization_config=quantization_config,
)
But it fails with the error: OSError: ferguso/llama-8b-pcl-v3 does not appear to have a file named config.json. Checkout 'https://huggingface.co/ferguso/llama-8b-pcl-v3/tree/main' for available files.
So I tried reusing the config from the base model meta-llama/Meta-Llama-3-8B:

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

original_model = "meta-llama/Meta-Llama-3-8B"
model_id_1 = "ferguso/llama-8b-pcl-v3"

tokenizer_1 = AutoTokenizer.from_pretrained(model_id_1)

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
)

original_config = AutoConfig.from_pretrained(original_model)
original_config.save_pretrained(model_id_1)

model_1 = AutoModelForCausalLM.from_pretrained(
    model_id_1,
    quantization_config=quantization_config,
    config=original_config,
)
But it still fails, now with another error: OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory ferguso/llama-8b-pcl-v3.
Questions
How do I load the fine-tuned model properly?
Your directory contains only the PEFT adapter files and the files required to load the tokenizer; the base model weights are missing. I assume you used peft's save_pretrained method, which saves only the adapter weights and config (I use a smaller model and a different task type for my answer!):
from peft import LoraConfig, TaskType, get_peft_model, PeftModel
from transformers import AutoModelForTokenClassification
from pathlib import Path
# ferguso/llama-8b-pcl-v3 in your case
adapter_path = 'bla'
# meta-llama/Meta-Llama-3-8B in your case
base_model_id = "distilbert/distilbert-base-uncased"
peft_config = LoraConfig(task_type=TaskType.TOKEN_CLS, target_modules="all-linear")
# AutoModelForCausalLM in your case
model = AutoModelForTokenClassification.from_pretrained(base_model_id)
model = get_peft_model(model, peft_config)
model.save_pretrained(adapter_path)
print(*list(Path(adapter_path).iterdir()), sep='\n')
Output:
bla/adapter_config.json
bla/README.md
bla/adapter_model.safetensors
To load your fine-tuned model successfully, you need to load the base model weights as well and use the PEFT model class to load the adapter on top:
model = AutoModelForTokenClassification.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_path)
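Applied to your case, a minimal sketch (assuming your adapter ferguso/llama-8b-pcl-v3 was trained on top of meta-llama/Meta-Llama-3-8B and that you have access to that gated base model) would look roughly like this:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model_id = "meta-llama/Meta-Llama-3-8B"
adapter_id = "ferguso/llama-8b-pcl-v3"

# the tokenizer files live in your adapter repo, so load the tokenizer from there
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

# load the base model weights (quantized), then attach the adapter
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=quantization_config,
)
model = PeftModel.from_pretrained(base_model, adapter_id)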
You can also merge the adapter weights back into the base model with merge_and_unload and save the merged model:
model.merge_and_unload().save_pretrained('bla2')
print(*list(Path('bla2').iterdir()), sep='\n')
Output:
bla2/config.json
bla2/model.safetensors
This way you will be able to load the model with transformers alone, without peft, as you tried in the example code in your question.
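For completeness, a sketch of what loading the merged model with transformers alone could look like (using the hypothetical local directory 'bla2' from above; the same works with a repo id if you upload the merged weights):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

merged_path = "bla2"  # or the Hub repo id you pushed the merged model to

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

# the merged directory now has config.json and model.safetensors,
# so the plain transformers loading path works
model = AutoModelForCausalLM.from_pretrained(
    merged_path,
    quantization_config=quantization_config,
)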