I am using the Llama-2-7b-hf model with the model.generate method from the transformers library (v4.38.2), and it returns the output as a single tensor instead of the ModelOutput I expected.
I have a copy of the model stored locally:
[Llama-2-7b-hf]$ ls -1
config.json
generation_config.json
LICENSE.txt
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
model.safetensors.index.json
README.md
Responsible-Use-Guide.pdf
special_tokens_map.json
tokenizer_config.json
tokenizer.json
tokenizer.model
USE_POLICY.md
This is the code where the model is initialized and then called:
model_path = "Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_path, return_dict_in_generate=True, local_files_only=True).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)
input_ids = tokenizer(model_path, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids, top_k=1, max_length=max_len, num_return_sequences=1, output_scores=True)
sequences, scores = outputs.sequences, outputs.scores
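A quick type check confirms this (the exact class name can vary with the transformers version, so the comment below is only illustrative):

print(type(outputs))
# what I get:      <class 'torch.Tensor'>, i.e. just the generated token ids
# what I expected: a ModelOutput subclass such as GenerateDecoderOnlyOutput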
I have used this code with several other models, such as Mistral and Occiglot, and they return ModelOutput objects with the sequences and scores attributes, but Llama does not. Can anyone help me understand what is wrong?
I managed to solve it by passing the return_dict_in_generate and output_scores parameters in the call to model.generate instead of in the initialization of the model:
model = AutoModelForCausalLM.from_pretrained(model_path, local_files_only=True).to(device)
outputs = model.generate(input_ids, top_k=1, max_length=max_len, num_return_sequences=1, output_scores=True, return_dict_in_generate=True)
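As an alternative (a minimal sketch of the same fix; the keyword names are the standard generate arguments), the same options can be bundled into a GenerationConfig object and handed to model.generate:

from transformers import GenerationConfig

gen_config = GenerationConfig(
    top_k=1,
    max_length=max_len,
    num_return_sequences=1,
    output_scores=True,
    return_dict_in_generate=True,
)
# generate now returns a ModelOutput with .sequences and .scores
outputs = model.generate(input_ids, generation_config=gen_config)
sequences, scores = outputs.sequences, outputs.scores

Either way, the point is that these are generation-time options, so they need to reach model.generate (or its generation config) rather than from_pretrained.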