huggingface-transformers, llama

meta-llama/Llama-2-7b-hf returning tensor instead of ModelOutput


I am using the Llama-2-7b-hf model with the model.generate function from the transformers library (v4.38.2), and it returns the output as a single tensor instead of the ModelOutput object I expected.

I have a copy of the model stored locally:

[Llama-2-7b-hf]$ ls -1
config.json
generation_config.json
LICENSE.txt
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
model.safetensors.index.json
README.md
Responsible-Use-Guide.pdf
special_tokens_map.json
tokenizer_config.json
tokenizer.json
tokenizer.model
USE_POLICY.md

This is the code where the model is initialized and then called:

model_path = "Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_path, return_dict_in_generate=True, local_files_only=True).to(device)
tokenizer = AutoTokenizer.from_pretrained(engine, local_files_only=True)

input_ids = tokenizer(model_path, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids, top_k=1, max_length=max_len, num_return_sequences=1, output_scores=True)
sequences, scores = outputs.sequences, outputs.scores

I have used this code with several other models, such as Mistral and Occiglot, and they return ModelOutput objects with the sequences and scores attributes, but Llama does not. Can anyone help me understand what is wrong?


Solution

  • I managed to solve it by passing the return_dict_in_generate and output_scores parameters in the call to model.generate instead of in the initialization of the model. As far as I can tell, generation settings passed to from_pretrained only end up on the model config, while generate reads them from the model's generation config (e.g. the bundled generation_config.json), so passing them directly to generate is the reliable way to get a ModelOutput back.

    model = AutoModelForCausalLM.from_pretrained(model_path, local_files_only=True).to(device)
    outputs = model.generate(input_ids, top_k=1, max_length=max_len, num_return_sequences=1, output_scores=True, return_dict_in_generate=True)
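
    With those two arguments, generate returns a ModelOutput (rather than a bare tensor) that carries both sequences and scores; scores is a tuple with one tensor of prediction scores per generated step. A minimal sketch of how the result can then be unpacked (the print statements are just illustrative):

    sequences, scores = outputs.sequences, outputs.scores

    # Decode the generated token ids back into text
    print(tokenizer.batch_decode(sequences, skip_special_tokens=True))

    # One score tensor per generated token; with num_return_sequences=1
    # each has shape (batch_size, vocab_size)
    print(len(scores), scores[0].shape)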