I'm working through the Hands-On Large Language Models book to learn more about LLMs. I'm trying to generate text with the "microsoft/Phi-3-mini-4k-instruct" model used in the book, but I get an error when running the example code:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    # device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False
)

# The prompt
messages = [
    {"role": "user",
     "content": "Create a funny joke about chickens."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])
Which returns the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipython-input-1474234034.py in <cell line: 0>()
6
7 # Generate output
----> 8 output = generator(messages)
9 print(output[0]["generated_text"])
~/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-mini-4k-instruct/0a67737cc96d2554230f90338b163bc6380a2a85/modeling_phi3.py in prepare_inputs_for_generation(self, input_ids, past_key_values, attention_mask, inputs_embeds, **kwargs)
1289 if isinstance(past_key_values, Cache):
1290 cache_length = past_key_values.get_seq_length()
-> 1291 past_length = past_key_values.seen_tokens
1292 max_cache_length = past_key_values.get_max_length()
1293 else:
AttributeError: 'DynamicCache' object has no attribute 'seen_tokens'
The model loads fine, but I don't understand why this error occurs.
This issue is caused by the trust_remote_code=True parameter. Change it as follows:
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    # device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False  # Change to False
)
trust_remote_code=True causes transformers to download and run outdated custom modeling code from the Hugging Face Hub. That legacy code accesses the past_key_values.seen_tokens attribute, which no longer exists on the DynamicCache class in current versions of the transformers library. Since Phi-3 support is now built into transformers, the custom code is no longer required, and the built-in implementation queries the cache length via get_seq_length() instead of seen_tokens.
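
As a quick sanity check (a minimal sketch, assuming a recent transformers release with built-in Phi-3 support), you can confirm that the native implementation is being used and that DynamicCache exposes get_seq_length() rather than seen_tokens:

from transformers import AutoModelForCausalLM, DynamicCache

# Load with the built-in implementation (no remote code).
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype="auto",
    trust_remote_code=False,
)

# The class should come from the installed transformers package
# (a module path like transformers.models.phi3...) rather than from a
# transformers_modules directory downloaded from the Hub.
print(type(model).__module__)

# DynamicCache no longer carries seen_tokens; get_seq_length() is the
# supported way to ask how many tokens the cache currently holds.
cache = DynamicCache()
print(hasattr(cache, "seen_tokens"))  # expected: False on recent versions
print(cache.get_seq_length())         # expected: 0 for an empty cache

If both checks look right, the pipeline code from the question should run unchanged apart from the trust_remote_code flag.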