I am experimenting with Llama-3.2-1B-Instruct for learning purposes. When I try to implement a simple rewrite task with Hugging Face transformers,
I get a strange result: the model never generates a stop token, so the response keeps going until max_new_tokens
is exhausted. When I use the same prompt with the same model locally with Ollama or in the Hugging Face Playground, I get the expected responses.
What is going on? My eventual goal is to fine-tune this model to mimic the style of a particular person (style transfer), but first I need to understand how to implement a simple rewrite task with the base model.
Here is my code:
# Import libraries
from transformers import AutoTokenizer, AutoModelForCausalLM
# Specify model
model_name = "meta-llama/Llama-3.2-1B-Instruct"
# Set up tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
# Load model
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
# Prepare a prompt for email re-write task
original_text = "Hi guys, just checking in to see if you finally finished the slides for next week when we meet with Jack. Let me know asap. Cheers, John"
prompt = f"""
### Instruction:
Revise the following draft email so that it reads in a professional voice, preserving meaning but improving clarity, structure, and tone. Only provide the revised email and nothing else.
### Draft:
{original_text}
### Revision:
"""
# Generate LLM output
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
result = tokenizer.decode(output[0], skip_special_tokens=True)
print(result)
I was expecting that after the "Revision:" heading I would see a rewrite of the draft email in a formal tone. Instead, I got the output below. As you can see, the generation ran into a loop, used up all 200 allowed tokens, and stopped abruptly.
Instruction:
Revise the following draft email so that it reads in a professional voice, preserving meaning but improving clarity, structure, and tone. Only provide the revised email and nothing else.
Draft:
Hi guys, just checking in to see if you finally finished the slides for next week when we meet with Jack. Let me know asap. Cheers, John
Revision:
Subject: Upcoming Meeting - Slide Submission
Dear Team,
I wanted to touch base with you regarding the upcoming meeting with Jack. Could you please confirm whether you have completed the presentation slides for next week's meeting? If so, please let me know as soon as possible so I can finalize the details.
Best regards, John
Revised:
Subject: Upcoming Meeting - Slide Submission
Dear Team,
I am writing to confirm that you have completed the presentation slides for our upcoming meeting with Jack. If you have, please let me know as soon as possible so I can finalize the details.
Best regards, John
Revised:
Subject: Upcoming Meeting - Slide Submission
Dear Team,
I would like to confirm that the presentation slides for our meeting with Jack have been completed. If so, please let me know as soon as possible so I can proceed with the next steps.
Best regards, John
Revised:
Subject: Upcoming Meeting - Slide Submission
Dear Team,
I would appreciate
The problem was solved by wrapping the prompt in the chat template that Llama models are trained with during instruction tuning. Adding the special tokens the model expects steered generation in the right direction.
Here is a code block that demonstrates what worked:
# Prepare a prompt for email re-write task
original_text = "Hi guys, just checking in to see if you finally finished the slides for next week when we meet with Jack. Let me know asap. Cheers, John"
messages = [
    {"role": "system", "content": "You are an AI assistant that revises emails in a professional writing style."},
    {"role": "user", "content": f"Revise the following draft email in a professional voice, preserving meaning. Only provide the revised email.\n\n### Draft:\n{original_text}"}
]
# Apply the chat template (adds special tokens like <|start_header_id|>, etc.)
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # We want the string, not tokens yet
    add_generation_prompt=True   # Ensures the prompt ends expecting the assistant's turn
)
print("--- Formatted Prompt ---")
print(prompt)
print("------------------------")