I want to set my eos_token_id and pad_token_id. I googled a lot, and most answers suggest using e.g. tokenizer.pad_token_id (like here: https://huggingface.co/meta-llama/Meta-Llama-3-8B/discussions/36). The problem is that my code never initializes a tokenizer.
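What those answers show is roughly the following (just for reference, a sketch of the tokenizer-based lookup, not something in my actual setup; it assumes transformers is installed and the gated repo is accessible):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print(tok.eos_token, tok.eos_token_id)               # configured eos token and its id
print(tok.pad_token, tok.pad_token_id)               # usually None for Llama 3
print(tok.convert_tokens_to_ids("<|end_of_text|>"))  # 128001
print(tok.convert_tokens_to_ids("<|eot_id|>"))       # 128009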
I also checked the official Llama 3 page https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/, but it does not show this in code.
My code looks like this:
import os
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM
import torch
# Define the LLM
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,  # Reduce max new tokens for faster inference
    generate_kwargs={
        "temperature": 0.1,
        "do_sample": True,
        "pad_token_id": 128001,
        "eos_token_id": 128001
    },
    tokenizer_name="meta-llama/Meta-Llama-3-8B-Instruct",
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",
    model_kwargs={"torch_dtype": torch.float16}
)
So my question is: what are the proper settings for pad_token_id and eos_token_id? I am sure it is not 128001. Would anyone please help?
For the eos_token, this is what worked for me:
"eos_token_id": [128001, 128009]
Found at the bottom of this issue: https://github.com/vllm-project/vllm/issues/4180
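Plugged into the HuggingFaceLLM call from the question, that would look something like this (same settings as above, only generate_kwargs changed; a sketch, not tested on my side):

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={
        "temperature": 0.1,
        "do_sample": True,
        # 128001 = <|end_of_text|>, 128009 = <|eot_id|>
        "eos_token_id": [128001, 128009],
    },
    tokenizer_name="meta-llama/Meta-Llama-3-8B-Instruct",
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",
    model_kwargs={"torch_dtype": torch.float16},
)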
For the pad_token, I think you can ignore it, as suggested here: https://github.com/meta-llama/llama3/issues/42
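If you would rather set it explicitly to silence the "Setting pad_token_id to eos_token_id" warning from generate(), a common workaround (my own habit, not something from that issue) is to reuse one of the eos ids:

generate_kwargs={
    "temperature": 0.1,
    "do_sample": True,
    "eos_token_id": [128001, 128009],
    "pad_token_id": 128001,  # reuse <|end_of_text|> as the pad id; nothing is actually padded at batch size 1
},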