I want to use the TAIDE 8B model locally. Here's how I'm loading it:
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

login("token")  # placeholder; replace with your Hugging Face access token

model_name = "taide/Llama-3.1-TAIDE-LX-8B-Chat"

# use_auth_token is deprecated in recent transformers versions;
# token=True reuses the token stored by login()
tokenizer = AutoTokenizer.from_pretrained(model_name, token=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    token=True,
    torch_dtype=torch.float16,  # half precision so the 8B model fits in GPU memory
    device_map="auto",          # place weights automatically across available devices
)

# Prompt means "Please briefly describe the history of Taiwan."
inputs = tokenizer("請簡述台灣的歷史。", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
However, I keep getting the following error:
FATAL: kernel fmha_cutlassF_f16_aligned_64x128_rf_sm80
is for sm80-sm100, but was built for sm37
Does anyone have a solution for this?
I'm pretty sure my Hugging Face access token is correct.
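For what it's worth, one way to verify that is huggingface_hub's whoami(), which raises an error if the saved token is missing or invalid; this is just a sanity check and isn't specific to the TAIDE model:

from huggingface_hub import whoami

# Raises if the stored token is missing or invalid; otherwise
# returns account details for the token's owner.
print(whoami()["name"])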
I think you might have installed the wrong version of CUDA. You can try installing CUDA 12.8 or higher.
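Before reinstalling, it may help to confirm what PyTorch actually sees. Here's a minimal check using plain torch calls: the sm80 in the kernel name corresponds to compute capability 8.0 (Ampere), so get_device_capability() has to report (8, 0) or higher for that kernel to be usable.

import torch

# CUDA toolkit version this PyTorch build was compiled against
print("torch CUDA version:", torch.version.cuda)

# GPU model and compute capability; the failing kernel needs (8, 0)+
print("device:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))

# architectures the installed PyTorch binaries include kernels for
print("arch list:", torch.cuda.get_arch_list())

For reference, sm37 in the message is compute capability 3.7 (Kepler generation), so comparing the reported capability against those two numbers should show where the mismatch comes from.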