large-language-model · huggingface · huggingface-tokenizers

Issue Running Taide 8B Locally: Kernel Built for sm80, but My GPU is sm37


I want to use the Taide 8B model locally. Here's how I'm loading it:

from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

login("token")

model_name = "taide/Llama-3.1-TAIDE-LX-8B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_name, token=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    token=True,                    # use_auth_token is deprecated in recent transformers
    torch_dtype=torch.float16,     # load weights in half precision to reduce memory use
    device_map="auto",             # place layers on the available GPU(s)/CPU automatically
)

inputs = tokenizer("請簡述台灣的歷史。", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

However, I keep getting the following error:

FATAL: kernel fmha_cutlassF_f16_aligned_64x128_rf_sm80 is for sm80-sm100, but was built for sm37

Does anyone have a solution for this?

I'm pretty sure my Hugging Face login token is correct.


Solution

  • I think you might have installed the wrong version of CUDA. You can try installing CUDA 12.8 or higher; the quick check below can confirm what your setup currently reports.
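
To verify the diagnosis before reinstalling anything, here is a minimal sketch (assuming only that PyTorch is installed) that prints the CUDA version your PyTorch build was compiled against and the GPU's compute capability:

import torch

# CUDA runtime version this PyTorch build was compiled against (None on CPU-only builds)
print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)

if torch.cuda.is_available():
    # Compute capability as a (major, minor) tuple; (3, 7) corresponds to sm37
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {torch.cuda.get_device_name(0)} (sm{major}{minor})")
else:
    print("No CUDA device is visible to PyTorch")

If the capability comes back as (3, 7), keep in mind that the error message itself says the fmha_cutlassF_f16_aligned_64x128_rf_sm80 kernel targets sm80-sm100, so the GPU's hardware generation, not only the installed CUDA toolkit version, has to support those kernels.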