I am just using the Hugging Face example to run their LLM, but it gets stuck at:
downloading shards: 0%| | 0/5 [00:00<?, ?it/s]
(I am using a Jupyter notebook with Python 3.11, and all requirements are installed.)
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch
model = "tiiuae/falcon-40b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Girafatron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
How can I fix it?
I think it's not stuck.
These are just very large models that take a while to download. tqdm only shows an estimate after the first iteration completes, so until then it looks like nothing is happening. I'm currently downloading the smallest version of Llama 2 (7B parameters), and it's split into two shards. The first one took over 17 minutes to complete, and I have a reasonably fast internet connection.
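If you want to confirm that bytes are actually arriving, you can fetch the weights separately before building the pipeline. Here is a minimal sketch using huggingface_hub's snapshot_download, which prints a per-file progress bar and skips files that are already fully cached; from_pretrained and pipeline will then load from the local cache instead of re-downloading:

from huggingface_hub import snapshot_download

# Downloads (or resumes) every file in the repo into the local
# Hugging Face cache, showing a progress bar per file, and returns
# the path to the local snapshot.
local_path = snapshot_download("tiiuae/falcon-40b-instruct")
print(local_path)

You can also watch the cache directory (by default ~/.cache/huggingface/hub) grow while the shards come down. Keep in mind that falcon-40b-instruct is on the order of 80 GB of weights, so even on a fast connection the download can take a long time.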