I'm facing an issue when training a model using PEFT and LoRA on a multi-GPU setup with PyTorch and Hugging Face Transformers. The error I get is:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
Here are the details of my setup and code:
Code:
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# data_path and formatting_prompts_func are defined elsewhere in the notebook
data = load_dataset(data_path, split="train").map(formatting_prompts_func)

model_name = "yandex/YandexGPT-5-Lite-8B-pretrain"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # shard the model across both GPUs
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    padding_side="left",
    add_eos_token=True,
    add_bos_token=True,
    use_fast=True,
)
tokenizer.pad_token = tokenizer.eos_token

instruction_template = "### PROMPT:"
response_template = "### OUTPUT:"
collator = SafeCollator(  # my own collator class, defined elsewhere
    instruction_template=instruction_template,
    response_template=response_template,
    tokenizer=tokenizer,
    mlm=False,
)

peft_config = LoraConfig(...)
training_args = SFTConfig(...)
trainer = SFTTrainer(
    model,
    peft_config=peft_config,
    train_dataset=data,
    data_collator=collator,
    args=training_args,
)
trainer.train()
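For reference, device_map="auto" really does place the model on both GPUs. The placement can be checked with a snippet like this (a minimal sketch; hf_device_map is the module-to-device mapping that accelerate attaches when a device_map is used, and the module names in the comment are only examples):

from collections import Counter

# Minimal sketch: hf_device_map maps module names to devices,
# e.g. {'model.embed_tokens': 0, ..., 'lm_head': 1}
print(model.hf_device_map)
print(Counter(model.hf_device_map.values()))  # how many modules landed on each GPU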
Dataset:
Dataset({
    features: ['instruction', 'output', 'retrieved_context', 'text'],
    num_rows: 7317
})
Details:
I'm using Kaggle's 2xT4 configuration; the model can't fit into a single GPU's memory, which is why I load it with device_map="auto".
Downgrading to an older transformers version made the error go away:
pip install transformers==4.49.0
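For completeness, an alternative I could try instead of downgrading is capping per-GPU memory so that device_map="auto" keeps the placement balanced across both T4s (not tested; the max_memory values below are guesses for 16 GB cards, not measured):

# Hypothetical alternative to the version downgrade: cap per-GPU memory so the
# fp16 8B model is spread across both T4s (the GiB values are only guesses).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "14GiB", 1: "14GiB"},
)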