I'm facing an issue when training a model using PEFT and LoRA on a multi-GPU setup with PyTorch and Hugging Face Transformers. The error I get is:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
Here are the details of my setup and code:
Code:
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# data_path and formatting_prompts_func are defined elsewhere in the notebook
data = load_dataset(data_path, split="train").map(formatting_prompts_func)

model_name = "yandex/YandexGPT-5-Lite-8B-pretrain"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # shard the model across both GPUs
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    padding_side="left",
    add_eos_token=True,
    add_bos_token=True,
    use_fast=True,
)
tokenizer.pad_token = tokenizer.eos_token

instruction_template = "### PROMPT:"
response_template = "### OUTPUT:"
collator = SafeCollator(  # my own collator class, defined elsewhere
    instruction_template=instruction_template,
    response_template=response_template,
    tokenizer=tokenizer,
    mlm=False,
)

peft_config = LoraConfig(...)
training_args = SFTConfig(...)
trainer = SFTTrainer(
    model,
    peft_config=peft_config,
    train_dataset=data,
    data_collator=collator,
    args=training_args,
)
trainer.train()
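For reference, device_map="auto" really does place the model on both GPUs. The placement can be checked with a snippet like this (a minimal sketch; hf_device_map is the module-to-device mapping that accelerate attaches when a device_map is used, and the module names in the comment are only examples):

from collections import Counter

# Minimal sketch: hf_device_map maps module names to devices,
# e.g. {'model.embed_tokens': 0, ..., 'lm_head': 1}
print(model.hf_device_map)
print(Counter(model.hf_device_map.values()))  # how many modules landed on each GPU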
Dataset:
Dataset({
    features: ['instruction', 'output', 'retrieved_context', 'text'],
    num_rows: 7317
})
Details:
I'm using Kaggle's 2xT4 configuration; the model can't fit into a single GPU's memory, which is why I load it with device_map="auto".
Downgrading to an older transformers version made the error go away:
pip install transformers==4.49.0
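For completeness, an alternative I could try instead of downgrading is capping per-GPU memory so that device_map="auto" keeps the placement balanced across both T4s (not tested; the max_memory values below are guesses for 16 GB cards, not measured):

# Hypothetical alternative to the version downgrade: cap per-GPU memory so the
# fp16 8B model is spread across both T4s (the GiB values are only guesses).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "14GiB", 1: "14GiB"},
)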