I recently got the following error while running LoRA fine-tuning on a small LLM:
RuntimeError: cannot pin 'torch.cuda.FloatTensor' only dense CPU tensors can be pinned
I saw someone on a Discord server say:
The issue likely stems from the fact that you are manually placing your inputs on the GPU (with to(model.device)), but the Trainer expects data to be on the CPU and will handle the transfer to the GPU internally.
I can't find anything to that effect in the Hugging Face Trainer documentation (https://huggingface.co/docs/transformers/en/main_classes/trainer).
Is it true? If not, how can I get rid of that error?
MRE:
import torch
from torch.utils.data import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import TrainingArguments
from transformers import Trainer
from peft import LoraConfig, get_peft_model

model_name = "croissantllm/CroissantLLMBase"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

texts = [
    "The first sentence for fine-tuning. </s>",
    "The second sentence for fine-tuning. </s>",
]
inputs = [tokenizer(text, return_tensors="pt").to(model.device) for text in texts]

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)

class CustomDataset(Dataset):
    def __init__(self, input_list):
        self.input_list = input_list

    def __len__(self):
        return len(self.input_list)

    def __getitem__(self, idx):
        input_ids = self.input_list[idx]['input_ids'].squeeze()
        labels = input_ids.clone()
        return {"input_ids": input_ids, "labels": labels}

train_dataset = CustomDataset(inputs)

training_args = TrainingArguments(
    output_dir="./lora_croissantllm",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    save_steps=10,
    save_total_limit=2,
    logging_dir="./logs",
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
The issue is easy to reproduce directly on Colab (run %pip install --upgrade torch transformers peft in the first cell).
Memory pinning only applies to CPU tensors: pinned (page-locked) host memory is what speeds up host-to-GPU copies, and a tensor that already lives on the GPU cannot be pinned. Your __getitem__ returns CUDA tensors (because of the .to(model.device) at tokenization time), while the Trainer's DataLoader pins memory by default, hence the error. When running on a GPU, e.g. on Colab, you can simply disable pinning by setting dataloader_pin_memory=False in TrainingArguments:
training_args = TrainingArguments(
    output_dir="./lora_croissantllm",
    dataloader_pin_memory=False,
    per_device_train_batch_size=1,
    num_train_epochs=1,
    save_steps=10,
    save_total_limit=2,
    logging_dir="./logs",
    logging_steps=10,
)
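Alternatively, the Discord comment points at a valid fix: the Trainer does move each batch to the model's device internally before the forward pass, so you can keep the dataset on the CPU instead of disabling pinning. A minimal sketch of that variant, changing only the tokenization line of your MRE:

# Keep tokenized tensors on the CPU; the Trainer will move each batch
# to the GPU itself during training.
inputs = [tokenizer(text, return_tensors="pt") for text in texts]

With CPU tensors the default dataloader_pin_memory=True works as intended, and the error disappears without any change to TrainingArguments.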