I am trying to follow this Medium article: Article.
I had a few problems with it, so the remaining change I made was to the TrainingArguments object, where I added gradient_checkpointing_kwargs={'use_reentrant': False}.
So now I have the following objects:
import transformers
from transformers import TrainingArguments

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    warmup_steps=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=100,  # 1000
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    logging_steps=25,
    logging_dir="./logs",
    save_strategy="steps",
    save_steps=25,
    evaluation_strategy="steps",
    eval_steps=25,
    do_eval=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={'use_reentrant': False},
    report_to="none",
    overwrite_output_dir=True,
    group_by_length=True,
)
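For context, peft_model below comes from the article's QLoRA setup, which isn't shown here; a minimal sketch of how such a model is typically built (the model name and LoRA hyperparameters are assumptions, not the article's exact values):

import torch
import transformers
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit (QLoRA-style); model name is a placeholder
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = transformers.AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumption: substitute the article's model
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# Wrap the quantized base model with LoRA adapters (values are assumptions)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base_model, lora_config)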
peft_model.config.use_cache = False  # the KV cache is incompatible with gradient checkpointing
peft_trainer = transformers.Trainer(
    model=peft_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=peft_training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
And when I call peft_trainer.train(), I get the following error:
AttributeError: 'torch.dtype' object has no attribute 'itemsize'
I'm using Databricks, and my PyTorch version is 2.0.1+cu118.
I was able to recreate your problem on Databricks with the following cluster:
Then, building on top of all the answers already here, I was able to get past your problem with the following:
!pip install --upgrade git+https://github.com/huggingface/transformers
!pip install --upgrade torch torchvision
!pip install --upgrade accelerate
!pip install datasets==2.16.0
I'm not sure if it matters, but the order in which I ran the commands above was:
4 >> 1 >> 3 >> 2
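As far as I can tell, the root cause is that newer accelerate/transformers code reads torch.dtype.itemsize, an attribute that only exists from PyTorch 2.1 onward, which is why 2.0.1 raises the AttributeError. A quick sanity check you can run after upgrading:

import torch

print(torch.__version__)
# torch.dtype only gained the itemsize attribute in PyTorch 2.1;
# on 2.0.1 this prints False, which matches the AttributeError above.
print(hasattr(torch.float16, "itemsize"))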
This makes your problem go away, and it works with both transformers.Trainer and SFTTrainer, which I saw imported in your article but never used.
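For reference, an SFTTrainer version would look roughly like this (a sketch against the trl API of that era; dataset_text_field="text" and max_seq_length=512 are assumptions about your dataset, not values from the article):

from trl import SFTTrainer

peft_sft_trainer = SFTTrainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",  # assumption: name of the raw-text column
    max_seq_length=512,         # assumption: pick to fit your data and VRAM
)
peft_sft_trainer.train()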