When using PEFT to fine-tune a pretrained model (e.g., DistilBert), you need to specify the target_modules. In the case of DistilBert, the attention projection weights are typically targeted. Example:
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=8,                                          # rank of the LoRA update matrices
    lora_alpha=32,                                # alpha (scaling factor)
    lora_dropout=0.05,                            # dropout probability for LoRA layers
    target_modules=["q_lin", "k_lin", "v_lin"],   # which layers to apply LoRA to, usually only the multi-head attention projections
    bias="none",
    task_type=TaskType.SEQ_CLS,                   # sequence classification task
)
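For context, this config is applied to the base model roughly like this. This is a minimal sketch, assuming the distilbert-base-uncased checkpoint and num_labels=2, both of which are just illustrative:

from transformers import AutoModelForSequenceClassification
from peft import get_peft_model

# Load a DistilBert backbone with a freshly initialized classification head
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Inject LoRA adapters into q_lin / k_lin / v_lin and freeze the backbone
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # prints trainable vs. total parameter counts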
My question: when fine-tuning a pretrained model on a downstream task, you initialize a new layer (such as the classification head) that is not pre-trained and has random weights. Does PEFT also freeze this layer, or does it optimize it?
When a new task-specific layer, such as a classification head, is added to the pretrained model, it is not frozen by default. This layer starts with random weights and is trained along with the LoRA parameters. In short, PEFT freezes the backbone of the model but leaves the newly added layers trainable; for task_type=TaskType.SEQ_CLS, the classification head is added to modules_to_save and is therefore fully fine-tuned.
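If you want to be explicit about this, you can list the head modules yourself via the modules_to_save argument. A sketch, assuming a DistilBertForSequenceClassification head whose layers are named pre_classifier and classifier:

from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_lin", "k_lin", "v_lin"],
    bias="none",
    task_type=TaskType.SEQ_CLS,
    # keep the randomly initialized head trainable and save it with the adapter
    modules_to_save=["pre_classifier", "classifier"],
)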
You can easily verify which parameters in your model are trainable like this:
for name, param in model.named_parameters():
    print(name, param.requires_grad)
The classification head will have requires_grad=True, while the frozen backbone weights will show requires_grad=False.
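To see only the layers that will actually receive gradient updates, you can filter that loop (a sketch, assuming model is the PEFT-wrapped model from above):

# list only the trainable parameter tensors
trainable = [name for name, param in model.named_parameters() if param.requires_grad]
print(f"{len(trainable)} trainable parameter tensors")
for name in trainable:
    # expect the lora_A / lora_B adapter weights plus the classification head
    print(name)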