Tags: python, huggingface-transformers, huggingface, quantization, half-precision-float

What is the difference, if any, between model.half() and model.to(dtype=torch.float16) in huggingface-transformers?


Example:

# pip install transformers
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load model
model_path = 'huawei-noah/TinyBERT_General_4L_312D'
model = AutoModelForTokenClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Convert the model to FP16
model.half()

vs.

model.to(dtype=torch.float16)

Solution

  • Both model.half() and model.to(dtype=torch.float16) cast all of the model's floating-point parameters and buffers to FP16 (torch.float16), leaving non-floating-point tensors untouched. Used this way, there is no difference: half() is simply a convenience shorthand for the same dtype conversion.
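
    For illustration, a minimal check (reusing the TinyBERT checkpoint from the question) that both calls leave the parameters in the same dtype:

# pip install transformers
import torch
from transformers import AutoModelForTokenClassification

model_path = 'huawei-noah/TinyBERT_General_4L_312D'

# Convert one copy with half() and another with to(dtype=...)
model_a = AutoModelForTokenClassification.from_pretrained(model_path).half()
model_b = AutoModelForTokenClassification.from_pretrained(model_path).to(dtype=torch.float16)

# Both paths cast every floating-point parameter to torch.float16
assert all(p.dtype == torch.float16 for p in model_a.parameters())
assert all(p.dtype == torch.float16 for p in model_b.parameters())

    One practical note: to() is the more general method, since it can also move the model to a device in the same call, e.g. model.to('cuda', dtype=torch.float16).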