Example:
# pip install transformers
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer
# Load the model and tokenizer
model_path = 'huawei-noah/TinyBERT_General_4L_312D'
model = AutoModelForTokenClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Convert the model to FP16
model.half()
vs.
model.to(dtype=torch.float16)
What is the difference, if any, between model.half() and model.to(dtype=torch.float16) in huggingface-transformers?
Both model.half() and model.to(dtype=torch.float16) are methods inherited from torch.nn.Module rather than anything specific to transformers. model.half() is shorthand for model.to(torch.float16): each casts all floating-point parameters and buffers of the model to FP16 in place and returns the model. Used as in the example, there is no difference.
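As a quick sanity check (a minimal sketch; model_a and model_b are just illustrative names, and the checkpoint is the one from the question), you can verify that both calls leave the parameters in the same dtype:

import torch
from transformers import AutoModelForTokenClassification

model_path = 'huawei-noah/TinyBERT_General_4L_312D'

# Load two independent copies of the same model
model_a = AutoModelForTokenClassification.from_pretrained(model_path)
model_b = AutoModelForTokenClassification.from_pretrained(model_path)

# Cast one copy with each method
model_a.half()
model_b.to(dtype=torch.float16)

# Every floating-point parameter ends up as torch.float16 either way
assert all(p.dtype == torch.float16 for p in model_a.parameters())
assert all(p.dtype == torch.float16 for p in model_b.parameters())

Under the hood, both calls go through the same Module._apply machinery in PyTorch, which is why the results are identical.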