I trained a LongT5 model using Hugging Face's tooling.
When I run inference with the trained model directly after training, it works as expected: I get good quality output, in line with the training metrics. However, if I save the model and load it back from disk, the output is gibberish. I can't figure out why.
Code producing good output:
text = dataset['test'][0]['from']
inputs = tokenizer(text, return_tensors="pt").input_ids
inputs = inputs.to('cuda:0')
model.eval()
with torch.no_grad():
    model.to('cuda:0')
    model.generation_config = generation_config
    outputs = model.generate(inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Prints correct output
print(translation)
How I save the model:
trainer.save_model(os.path.join(model_output_dir, "final"))
tokenizer.save_pretrained(os.path.join(model_output_dir, "final"))
How I load the model:
model = LongT5ForConditionalGeneration.from_pretrained(os.path.join(model_output_dir, "final"))
model.to('cuda:0')
model.generation_config = generation_config
outputs = model.generate(inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Prints random garbage, like:
# pamper verre195188 albums188 albums188188 albums188 albums188 albums188 albums; unterschiedlich188 albums188 albums188 albums ...
print(translation)
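One way to narrow this down is to compare the reloaded model's input embeddings against the in-memory model that still produces good output. This helper is a sketch I added, not part of the original post; `model_a` and `model_b` stand for the two model instances.

```python
import torch

# Hypothetical diagnostic: if the two models' input-embedding weights
# differ, the embeddings were not round-tripped through the checkpoint.
def embeddings_match(model_a, model_b) -> bool:
    w_a = model_a.get_input_embeddings().weight.detach().cpu()
    w_b = model_b.get_input_embeddings().weight.detach().cpu()
    return torch.allclose(w_a, w_b)
```

Running this on the in-memory model versus the freshly loaded one should reveal the mismatch if the embedding weights were dropped during saving.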
In both cases, the tokenizer is the same instance that was already in memory from training, but it makes no difference whether I load it from disk instead -- the result is the same.
The generation_config variable looks like this, and it's also set in the training arguments:
generation_config = GenerationConfig.from_model_config(model.config)
generation_config._from_model_config = False
generation_config.max_new_tokens = 512
It makes no difference whether it's set in the inference code or not, I still get gibberish.
It turns out this is a bug in the current stable version of safetensors: it doesn't save the encoder.embed_tokens.weight and decoder.embed_tokens.weight tensors, so when the model is loaded again, these layers are initialized with random weights.
There are two workarounds:
1. Install safetensors from the git main branch:
!pip install -U git+https://github.com/huggingface/safetensors.git
2. Set save_safetensors=False in the training arguments, so that HF will use pickle to save your model instead of safetensors.
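The second workaround in training-arguments form might look like this. This is a sketch assuming a Seq2SeqTrainer setup; every argument except save_safetensors is a placeholder.

```python
from transformers import Seq2SeqTrainingArguments

# save_safetensors=False makes the Trainer fall back to torch.save
# (pickle) checkpoints, sidestepping the safetensors issue.
training_args = Seq2SeqTrainingArguments(
    output_dir="model_output/final",  # placeholder path
    save_safetensors=False,
)

# The same applies when saving a model manually:
# model.save_pretrained(path, safe_serialization=False)
```

Note that pickle checkpoints carry the usual caveat of torch.save: only load them from sources you trust.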