Tags: huggingface-transformers, huggingface, huggingface-trainer

Hugging Face model loaded from disk generates gibberish


I trained a LongT5 model using Hugging Face's Trainer.

When I run inference with the trained model directly after training, it works as expected: the output quality is good, consistent with the training metrics. However, if I save the model and then load it from disk, the output is gibberish, and I can't figure out why.

Code producing good output:

import torch

# Tokenize one test example and move it to the GPU
text = dataset['test'][0]['from']
inputs = tokenizer(text, return_tensors="pt").input_ids
inputs = inputs.to('cuda:0')

model.eval()

with torch.no_grad():
    model.to('cuda:0')
    model.generation_config = generation_config
    outputs = model.generate(inputs)

translation = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Prints correct output
print(translation)

How I save the model:

trainer.save_model(os.path.join(model_output_dir, "final"))
tokenizer.save_pretrained(os.path.join(model_output_dir, "final"))

How I load the model:

from transformers import LongT5ForConditionalGeneration

model = LongT5ForConditionalGeneration.from_pretrained(os.path.join(model_output_dir, "final"))
model.to('cuda:0')
model.generation_config = generation_config

# Same `inputs` as above
outputs = model.generate(inputs)

translation = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Prints random garbage, like:
# pamper verre195188 albums188 albums188188 albums188 albums188 albums188 albums; unterschiedlich188 albums188 albums188 albums ...
print(translation)

In both cases, the tokenizer is the same instance that was already in memory during training, but it makes no difference whether I load it from disk instead -- the result is the same.
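
For reference, loading the tokenizer from disk looks like this (a sketch, with AutoTokenizer standing in for the concrete tokenizer class, which behaves the same way):

import os
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(os.path.join(model_output_dir, "final"))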

The generation_config variable looks like this and it's also set in the training arguments:

from transformers import GenerationConfig

generation_config = GenerationConfig.from_model_config(model.config)
generation_config._from_model_config = False
generation_config.max_new_tokens = 512

It makes no difference whether it's set in the inference code or not; I still get gibberish.


Solution

  • So, it turns out to be a strange bug in the current stable version of safetensors: it doesn't save the encoder.embed_tokens.weight and decoder.embed_tokens.weight tensors, so when the model is loaded again, these layers are initialized with random weights. You can confirm this by listing the keys in the saved checkpoint, as sketched after the list below.

    There are two workarounds:

    1. Use the latest version of safetensors, where this seems to be fixed:
    !pip install -U git+https://github.com/huggingface/safetensors.git

    2. Don't use safetensors to save your model at all. You can set save_safetensors=False in the training arguments, so that HF will use pickle (torch.save) to save your model instead of safetensors; see the sketch after this list.
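
    To verify the missing tensors, open the saved checkpoint and list its keys. A minimal sketch, assuming the model was saved to <model_output_dir>/final/model.safetensors (the default file name):

    import os
    from safetensors import safe_open

    path = os.path.join(model_output_dir, "final", "model.safetensors")
    with safe_open(path, framework="pt") as f:
        keys = set(f.keys())

    # With the buggy safetensors version, both of these print False --
    # the embedding weights were never written to disk:
    print("encoder.embed_tokens.weight" in keys)
    print("decoder.embed_tokens.weight" in keys)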
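
    For the second workaround, the flag goes into TrainingArguments. A minimal sketch (the output directory and the omitted arguments are placeholders for whatever you already use):

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir=model_output_dir,
        save_safetensors=False,  # fall back to pickle-based torch.save checkpoints
        # ... your other training arguments ...
    )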