I have a custom classification model, based on a BERT model, trained using the transformers library. The model classifies text into 7 different categories and is persisted in a directory using:
trainer.save_model(model_name)
tokenizer.save_pretrained(model_name)
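For what it's worth, the saved directory round-trips fine when loaded back with transformers itself (a quick check, assuming a BertForSequenceClassification head, which is what my setup uses):

from transformers import BertForSequenceClassification

# Reload the fine-tuned model from the persisted directory; num_labels comes from the saved config
reloaded = BertForSequenceClassification.from_pretrained(model_name)
print(reloaded.config.num_labels)  # 7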
I'm trying to load this persisted model using the allennlp library for further analysis, and I managed to do so after a lot of work. However, when running the model inside the allennlp framework, its predictions are very different from the ones I get when I run it with transformers, which leads me to think that some part of the loading was not done correctly. There are no errors during inference; the predictions just don't match.
There is little documentation about how to load an existing model, so I'm wondering if someone has faced the same situation before. There is just one example of how to do QA classification with RoBERTa, but I couldn't extrapolate it to what I'm looking for. Does anyone know whether the steps I'm following are correct?
This is how I'm loading the trained model:
from allennlp.common import Params
from allennlp.data import Vocabulary
from allennlp.data.token_indexers import PretrainedTransformerIndexer
from allennlp.data.tokenizers import PretrainedTransformerTokenizer
from allennlp.models import BasicClassifier
from allennlp.modules.seq2vec_encoders import BertPooler
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder

# Vocabulary, tokenizer and pooler are all built from the persisted directory
transformer_vocab = Vocabulary.from_pretrained_transformer(model_name)
transformer_tokenizer = PretrainedTransformerTokenizer(model_name)
transformer_encoder = BertPooler(model_name)

params = Params(
    {
        "token_embedders": {
            "tokens": {
                "type": "pretrained_transformer",
                "model_name": model_name,
            }
        }
    }
)
token_embedder = BasicTextFieldEmbedder.from_params(vocab=transformer_vocab, params=params)
token_indexer = PretrainedTransformerIndexer(model_name)

transformer_model = BasicClassifier(vocab=transformer_vocab,
                                    text_field_embedder=token_embedder,
                                    seq2vec_encoder=transformer_encoder,
                                    dropout=0.1,
                                    num_labels=7)
I also had to implement my own DatasetReader as follows:
from typing import Dict
from allennlp.data import DatasetReader, Instance
from allennlp.data.fields import Field, LabelField, TextField
from allennlp.data.token_indexers import TokenIndexer
from allennlp.data.tokenizers import Tokenizer

class ClassificationTransformerReader(DatasetReader):
    def __init__(
        self,
        tokenizer: Tokenizer,
        token_indexer: TokenIndexer,
        max_tokens: int,
        **kwargs
    ):
        super().__init__(**kwargs)
        self.tokenizer = tokenizer
        self.token_indexers: Dict[str, TokenIndexer] = {"tokens": token_indexer}
        self.max_tokens = max_tokens

    def text_to_instance(self, text: str, label: str = None) -> Instance:
        tokens = self.tokenizer.tokenize(text)
        if self.max_tokens:
            tokens = tokens[: self.max_tokens]
        inputs = TextField(tokens, self.token_indexers)
        fields: Dict[str, Field] = {"tokens": inputs}
        if label:
            fields["label"] = LabelField(label)
        return Instance(fields)
It is instantiated as follows:
dataset_reader = ClassificationTransformerReader(tokenizer=transformer_tokenizer,
                                                 token_indexer=token_indexer,
                                                 max_tokens=400)
To run the model and test out if it works I'm doing the following:
from allennlp.data import Batch
from allennlp.nn import util

# Build a single-instance batch and index it with the transformer vocabulary
instance = dataset_reader.text_to_instance("some sample text here")
dataset = Batch([instance])
dataset.index_instances(transformer_vocab)

# Convert to tensors, move them to the model's device, and run a forward pass
model_input = util.move_to_device(dataset.as_tensor_dict(),
                                  transformer_model._get_prediction_device())
outputs = transformer_model.make_output_human_readable(transformer_model(**model_input))
This works and returns the probabilities correctly, but they don't match what I get when running the model with transformers directly. Any idea what's going on?
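For reference, this is roughly how I get the predictions directly with transformers for the same text (a sketch of my comparison code; the exact preprocessing may differ slightly from my real pipeline):

import torch
from transformers import AutoTokenizer, BertForSequenceClassification

hf_model = BertForSequenceClassification.from_pretrained(model_name)
hf_tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model.eval()

# Tokenize the same sample text and take a softmax over the logits
encoded = hf_tokenizer("some sample text here", return_tensors="pt",
                       truncation=True, max_length=400)
with torch.no_grad():
    logits = hf_model(**encoded, return_dict=True).logits
probs = torch.softmax(logits, dim=-1)
print(probs)  # these are the probabilities I expect the AllenNLP model to reproduce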
Answering the original question: the code above loaded most of the components from the original transformer model, except the classification layer, which, as Dirk mentioned, is randomly initialized. The solution is to load the weights of that classifier from transformers into the AllenNLP one. The following code does the trick.
from transformers import BertForSequenceClassification

# AllenNLP classifier built as before; its classification layer starts out randomly initialized
transformer_model = BasicClassifier(vocab=transformer_vocab,
                                    text_field_embedder=token_embedder,
                                    seq2vec_encoder=transformer_encoder,
                                    dropout=0.1,
                                    num_labels=7)

# Original model loaded using the transformers library
classifier = BertForSequenceClassification.from_pretrained(model_name)

# Copy the fine-tuned classification head into the AllenNLP model
transformer_model._classification_layer.weight = classifier.classifier.weight
transformer_model._classification_layer.bias = classifier.classifier.bias
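After copying the weights, the two models should produce (near-)identical probabilities. A quick sanity check along these lines confirmed it for me (a sketch; it reuses the dataset_reader and batching code from the question):

import torch

# The classification head now shares its parameters with the fine-tuned transformers head
assert transformer_model._classification_layer.weight is classifier.classifier.weight

# Re-run the same instance through the AllenNLP model and inspect the probabilities
transformer_model.eval()
with torch.no_grad():
    outputs = transformer_model.make_output_human_readable(transformer_model(**model_input))
print(outputs["probs"])  # should now match the probabilities from transformers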