My implementation using the AutoModel and AutoTokenizer classes is fairly simple:
from transformers import AutoModel, AutoTokenizer
import numpy as np
from rank_bm25 import BM25Okapi
from sklearn.neighbors import NearestNeighbors

class EmbeddingModels:
    def bert(self, model_name, text):
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModel.from_pretrained(model_name)
        inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
        outputs = model(**inputs)
        embeddings = outputs.last_hidden_state.mean(dim=1).detach().numpy()
        return embeddings

    def create_chunks(self, text, chunk_size):
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
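For reference, this is roughly how I call it; the model name, sample text, and chunk size below are just example values, and the warning shows up as soon as bert() loads the model:

# Example usage only; "bert-base-uncased", the text, and the chunk size are placeholders.
# The beta/gamma warning is printed during AutoModel.from_pretrained inside bert().
document_text = "some long document text " * 100
models = EmbeddingModels()
chunks = models.create_chunks(document_text, 500)
embeddings = np.vstack([models.bert("bert-base-uncased", chunk) for chunk in chunks])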
But I can't get this warning to go away:
A parameter name that contains 'beta' will be renamed internally to 'bias'.
Please use a different name to suppress this warning.
A parameter name that contains 'gamma' will be renamed internally to 'weight'.
Please use a different name to suppress this warning.
There is no reference to the words beta or gamma anywhere in my repo.
Updating the package and suppressing the warnings with import warnings did not help either.
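This is roughly the warnings filter I tried (the message pattern is just an example); it has no effect, which suggests the message is not raised through Python's warnings module at all:

import warnings

# Attempted suppression; filtering on the message text did not hide the warning
warnings.filterwarnings("ignore", message=".*renamed internally.*")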
Before loading the pretrained model, set the transformers logger level to ERROR, as shown below. It is frustrating not being able to use the warnings library filter here, but this message is emitted through transformers' own loggers rather than the warnings module, which is why that filter has no effect.
import logging
import transformers

# Lower every transformers logger to ERROR so the beta/gamma rename warning is dropped
loggers = [logging.getLogger(name) for name in logging.root.manager.loggerDict]
for logger in loggers:
    if "transformers" in logger.name.lower():
        logger.setLevel(logging.ERROR)

# now you can load the state dict from the pretrained checkpoint
model = transformers.BertModel.from_pretrained(
    "bert-base-uncased",
    use_safetensors=True,
    return_dict=False,
    attn_implementation="sdpa",
)
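If you prefer a one-liner, recent versions of transformers also expose their own verbosity helper, which achieves the same effect; this sketch assumes your installed version provides transformers.logging.set_verbosity_error():

import transformers

# Same effect via the library's own logging helper (available in recent versions)
transformers.logging.set_verbosity_error()

model = transformers.BertModel.from_pretrained("bert-base-uncased")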