I wanted to use GPT2Tokenizer and AutoModelForCausalLM for generating (rewriting) sample text. I have tried transformers==4.10.0, transformers==4.30.2, and --upgrade git+https://github.com/huggingface/transformers.git, but I keep getting the error AttributeError: 'GPT2LMHeadModel' object has no attribute 'compute_transition_scores'.
My code is as follows:
from transformers import GPT2Tokenizer, AutoModelForCausalLM
import numpy as np
import pandas as pd
x = "sample Text" #df_toxic['text'].iloc[0]
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token_id = tokenizer.eos_token_id
inputs = tokenizer(x, return_tensors="pt")
# Example 1: Print the scores for each token generated with Greedy Search
outputs = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True)
transition_scores = model.compute_transition_scores(
outputs.sequences, outputs.scores, normalize_logits=True
)
# input_length is the length of the input prompt for decoder-only models, like the GPT family, and 1 for
# encoder-decoder models, like BART or T5.
input_length = 1 if model.config.is_encoder_decoder else inputs.input_ids.shape[1]
generated_tokens = outputs.sequences[:, input_length:]
for tok, score in zip(generated_tokens[0], transition_scores[0]):
# | token | token string | logits | probability
print(f"| {tok:5d} | {tokenizer.decode(tok):8s} | {score.numpy():.3f} | {np.exp(score.numpy()):.2%}")
I got the error of:
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In [21], line 3
1 # Example 1: Print the scores for each token generated with Greedy Search
2 outputs = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True)
----> 3 transition_scores = model.compute_transition_scores(
4 outputs.sequences, outputs.scores, normalize_logits=True
5 )
6 # # input_length is the length of the input prompt for decoder-only models, like the GPT family, and 1 for
7 # # encoder-decoder models, like BART or T5.
8 # input_length = 1 if model.config.is_encoder_decoder else inputs.input_ids.shape[1]
(...)
11 # # | token | token string | logits | probability
12 # print(f"| {tok:5d} | {tokenizer.decode(tok):8s} | {score.numpy():.3f} | {np.exp(score.numpy()):.2%}")
File /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1207, in Module.__getattr__(self, name)
1205 if name in modules:
1206 return modules[name]
-> 1207 raise AttributeError("'{}' object has no attribute '{}'".format(
1208 type(self).__name__, name))
AttributeError: 'GPT2LMHeadModel' object has no attribute 'compute_transition_scores'
To generate text using transformers and a GPT-2 model, if you don't need to customize the generation settings, you can use the pipeline function, e.g.
from transformers import pipeline
generator = pipeline('text-generation', model='gpt2')
generator("Hello world, continue... ")
[out]:
[{'generated_text': 'Hello world, continue... !! A group of two great people from Finland came to my office and brought me with them, and I got some beautiful drawings with the colours. I thought I gave it to the artist but that was not the case.'}]
If you somehow have to use GPT2Tokenizer and AutoModelForCausalLM instead of pipeline, you can try AutoTokenizer instead of GPT2Tokenizer, e.g.
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token_id = tokenizer.eos_token_id
x = "Hello world, ..."
inputs = tokenizer(x, return_tensors="pt")
model_outputs = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True)
generated_tokens_ids = model_outputs.sequences[0]
tokenizer.decode(generated_tokens_ids)
[out]:
Hello world,...\n\nI'm sorry
To use the compute_transition_scores function described in https://discuss.huggingface.co/t/announcement-generation-get-probabilities-for-generated-output/30075/24, first make sure you really have an up-to-date version of transformers by doing:
import transformers
print(transformers.__version__)
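To check programmatically that the version is new enough, you can compare version tuples. Below is a minimal sketch; parse_version here is a hypothetical, simplified stand-in for a real version parser (it keeps only the leading numeric components and is not a full PEP 440 parser):

```python
def parse_version(v):
    """Simplified version parser: keeps only the leading numeric parts,
    so '4.31.0.dev0' becomes (4, 31, 0). Not a full PEP 440 parser."""
    parts = []
    for p in v.split("."):
        if not p.isdigit():
            break
        parts.append(int(p))
    return tuple(parts)

# compute_transition_scores was added in transformers v4.26.0,
# so the version string printed above should compare at least that high:
assert parse_version("4.30.2") >= (4, 26, 0)   # new enough
assert parse_version("4.10.0") < (4, 26, 0)    # too old
```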
If the version is recent enough (the feature was added in transformers v4.26.0), this should give no error:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token_id = tokenizer.eos_token_id
model.compute_transition_scores
[out]:
<bound method GenerationMixin.compute_transition_scores of GPT2LMHeadModel(...)
If you still see the AttributeError,
AttributeError: 'GPT2LMHeadModel' object has no attribute 'compute_transition_scores'
most probably the Python kernel you are running (e.g. inside Jupyter) is not the same environment that your pip installed transformers into. If so, check your executable:
import sys
sys.executable
Then you should see something like:
/usr/bin/python3
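The mismatch happens because the pip on your PATH may belong to a different interpreter than the notebook kernel. A quick diagnostic sketch showing where packages for the current interpreter actually live (the exact paths will differ on your machine):

```python
import sys
import sysconfig

print(sys.executable)                    # the interpreter running this kernel
print(sysconfig.get_paths()["purelib"])  # where its pip installs packages
```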
After that, instead of a plain pip install -U transformers, reuse that exact Python binary so the package is installed into the kernel's environment:
/usr/bin/python3 -m pip install -U transformers
See also: