Tags: nlp, huggingface-transformers, transformer-model, summarization, beam-search

HuggingFace Summarization: effect of specifying both `do_sample` and `num_beams`


I am using a HuggingFace summarization pipeline to generate summaries using a fine-tuned model. The summarizer object is initialised as follows:

from transformers import pipeline

summarizer = pipeline(
    "summarization", 
    model=model, 
    tokenizer=tokenizer, 
    num_beams=5, 
    do_sample=True, 
    no_repeat_ngram_size=3,
    max_length=1024,
    device=0,
    batch_size=8
)

According to the documentation, setting num_beams=5 means that the 5 highest-scoring candidate sequences are retained each time a new token is generated; all other possibilities are discarded, and this repeats at every step. However, this option appears to be incompatible with do_sample=True, which activates a behaviour where new tokens are picked by some random strategy (not necessarily uniformly random, of course, but I don't know the details of the process). Could anyone explain clearly how num_beams=5 and do_sample=True work together? No error is raised, so I assume this is a valid summarizer configuration.
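For reference, this is roughly how I would compare the two settings side by side with generate (using an arbitrary public checkpoint here rather than my fine-tuned model, and shorter lengths just to keep it quick):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Example checkpoint only; any BART/T5-style summarization model should behave the same way.
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-6-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-6-6")

text = "Long article text to summarize ..."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Pure beam search: deterministic, the 5 highest-scoring partial sequences
# are kept at every step, so repeated calls give the same summary.
beam_only = model.generate(**inputs, num_beams=5, do_sample=False, max_length=128)

# Beam search with sampling: still 5 beams, but the continuation of each beam
# is sampled from its probability distribution instead of taken greedily,
# so repeated calls can give different summaries.
beam_sample = model.generate(**inputs, num_beams=5, do_sample=True, max_length=128)

print(tokenizer.decode(beam_only[0], skip_special_tokens=True))
print(tokenizer.decode(beam_sample[0], skip_special_tokens=True))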


Solution

  • The first difference is that temperature is applied to the logits.

    The second difference is that, instead of deterministically taking the top tokens for each beam, the next token for each beam is sampled from that beam's probability distribution:

    https://github.com/huggingface/transformers/blob/main/src/transformers/generation_utils.py#L2626

    I believe the rest stays the same, but you can keep reading the code to be 100% sure.
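    Very roughly, the difference at each decoding step looks like the sketch below. This is a simplified illustration, not the actual transformers implementation: the real beam-sample code also adds the accumulated beam scores to the logits and reorders the beams afterwards.

    import torch
    import torch.nn.functional as F

    def pick_next_tokens(logits, num_beams, do_sample, temperature=1.0):
        # `logits` has shape (num_beams, vocab_size): one distribution per beam.
        if do_sample:
            # Temperature reshapes the distribution before sampling.
            probs = F.softmax(logits / temperature, dim=-1)
            # Sample candidate tokens per beam instead of taking the top ones.
            next_tokens = torch.multinomial(probs, num_samples=2 * num_beams)
        else:
            # Plain beam search: deterministically take the highest-scoring tokens.
            next_tokens = torch.topk(logits, k=2 * num_beams, dim=-1).indices
        return next_tokens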