python, huggingface-transformers, transformer-model

How to make huggingface transformer for translation return n translation inferences?


So I am trying to use this translation transformer from Hugging Face: https://huggingface.co/docs/transformers/en/tasks/translation. The issue is that I want n translations returned, not just one. How can I do that? I also want the translations ordered by confidence, so that the translation at index 0 has the highest confidence. This is important for my use case, which is translating natural language into a command language (about 40 commands, without subcommands).

The GitHub repo and exact model I am using is this one: https://github.com/google-research/text-to-text-transfer-transformer/blob/main/t5/models/hf_model.py

This is the HuggingFace API:

translator = pipeline("translation_xx_to_yy", model="my_awesome_opus_books_model")
translator(text)

But since I intend to use the model directly from the google-research GitHub repo, it seems the tweaking should be done here:

predictions = []
for batch in dataset:
  predicted_tokens = self._model.generate(
      input_ids=self.to_tensor(batch["inputs"]), **generate_kwargs
  )
  predicted_tokens = predicted_tokens.cpu().numpy().tolist()
  predictions.extend(
      [vocabs["targets"].decode(p) for p in predicted_tokens]
  )

for inp, pred in zip(inputs, predictions):
  logging.info("%s\n  -> %s", inp, pred)

if output_file is not None:
  utils.write_lines_to_file(predictions, output_file)

Also, any suggestion of another model option for solving this natural-language-to-command task is welcome!


Solution

  • Check out the documentation of the generate method: https://huggingface.co/docs/transformers/generation_strategies#customize-text-generation

    The parameter to use is num_return_sequences. However, T5 does a greedy search by default, meaning it generates one token at a time and discards all alternatives along the way, so it only ever produces a single output. To generate multiple options, you need the model to explore alternative paths. There are basically two ways to do this (my guess would be that for your case the first option works better):

    If you activate do_sample, the model will not just pick the highest-probability token at each step, but will instead draw a weighted sample from the distribution over next-token probabilities.

    predicted_tokens = self._model.generate(
        input_ids=self.to_tensor(batch["inputs"]),
        num_return_sequences=3,
        do_sample=True,
        **generate_kwargs,
    )
    

    If you set num_beams to anything larger than 1, you switch to beam search, where the model keeps several alternative continuations in parallel at every step instead of following only the single most likely one.

    predicted_tokens = self._model.generate(
        input_ids=self.to_tensor(batch["inputs"]),
        num_return_sequences=3,
        num_beams=4,
        **generate_kwargs,
    )  # note that num_return_sequences has to be smaller than or equal to num_beams
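
    Note that with num_return_sequences=3 and a batch of inputs, generate returns batch_size * 3 sequences stacked in one tensor, with the candidates for each input placed consecutively (and, for beam search, ordered from highest to lowest score). The decoding loop from your snippet would therefore need to regroup them; a small sketch, reusing the names from your code:

    n = 3  # num_return_sequences
    predicted_tokens = predicted_tokens.cpu().numpy().tolist()
    decoded = [vocabs["targets"].decode(p) for p in predicted_tokens]
    # Regroup the flat list into one list of n candidates per input,
    # so predictions[i] holds the n candidates for inputs[i].
    predictions.extend(
        [decoded[i:i + n] for i in range(0, len(decoded), n)]
    )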
    

    To also get the scores of the generated outputs, you can additionally pass the arguments output_scores=True and return_dict_in_generate=True. Note, however, that these return the logits of the individual tokens, which you would then have to combine into an overall sequence probability yourself; see https://stackoverflow.com/a/75029986/18189622.
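
    For beam search, a minimal sketch (my addition, assuming a transformers version where generate returns a ModelOutput object when return_dict_in_generate=True) of how per-sequence scores could be read out:

    outputs = self._model.generate(
        input_ids=self.to_tensor(batch["inputs"]),
        num_return_sequences=3,
        num_beams=4,
        output_scores=True,
        return_dict_in_generate=True,
        **generate_kwargs,
    )
    # outputs.sequences holds the generated token ids
    # (batch_size * num_return_sequences rows).
    predicted_tokens = outputs.sequences.cpu().numpy().tolist()
    # With num_beams > 1, outputs.sequences_scores holds one length-penalized
    # log-probability per returned sequence (higher means more confident).
    confidences = outputs.sequences_scores.cpu().numpy().tolist()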

    In general, T5 might not be the best model for code synthesis; as far as I know, it was not pretrained or fine-tuned on code in its multi-task instruction fine-tuning. There is, however, FLAN-T5, which was fine-tuned on a wider range of tasks, including code synthesis. There are also CodeT5 and many other models aimed at code synthesis.
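
    For example, trying FLAN-T5 only requires swapping in a different checkpoint (a minimal sketch using the generic seq2seq classes; the prompt and the model size are illustrative assumptions, and you would still fine-tune it on your ~40 commands):

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

    # Hypothetical prompt for a natural-language-to-command task.
    inputs = tokenizer("Translate to a command: list all running jobs",
                       return_tensors="pt")
    # Beam search with several returned candidates, best one first.
    outputs = model.generate(**inputs, num_beams=4, num_return_sequences=3,
                             max_new_tokens=32)
    for candidate in tokenizer.batch_decode(outputs, skip_special_tokens=True):
        print(candidate)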