
How can I fine-tune a model from OpenAI's Whisper ASR on my own training data?


I use OpenAI's Whisper Python library for speech recognition. I have some training data: either text only, or audio with the corresponding transcriptions. How can I fine-tune a model from OpenAI's Whisper ASR on my own training data?


Solution

  • According to https://github.com/openai/whisper/discussions/64, the released code doesn't contain the training/fine-tuning part, so you would have to write it yourself to train or fine-tune a Whisper model on your own training data.

    Also, from https://openai.com/blog/whisper/:

    We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.

    No training code is mentioned.
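Because the training part has to be written yourself, here is a minimal sketch of one piece of it: turning a transcription into teacher-forcing decoder inputs and labels. It assumes you already have integer token IDs. The function names and the toy IDs below are hypothetical and are not part of the openai-whisper API. In that package, the prompt would presumably come from something like `tokenizer.sot_sequence_including_notimestamps` (tokenizer obtained via `whisper.tokenizer.get_tokenizer`), and the text tokens would come from `tokenizer.encode(text)`. The decoder logits from `model(mel, tokens)` would then be compared against these labels with `torch.nn.functional.cross_entropy`, whose default `ignore_index` of -100 skips the masked positions. Treat that wiring as an assumption to check against the version you have installed.

```python
from typing import List, Sequence, Tuple

# Default ignore_index of torch.nn.functional.cross_entropy; targets with this
# value contribute nothing to the loss.
IGNORE_INDEX = -100


def make_decoder_example(
    prompt_tokens: Sequence[int],
    text_tokens: Sequence[int],
    eot: int,
    ignore_index: int = IGNORE_INDEX,
) -> Tuple[List[int], List[int]]:
    """Build (decoder_input, labels) for one transcript (hypothetical helper).

    The decoder is fed prompt + text and must predict the next token at every
    position. Targets that are still part of the prompt are masked, so only
    the transcript tokens and the end-of-text token are learned.
    """
    full = list(prompt_tokens) + list(text_tokens) + [eot]
    decoder_input = full[:-1]  # what the decoder sees
    labels = full[1:]  # what it should predict at each position
    # The first len(prompt) - 1 targets are prompt tokens; guard against an
    # empty prompt so the slice below never gets a negative bound.
    n_masked = max(len(prompt_tokens) - 1, 0)
    labels[:n_masked] = [ignore_index] * n_masked
    return decoder_input, labels


def pad_batch(sequences: Sequence[Sequence[int]], pad_value: int) -> List[List[int]]:
    """Right-pad variable-length token lists so they can be stacked into a tensor."""
    max_len = max(len(s) for s in sequences)
    return [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]


if __name__ == "__main__":
    # Made-up token IDs, for illustration only.
    inputs, labels = make_decoder_example([1, 2, 3], [10, 11], eot=99)
    print(inputs)  # [1, 2, 3, 10, 11]
    print(labels)  # [-100, -100, 10, 11, 99]
    # Pad labels with IGNORE_INDEX so padded positions stay out of the loss.
    print(pad_batch([labels, [10, 99]], IGNORE_INDEX))
```

Padding the labels with -100 rather than a real token ID keeps the padded positions out of the loss. The padding value for the decoder inputs matters less: with causal attention and right padding, padded positions cannot influence the earlier ones.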


    William Castrillon and nizata pointed to the following fine-tuning code written by third-party developers: