To support decoding 'mp3' audio files, please install 'sox'

I'm trying to build an ASR model by applying transfer learning to a wav2vec 2.0 model. However, whenever I want to show or modify an audio file, I get this error:

def prepare_dataset(batch):
    audio = batch["audio"]

    # batched output is "un-batched"
    batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
    batch["input_length"] = len(batch["input_values"])
    
    with processor.as_target_processor():
        batch["labels"] = processor(batch["sentence"]).input_ids
    return batch

common_voice_train = common_voice_train.map(prepare_dataset, remove_columns=common_voice_train.column_names)
common_voice_test = common_voice_test.map(prepare_dataset, remove_columns=common_voice_test.column_names)

The errors:

RuntimeError: Backend "sox_io" is not one of available backends: ['soundfile']. ImportError: To support decoding 'mp3' audio files, please install 'sox'.

These are my PyTorch and torchaudio versions:

import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)

1.13.1+cu117
0.13.1+cu117

I really need help fixing this problem; it's part of my junior project! )':

I've tried reinstalling PyTorch and installing different versions, but nothing worked. The code works fine in Colab, but it's impossible for me to train the model there, so I have to use Visual Studio Code...

Solution

TorchAudio v2.1 and later (added September 2023)

In TorchAudio v2.1, the sox binding was switched to dynamic linking. This means users need to install libsox separately; one way to do that is pip install sox.
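After installing libsox, it can help to check whether torchaudio actually sees a sox backend before rerunning the dataset code. The sketch below assumes your release exposes torchaudio.list_audio_backends() (the error in the question comes from a list like it, showing ['soundfile']); a sox backend is reported as 'sox_io' in 0.13 and, I believe, as 'sox' from v2.1. The helper names available_torchaudio_backends and has_sox_backend are hypothetical.

```python
from typing import List, Sequence


def available_torchaudio_backends() -> List[str]:
    """Return the I/O backends torchaudio reports, or [] if it cannot be queried."""
    try:
        import torchaudio
    except (ImportError, OSError):
        return []  # torchaudio missing, or its native libraries failed to load
    # Guard in case the installed release does not provide this function
    lister = getattr(torchaudio, "list_audio_backends", None)
    return list(lister()) if lister is not None else []


def has_sox_backend(backends: Sequence[str]) -> bool:
    """True if any sox-based backend ('sox_io' or 'sox') appears in `backends`."""
    return any(name.startswith("sox") for name in backends)


if __name__ == "__main__":
    backends = available_torchaudio_backends()
    print("torchaudio backends:", backends)
    print("sox backend available:", has_sox_backend(backends))
```

With the setup from the question, where only ['soundfile'] is available, has_sox_backend returns False, which matches the RuntimeError.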

Before TorchAudio v2.1 (the original answer)

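First, note that the second error message (the ImportError) does not come from torchaudio, and it is misleading: TorchAudio does not depend on an external sox package. The message is raised by the Hugging Face datasets library when it fails to switch torchaudio to the sox_io backend for MP3 decoding.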

TorchAudio provides limited IO features on Windows, as libsox does not compile on Windows with VS2019. This situation is being worked on, but as of v0.13, Windows users need a workaround.

One simple workaround is to decode the audio with another library, such as soundfile, and convert the resulting NumPy ndarray into a PyTorch Tensor.

Another option is to install FFmpeg and use torchaudio.io.StreamReader to write your own load function, following this tutorial:

https://pytorch.org/audio/0.13.1/tutorials/streamreader_basic_tutorial.html#sphx-glr-tutorials-streamreader-basic-tutorial-py