I am trying to fine-tune a wav2vec2 model on my own dataset. To do that, I loaded the audio files and now want to downsample them to 16 kHz, but the librosa.resample function raises an error that I couldn't resolve. The error message is:
resample() takes 1 positional argument but 3 were given
At first I tried loading the audio with librosa at a 16 kHz sampling rate, but I have little experience in this field and that caused problems later in my project. I then found some code that is supposed to resample the audio signal. I tried to use it and ran into the error above.
This part works fine:
database={}
audios = []
psr = []
for path in df['audio']:
    speech_array, sr = torchaudio.load(path)
    audios.append(speech_array[0].numpy())
    psr.append(sr)
database['audio'] = audios
database['psr'] = psr
And I get an error for every index:
import librosa
import numpy as np
# Assuming 'database' is your DataFrame containing 'audio' and 'psr' columns
# List to store new sampling rates
new_sr = []
# Resample each audio signal and store the new sampling rate
for i in range(len(database['psr'])):
    try:
        audio_signal = np.asarray(database['audio'][i])  # Convert audio to numpy array
        original_sr = database['psr'][i]  # Original sampling rate
        # Check if the audio signal is mono (single-channel)
        if audio_signal.ndim == 1:
            # Resample mono audio signal
            resampled_audio = librosa.resample(audio_signal, original_sr, 16000)
        else:
            # Resample each channel separately for multi-channel audio
            resampled_channels = []
            for channel in audio_signal:
                resampled_channel = librosa.resample(channel, original_sr, 16000)
                resampled_channels.append(resampled_channel)
            resampled_audio = np.array(resampled_channels)
        # Store resampled audio back in DataFrame
        database['audio'][i] = resampled_audio
        # Store new sampling rate (16000 Hz)
        new_sr.append(16000)
    except Exception as e:
        print(f"Error processing audio at index {i}: {e}")
# Add new sampling rates to the DataFrame
database['newsr'] = new_sr
Here is the definition of resample [src]:
@cache(level=20)
def resample(
    y: np.ndarray,
    *,  # forces you to pass all the following arguments only as named ones
    orig_sr: float,
    target_sr: float,
    res_type: str = "soxr_hq",
    fix: bool = True,
    scale: bool = False,
    axis: int = -1,
    **kwargs: Any,
) -> np.ndarray:
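To see why the bare `*` produces exactly the error from the question, here is a minimal sketch that needs no audio library. The function below is a toy stand-in (not librosa's implementation) that only copies the keyword-only layout of the signature above; the names `samples`, `positional_error` and `keyword_result` are made up for illustration.

```python
def resample(y, *, orig_sr, target_sr):
    """Toy stand-in: same keyword-only layout as the signature above."""
    return f"{len(y)} samples: {orig_sr} Hz -> {target_sr} Hz"


samples = [0.0, 0.1, 0.2]

# Passing the sampling rates positionally fails, because everything
# after the bare `*` may only be given by name.
try:
    resample(samples, 44100, 16000)
except TypeError as err:
    positional_error = str(err)
    print(positional_error)

# Naming the arguments works.
keyword_result = resample(samples, orig_sr=44100, target_sr=16000)
print(keyword_result)
```

Only `y` can be passed positionally, which is why Python reports "takes 1 positional argument but 3 were given".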
Docs also provide an example of doing so:
y, sr = librosa.load(librosa.ex('trumpet'), sr=22050)
y_8k = librosa.resample(y, orig_sr=sr, target_sr=8000)
So in your case the resample calls should be:
# Resample mono audio signal
resampled_audio = librosa.resample(audio_signal,
                                   orig_sr=original_sr,
                                   target_sr=16000)
...
# Resample each channel separately for multi-channel audio
resampled_channel = librosa.resample(channel,
                                     orig_sr=original_sr,
                                     target_sr=16000)