I'm currently running a script where I take an entire audio file and save it using the audiofile library (which, in turn, uses the soundfile library) in Python.
I'm trying to mimic the behavior of audiofile.read(), where I give it an offset and a duration (in seconds) and it returns only the numpy array for that particular sound interval. The only difference is that instead of taking in a .wav file, as the library requires, I'll already have the entire audio file as a numpy array and need to extract the correct start and end of the interval from it.
I've tried copying the logic that calculates the start and end and then slicing the numpy array with sound_file[start:end], but that doesn't seem to work. I'm not very familiar with how signal processing works with audio files, so I'm a little at a loss here, and any help would be appreciated!
I expect it to take in a numpy array and return the same array sliced to include only the interval that starts at the offset and lasts for the specified duration. All the files I've loaded were originally 96 kHz, were resampled to 16 kHz, and were saved as numpy arrays.
Here's my code:
```python
import numpy as np
import audmath
from audiofile.core.utils import duration_in_seconds


def read_from_np(
    file,
    duration,
    offset,
    sampling_rate=16000,
    always_2d=False,  # was referenced below but never defined
):
    # `file` is a numpy array of shape (channels, samples)
    if duration is not None:
        duration = duration_in_seconds(duration, sampling_rate)
        if np.isnan(duration):
            duration = None
    if offset is not None and offset != 0:
        offset = duration_in_seconds(offset, sampling_rate)
        if np.isnan(offset):
            offset = None

    # Support for negative offset/duration values
    # by counting them from end of signal
    #
    if offset is not None and offset < 0 or duration is not None and duration < 0:
        # audiofile's duration() expects a file path, so derive the
        # signal duration (in seconds) from the array length instead
        signal_duration = file.shape[-1] / sampling_rate
        # offset | duration
        # None   | < 0
        if offset is None and duration is not None and duration < 0:
            offset = max([0, signal_duration + duration])
            duration = None
        # None   | >= 0
        if offset is None and duration is not None and duration >= 0:
            if np.isinf(duration):
                duration = None
        # >= 0   | < 0
        elif offset is not None and offset >= 0 and duration is not None and duration < 0:
            if np.isinf(offset) and np.isinf(duration):
                offset = 0
                duration = None
            elif np.isinf(offset):
                duration = 0
            else:
                if np.isinf(duration):
                    offset = min([offset, signal_duration])
                    duration = np.sign(duration) * signal_duration
                orig_offset = offset
                offset = max([0, offset + duration])
                duration = min([-duration, orig_offset])
        # >= 0   | >= 0
        elif offset is not None and offset >= 0 and duration is not None and duration >= 0:
            if np.isinf(offset):
                duration = 0
            elif np.isinf(duration):
                duration = None
        # < 0    | None
        elif offset is not None and offset < 0 and duration is None:
            offset = max([0, signal_duration + offset])
        # >= 0   | None
        elif offset is not None and offset >= 0 and duration is None:
            if np.isinf(offset):
                duration = 0
        # < 0    | > 0
        elif offset is not None and offset < 0 and duration is not None and duration > 0:
            if np.isinf(offset) and np.isinf(duration):
                offset = 0
                duration = None
            elif np.isinf(offset):
                duration = 0
            elif np.isinf(duration):
                duration = None
            else:
                offset = signal_duration + offset
                if offset < 0:
                    duration = max([0, duration + offset])
                else:
                    duration = min([duration, signal_duration - offset])
                offset = max([0, offset])
        # < 0    | < 0
        elif offset is not None and offset < 0 and duration is not None and duration < 0:
            if np.isinf(offset):
                duration = 0
            elif np.isinf(duration):
                duration = -signal_duration
            else:
                orig_offset = offset
                offset = max([0, signal_duration + offset + duration])
                duration = min([-duration, signal_duration + orig_offset])
                duration = max([0, duration])

    # Convert to samples
    #
    # Handle duration first
    # and return immediately
    # if duration == 0
    if duration is not None and duration != 0:
        duration = audmath.samples(duration, sampling_rate)
    if duration == 0:
        # channel count taken from the array, not from audiofile's channels()
        channels = file.shape[0]
        if channels > 1 or always_2d:
            signal = np.zeros((channels, 0))
        else:
            signal = np.zeros((0,))
        # return only the array, like the final return below
        return signal
    if offset is not None and offset != 0:
        offset = audmath.samples(offset, sampling_rate)
    else:
        offset = 0

    start = offset
    stop = None  # None slices to the end of the signal
    # duration == 0 is handled further above with immediate return
    if duration is not None:
        stop = duration + start
    return np.expand_dims(file[0, start:stop], 0)
```
Your code boils down to
return np.expand_dims(file[0, start:stop], 0)
which is correct.
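A small sketch of why that slice is correct, assuming file is laid out as (channels, samples), as the [0, start:stop] indexing implies. Slicing the last axis keeps the requested samples of channel 0, while a bare file[start:stop] slices the channel axis instead. If sound_file had the same layout, that may be why the plain sound_file[start:end] attempt appeared not to work. The toy array is made up for illustration.

```python
import numpy as np

# Toy stand-in for a loaded recording: 2 channels x 10 samples,
# laid out as (channels, samples).
file = np.arange(20).reshape(2, 10)
start, stop = 3, 7

# Take samples 3..6 of channel 0, then restore a leading channel axis.
segment = np.expand_dims(file[0, start:stop], 0)
print(segment.shape)  # (1, 4)
print(segment)        # [[3 4 5 6]]

# Without a channel index, the slice applies to rows (channels), not samples.
# There are only 2 rows, so rows 3..6 select nothing.
wrong = file[start:stop]
print(wrong.shape)    # (0, 10)
```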
So if you're unhappy with the result, it is due to computing the wrong (start, stop) pair, that is, the wrong (offset, duration) pair.
The sample rate is apparently fixed at exactly 16_000 samples per second.
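With a fixed rate, turning an (offset, duration) pair in seconds into a (start, stop) pair of sample indices is one multiplication each. A minimal sketch; seconds_to_samples is a hypothetical helper, and rounding to the nearest sample is an assumption (audiofile may round differently). The second call shows why the indices have to be computed at 16 kHz, the rate of the stored arrays, and not at the original 96 kHz.

```python
SAMPLING_RATE = 16_000  # rate of the stored (already resampled) arrays


def seconds_to_samples(offset_s: float, duration_s: float,
                       sampling_rate: int = SAMPLING_RATE):
    """Map offset/duration in seconds to a (start, stop) sample range."""
    start = round(offset_s * sampling_rate)
    stop = start + round(duration_s * sampling_rate)
    return start, stop


# 0.25 s starting 1.5 s into the clip, at 16 kHz:
print(seconds_to_samples(1.5, 0.25))          # (24000, 28000)
# The same request computed at the original 96 kHz points 6x further in:
print(seconds_to_samples(1.5, 0.25, 96_000))  # (144000, 168000)
```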
The number of channels can be 1 or 2, which seems worrisome.
There's a crazy amount of optional behavior associated with the offset and duration parameters. Get rid of it.
Focus on writing a simple helper that accepts an offset that is always a non-negative integer and a duration that is always a positive, finite integer. No NaNs. Use assert or raise so that a None or negative value blows up with a fatal error.
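A minimal sketch of such a helper, assuming the audio is a (channels, samples) array and that offset and duration are counted in samples (the conversion from seconds happens once, outside it). The name read_segment is hypothetical, and treating a segment that runs past the end as an error is an added assumption; drop that check if silent truncation is acceptable. It uses raise rather than assert because assertions are stripped when Python runs with -O.

```python
import numbers

import numpy as np


def read_segment(signal: np.ndarray, offset: int, duration: int) -> np.ndarray:
    """Return samples [offset, offset + duration) of a (channels, samples) array.

    Both offset and duration are counted in samples, not seconds.
    """
    # Anything that is not an integer (None, NaN, inf, 1.5, ...) is rejected.
    if not isinstance(offset, numbers.Integral) or offset < 0:
        raise ValueError(f"offset must be a non-negative integer, got {offset!r}")
    if not isinstance(duration, numbers.Integral) or duration <= 0:
        raise ValueError(f"duration must be a positive integer, got {duration!r}")
    if signal.ndim != 2:
        raise ValueError(f"expected (channels, samples), got shape {signal.shape}")
    # Assumption: running past the end is an error, not silently truncated.
    if offset + duration > signal.shape[1]:
        raise ValueError("requested segment runs past the end of the signal")
    return signal[:, offset:offset + duration]


audio = np.zeros((1, 3 * 16_000))             # 3 s of mono silence at 16 kHz
segment = read_segment(audio, 16_000, 8_000)  # 0.5 s starting at 1 s
print(segment.shape)  # (1, 8000)
```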
Next, focus on audio segments that always have the same number of channels.
At that point, it won't be hard to get it right.
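One way to get there is to normalize every array to a single layout before slicing. A sketch, assuming 2-D arrays are already (channels, samples), the layout audiofile.read returns (soundfile.read, by contrast, returns (frames, channels)); as_channels_first is a hypothetical name.

```python
import numpy as np


def as_channels_first(signal) -> np.ndarray:
    """Return signal as a 2-D (channels, samples) array.

    A 1-D array is treated as a single channel; 2-D input is assumed
    to be channels-first already and is returned unchanged.
    """
    signal = np.asarray(signal)
    if signal.ndim == 1:
        return signal[np.newaxis, :]
    if signal.ndim == 2:
        return signal
    raise ValueError(f"expected a 1-D or 2-D array, got shape {signal.shape}")


mono = np.zeros(16_000)
stereo = np.zeros((2, 16_000))
print(as_channels_first(mono).shape)    # (1, 16000)
print(as_channels_first(stereo).shape)  # (2, 16000)
```

After this step, every segment has an explicit channel axis, so the slicing code never has to branch on 1-D versus 2-D input.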