I have a very long audio file, along with manual annotations (start and end times in seconds) marking the important parts I need from it. The annotations come from a text file, which I have converted into a nested list where each sublist is [start, end].
The whole list looks like [[start1, end1], [start2, end2], ...]
What I need to do is go through this annotation list, take one timestamp pair (start and end time), crop that part out of the original audio, then take the next pair and crop that part out, and so on. I understand that all timings must be measured against the original, unedited audio.
Note that the timestamps are float values, and it's quite important to keep them as they are. The next step would be to extract audio features such as MFCCs from the cropped audio.
fs1, y1 = scipy.io.wavfile.read(file_path)
l1 = numpy.array(annotation_list)
newWavFileAsList = []
for elem in l1:
    startRead = elem[0]
    endRead = elem[1]
    newWavFileAsList.extend(y1[startRead:endRead])
newWavFile = numpy.array(newWavFileAsList)
scipy.io.wavfile.write(sample, fs1, newWavFile)
I tried it as shown above, but it raises an error saying that the indices startRead and endRead must be integers. I understand that indexing y1 directly with times in seconds is completely dumb, but how can I relate the times I have in seconds to the indices of the audio array I read in? How do you suggest I approach this?
Try out Pydub! :)
from pydub import AudioSegment

def trim_audio(intervals, input_file_path, output_file_path):
    # load the audio file
    audio = AudioSegment.from_file(input_file_path)

    # iterate over the list of [start, end] intervals (in seconds)
    for i, (start_time, end_time) in enumerate(intervals):
        # extract the segment; pydub slices are in milliseconds, hence the *1000
        segment = audio[start_time*1000:end_time*1000]

        # construct the output file path, one file per interval
        output_file_path_i = f"{output_file_path}_{i}.wav"

        # export the segment to a file
        segment.export(output_file_path_i, format='wav')

# test it out
print("Trimming audio...")
trim_audio([[0, 1], [1, 2]], "test_input.wav", "test_output")
print("...done! <3")
This code works for me. Let me know if you encounter any problems.
Edit: Just to let you know, I tried this with float timestamps and it works fine. From a look at the code I expected it to behave oddly with floats, but it doesn't. I tried long, awkward values like 2.2352344 and the output seems OK.
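If you would rather not depend on how float slice bounds are handled, one option is to convert the seconds to whole milliseconds yourself before slicing. This is only a sketch: seconds_to_ms and intervals_to_ms are hypothetical helper names, not part of pydub, and rounding to the nearest millisecond is an assumption (anything finer than a millisecond is dropped).

```python
def seconds_to_ms(seconds):
    """Convert a time in seconds to the nearest whole millisecond."""
    return int(round(seconds * 1000))


def intervals_to_ms(intervals):
    """Convert [[start, end], ...] in seconds to (start_ms, end_ms) tuples."""
    return [(seconds_to_ms(start), seconds_to_ms(end)) for start, end in intervals]


# example annotation list with int and float timestamps (made-up values)
ms_intervals = intervals_to_ms([[0, 1], [2.2352344, 3.5]])
print(ms_intervals)
```

Each resulting pair can then be used as integer slice bounds, e.g. audio[start_ms:end_ms].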
Another edit: I just remembered that you might need ffmpeg to be able to use Pydub. To install ffmpeg on Windows, download it, extract it, and then add the folder containing ffmpeg.exe to your PATH environment variable.
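As for the original question of how seconds relate to array indices: a sample index is just the time in seconds multiplied by the sample rate, converted to an integer. Below is a minimal sketch of that idea, not a tested drop-in replacement. The helper names seconds_to_index and crop_segments are made up for illustration, rounding to the nearest sample is an assumption, and the toy data stands in for a real audio array.

```python
def seconds_to_index(t, fs):
    """Convert a time in seconds to the nearest sample index at rate fs."""
    return int(round(t * fs))


def crop_segments(samples, fs, intervals):
    """Return one slice of `samples` per [start, end] interval (in seconds).

    Every interval is measured against the original, uncropped samples.
    """
    n = len(samples)
    segments = []
    for start, end in intervals:
        # clamp to the valid range so out-of-bounds annotations do not wrap
        i0 = max(0, seconds_to_index(start, fs))
        i1 = min(n, seconds_to_index(end, fs))
        segments.append(samples[i0:i1])
    return segments


# toy data: 10 samples at a 4 Hz sample rate
fs = 4
samples = list(range(10))
segments = crop_segments(samples, fs, [[0.0, 0.5], [1.25, 2.0]])
print(segments)
```

With the variables from the question this would be called as crop_segments(y1, fs1, annotation_list); slicing a NumPy array works the same way as slicing the list used here, and each returned segment can then be written out or passed on to feature extraction separately.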