
Split speech audio file on words in Python


I feel like this is a fairly common problem, but I haven't yet found a suitable answer. I have many audio files of human speech that I would like to split into words. This can be done heuristically by looking at pauses in the waveform, but can anyone point me to a function or library in Python that does this automatically?


Solution

  • An easier way to do this is to use the pydub module. Its silence utilities (the pydub.silence module) do all the heavy lifting, such as applying a silence threshold and a minimum silence length, and simplify the code significantly compared with the other methods mentioned.
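The idea behind these silence utilities can be sketched in plain Python. This is a simplified illustration, not pydub's actual implementation: it assumes mono samples already decoded to floats in [-1.0, 1.0], measures loudness in fixed 10 ms frames, and treats any run of frames quieter than the threshold that lasts at least `min_silence_ms` as a pause between words. The function names and the `frame_ms` parameter are hypothetical; the parameter names only mirror pydub's.

```python
import math
from typing import List, Tuple


def frame_dbfs(frame: List[float]) -> float:
    """Loudness of one frame in dBFS (0 dBFS = RMS of 1.0, i.e. full scale)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def split_on_pauses(
    samples: List[float],
    rate: int,
    min_silence_ms: int = 500,
    silence_thresh_db: float = -16.0,
    frame_ms: int = 10,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of the non-silent stretches (hypothetical helper)."""
    frame_len = max(1, rate * frame_ms // 1000)
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    silent = [frame_dbfs(f) < silence_thresh_db for f in frames]
    min_frames = max(1, min_silence_ms // frame_ms)

    # Collect runs of silent frames that are long enough to count as pauses.
    pauses = []
    run_start = None
    for idx, is_silent in enumerate(silent + [False]):  # sentinel closes a trailing run
        if is_silent and run_start is None:
            run_start = idx
        elif not is_silent and run_start is not None:
            if idx - run_start >= min_frames:
                pauses.append((run_start, idx))
            run_start = None

    # The chunks are whatever lies between the pauses.
    chunks = []
    cursor = 0  # frame index where the next chunk may start
    for p_start, p_end in pauses:
        if p_start > cursor:
            chunks.append((cursor * frame_len, p_start * frame_len))
        cursor = p_end
    if cursor < len(frames):
        chunks.append((cursor * frame_len, len(samples)))
    return chunks


# Synthetic example at 1 kHz sampling: 0.3 s tone, 0.6 s silence, 0.3 s tone.
RATE = 1000
tone = [0.5 * math.sin(2 * math.pi * 50 * n / RATE) for n in range(300)]
signal = tone + [0.0] * 600 + tone
print(split_on_pauses(signal, RATE))  # [(0, 300), (900, 1200)]
```

A gap shorter than `min_silence_ms` (say 200 ms here) stays inside one chunk, which is why that parameter controls whether short intra-word dips cause a split.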

    Here is a demo implementation (inspiration from here):

    Setup:

    I had an audio file, "a-z.wav", with the English letters A to Z spoken in it. A sub-directory splitAudio was created in the current working directory. Running the demo code split the recording into separate WAV files, roughly one per letter (syllable); the run shown below produced 27 chunks, chunk0 through chunk26.

    Observations: Some of the syllables were cut off, which suggests the following parameters may need adjusting:
    min_silence_len=500
    silence_thresh=-16

    One may want to tune these to one's own requirements.
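A fixed `silence_thresh` of -16 dBFS (also pydub's default) may classify quiet speech as silence. A common alternative, assumed here, is to set the threshold relative to the clip's overall loudness, e.g. `silence_thresh=sound_file.dBFS - 16` in pydub. The stdlib-only sketch below shows that arithmetic for a 16-bit mono WAV; the helper names and the 16 dB offset are illustrative assumptions, not part of the original answer.

```python
import io
import math
import struct
import wave


def read_mono_pcm16(fileobj):
    """Read a 16-bit mono WAV; return (samples as floats in [-1, 1), frame rate)."""
    with wave.open(fileobj, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // 2
    return [v / 32768.0 for v in struct.unpack("<%dh" % count, raw)], rate


def relative_silence_thresh(samples, offset_db=16.0):
    """Hypothetical helper: overall RMS loudness of the clip in dBFS minus offset_db."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    overall_dbfs = 20 * math.log10(rms) if rms > 0 else float("-inf")
    return overall_dbfs - offset_db


# Build a quiet test clip in memory: 1 s of a 440 Hz tone at 5% of full scale.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(b"".join(
        struct.pack("<h", int(0.05 * 32767 * math.sin(2 * math.pi * 440 * n / 8000)))
        for n in range(8000)
    ))
buf.seek(0)

samples, rate = read_mono_pcm16(buf)
thresh = relative_silence_thresh(samples)
print(round(thresh, 1))  # roughly -45 dBFS, far below the fixed -16
```

With the fixed -16 dBFS threshold, every frame of this quiet clip would count as silence; the relative threshold adapts to the recording level instead.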

    Demo Code:

    import os

    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    sound_file = AudioSegment.from_wav("a-z.wav")
    audio_chunks = split_on_silence(
        sound_file,
        # must be silent for at least half a second
        min_silence_len=500,
        # consider it silent if quieter than -16 dBFS
        silence_thresh=-16,
    )

    # create the output directory if it does not exist yet
    os.makedirs("splitAudio", exist_ok=True)

    for i, chunk in enumerate(audio_chunks):
        out_file = ".//splitAudio//chunk{0}.wav".format(i)
        print("exporting", out_file)  # Python 3 print function
        chunk.export(out_file, format="wav")
    

    Output:

    Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    >>> ================================ RESTART ================================
    >>> 
    exporting .//splitAudio//chunk0.wav
    exporting .//splitAudio//chunk1.wav
    exporting .//splitAudio//chunk2.wav
    exporting .//splitAudio//chunk3.wav
    exporting .//splitAudio//chunk4.wav
    exporting .//splitAudio//chunk5.wav
    exporting .//splitAudio//chunk6.wav
    exporting .//splitAudio//chunk7.wav
    exporting .//splitAudio//chunk8.wav
    exporting .//splitAudio//chunk9.wav
    exporting .//splitAudio//chunk10.wav
    exporting .//splitAudio//chunk11.wav
    exporting .//splitAudio//chunk12.wav
    exporting .//splitAudio//chunk13.wav
    exporting .//splitAudio//chunk14.wav
    exporting .//splitAudio//chunk15.wav
    exporting .//splitAudio//chunk16.wav
    exporting .//splitAudio//chunk17.wav
    exporting .//splitAudio//chunk18.wav
    exporting .//splitAudio//chunk19.wav
    exporting .//splitAudio//chunk20.wav
    exporting .//splitAudio//chunk21.wav
    exporting .//splitAudio//chunk22.wav
    exporting .//splitAudio//chunk23.wav
    exporting .//splitAudio//chunk24.wav
    exporting .//splitAudio//chunk25.wav
    exporting .//splitAudio//chunk26.wav
    >>>
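Since some syllables were cut off, another knob worth knowing about is `split_on_silence`'s `keep_silence` parameter (in milliseconds), which keeps a little of the surrounding audio on each chunk so soft word onsets and endings survive. The idea is sketched below on plain sample ranges; `pad_chunks` is a hypothetical helper, and in this sketch padded neighbours simply overlap when the gap between them is shorter than twice the padding.

```python
from typing import List, Tuple


def pad_chunks(
    chunks: List[Tuple[int, int]], total_len: int, rate: int, keep_ms: int = 100
) -> List[Tuple[int, int]]:
    """Widen each (start, end) sample range by keep_ms on both sides, clamped to the clip."""
    pad = rate * keep_ms // 1000  # padding converted from milliseconds to samples
    return [(max(0, start - pad), min(total_len, end + pad)) for start, end in chunks]


# Two chunks in a 1.2 s clip sampled at 1 kHz, padded by 100 ms on each side.
print(pad_chunks([(0, 300), (900, 1200)], total_len=1200, rate=1000))
# [(0, 400), (800, 1200)]
```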