
Getting max amplitude for an audio file per second


I know there are some similar questions here, but most of them concern generating waveform images, which is not what I want.

My goal is to generate a waveform visualization for an audio file, similar to SoundCloud's, but not as an image. I'd like to have the max amplitude for each second (or half second) of an audio clip in an array. I could then use this data to create a CSS-based visualization.

Ideally I'd like to get an array that has all the amplitude values for each second as a percentage of the maximum amplitude of the entire audio file. Here's an example:

[
    0.0,  # Relative max amplitude of first second of audio clip (0%)
    0.04,  # Relative max amplitude of second second of audio clip (4%)
    0.15,  # Relative max amplitude of third second of audio clip (15%)
    # Some more
    1.0,  # The highest amplitude of the whole audio clip will be 1.0 (100%)
]
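The array above amounts to dividing each window's maximum amplitude by the largest of those maxima. Here is a minimal sketch of just that scaling step, assuming the per-window maxima are already available as plain numbers. The function name `relative_peaks` and the input values are made up for illustration:

```python
def relative_peaks(window_maxima):
    """Scale per-window maximum amplitudes so the loudest window becomes 1.0."""
    if not window_maxima:
        return []
    overall = max(window_maxima)
    if overall == 0:
        # Completely silent clip: avoid dividing by zero.
        return [0.0] * len(window_maxima)
    # float() keeps this a true division on Python 2 as well as Python 3.
    return [m / float(overall) for m in window_maxima]


print(relative_peaks([0, 400, 1500, 10000]))  # prints [0.0, 0.04, 0.15, 1.0]
```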

I assume I'll have to use at least numpy and Python's wave module, but I'm not sure how to get the data I want. I'd like to use Python but I'm not completely against using some kind of command-line tool.
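For an uncompressed WAV file, numpy and the `wave` module are indeed enough. The following is a minimal sketch for Python 3, assuming 16-bit PCM input. Both function names are hypothetical. Each window's peak is the largest absolute sample across all channels, and the result is scaled so that the loudest window is 1.0:

```python
import wave

import numpy as np


def window_peaks(samples, rate, window_seconds=1.0):
    """Peak |sample| per window, scaled so the loudest window is 1.0.

    `samples` is an integer array shaped (frames, channels).
    """
    # Widen to int32 before abs() so -32768 does not overflow int16.
    frame_peak = np.abs(samples.astype(np.int32)).max(axis=1)
    window = max(1, int(rate * window_seconds))
    peaks = [int(frame_peak[i:i + window].max())
             for i in range(0, len(frame_peak), window)]
    overall = max(peaks) if peaks else 0
    if overall == 0:
        # Empty or completely silent clip: avoid dividing by zero.
        return [0.0] * len(peaks)
    return [p / float(overall) for p in peaks]


def max_amplitude_per_window(path, window_seconds=1.0):
    """Read a 16-bit PCM WAV file and return its relative per-window peaks."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV stores interleaved little-endian signed 16-bit samples.
    samples = np.frombuffer(raw, dtype="<i2").reshape(-1, channels)
    return window_peaks(samples, rate, window_seconds)
```

For half-second resolution you would call `max_amplitude_per_window("clip.wav", 0.5)`, where `clip.wav` is a placeholder path. MP3 or other compressed input would first have to be decoded to WAV, for example with a command-line tool.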


Solution

  • If GStreamer is an option, here is a small script that does the trick. It accepts any audio file that GStreamer can decode.

    Snippet:

    import os, sys, pygst
    pygst.require('0.10')
    import gst, gobject
    gobject.threads_init()
    
    def get_peaks(filename):
        global do_run
    
        # Decode any supported format, convert/resample to mono 32-bit ints,
        # and let "level" post one message per second (interval is in ns).
        pipeline_txt = (
            'filesrc location="%s" ! decodebin ! audioconvert ! audioresample ! '
            'audio/x-raw-int,channels=1,rate=44100,endianness=1234,'
            'width=32,depth=32,signed=(bool)True ! '
            'level name=level interval=1000000000 ! '
            'fakesink' % filename)
        pipeline = gst.parse_launch(pipeline_txt)
    
        level = pipeline.get_by_name('level')
        bus = pipeline.get_bus()
        bus.add_signal_watch()
    
        peaks = []
        do_run = True
    
        def show_peak(bus, message):
            global do_run
            # Stop on end-of-stream or on an error; otherwise the loop never exits
            if message.type in (gst.MESSAGE_EOS, gst.MESSAGE_ERROR):
                pipeline.set_state(gst.STATE_NULL)
                do_run = False
                return
            # filter only on level messages
            if message.src is not level or \
               not message.structure.has_key('peak'):
                return
            # 'peak' holds one value (in dB) per channel; the pipeline is mono
            peaks.append(message.structure['peak'][0])
    
        # connect the callback
        bus.connect('message', show_peak)
    
        # run the pipeline until we get EOS (or an error)
        pipeline.set_state(gst.STATE_PLAYING)
        ctx = gobject.main_context_default()
        while ctx and do_run:
            ctx.iteration()
    
        return peaks
    
    def normalize(peaks):
        # Min-max scale the per-second peaks into the 0.0 - 1.0 range
        _min = min(peaks)
        _max = max(peaks)
        d = _max - _min
        if d == 0:
            # every second has the same peak, so each one is the maximum
            return [1.0] * len(peaks)
        return [(x - _min) / d for x in peaks]
    
    if __name__ == '__main__':
        filename = os.path.realpath(sys.argv[1])
        peaks = get_peaks(filename)
    
        # one level message per second, so len(peaks) is the length in seconds
        print 'Sample is %d seconds' % len(peaks)
        print 'Minimum is', min(peaks)
        print 'Maximum is', max(peaks)
    
        peaks = normalize(peaks)
        print peaks
    

    Example output:

    $ python gstreamerpeak.py 01\ Tron\ Legacy\ Track\ 1.mp3 
    Sample is 182 seconds
    Minimum is -349.999999922
    Maximum is -2.10678956719
    [0.0, 0.0, 0.9274581631597019, 0.9528318436488018, 0.9492396611762614,
    0.9523404330322813, 0.9471685835966183, 0.9537281219301242, 0.9473486577135167,
    0.9479292126411365, 0.9538221105563514, 0.9483845795252251, 0.9536790832823281,
    0.9477264933378022, 0.9480077366961968, ...
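The `level` element reports `peak` in decibels, so min-max scaling those values as `normalize` does is not the same as "a percentage of the maximum amplitude". In the output above, the very low floor of the silent opening seconds (about -350 dB) stretches the scale, and ordinary passages all land near 0.95. If linear amplitude ratios are wanted instead, one option is to convert each dB value with 10 ** (dB / 20) relative to the loudest second. Here is a minimal sketch, assuming the input is the list returned by `get_peaks`. The function name `relative_linear_peaks` is hypothetical:

```python
def relative_linear_peaks(peaks_db):
    """Convert per-second peaks in dB to fractions of the loudest second.

    Uses amplitude ratio = 10 ** (dB / 20); the loudest second maps to 1.0.
    """
    if not peaks_db:
        return []
    loudest = max(peaks_db)
    # Subtracting the loudest value first makes the result relative to it.
    return [10 ** ((db - loudest) / 20.0) for db in peaks_db]
```

With this scaling, a second that is 20 dB below the loudest one comes out as 0.1, and the -350 dB silent seconds become effectively 0.0.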