pythonaudioscipysignal-processing

How to obtain sound envelope using python


Hello I new with python and also with sound signal analysis. I am trying to get the envelope of a birth song (zebra finch). It has a very rapid signal fluctuations and I tried with different approach. For instance I tried to plot the signal and get the envelope with the following code base on other examples that I found (I added comments on the code to understand it):

#Import the libraries
from pylab import *
import numpy
import scipy.signal.signaltools as sigtool
import scipy, pylab
from scipy.io import wavfile
import wave, struct
import scipy.signal as signal

#Open the txt file and read the wave file (also save it as txt file)

f_out = open('mike_1_44100_.txt', 'w')
w     = scipy.io.wavfile.read("mike_1_44100_.wav") #here your sound file

a=w[1]
f_out.write('#time #z' + '\n')

#I print to check
print 'vector w'
print w[0],w[1]
print w

i=w[1].size
p=numpy.arange(i)*0.0000226 #to properly define the time signal with    the sample rate

print 'vector p:'
print p 

x=numpy.dstack([p,a])

print 'vector x:'
print x[0]

#saving file
numpy.savetxt('mike_1_44100_.txt',x[0])

f_out.close()
print 'i:'
print i

# num is the number of samples in the resampled signal.
num= np.ceil(float(i*0.0000226)/0.0015)
print num

y_resample, x_resample = scipy.signal.resample(numpy.abs(a),num, p,axis=0, window=('gaussian',150))

#y_resample, x_resample = scipy.signal.resample(numpy.abs(a), num, p,axis=-1, window=0)

#Aplaying a filter 

W1=float(5000)/(float(44100)/2) #the frequency  for the cut over the sample frequency

(b, a1) = signal.butter(4, W1, btype='lowpass')
aaa=a
slp =1* signal.filtfilt(b, a1, aaa)

#Taking the abs value of the signal the resample and finaly aplying the hilbert transform

y_resample2 =numpy.sqrt(numpy.abs(np.imag(sigtool.hilbert(slp, axis=-1)))**2+numpy.abs(np.real(sigtool.hilbert(slp, axis=-1)))**2)

print 'x sampled'
#print x_resample
print 'y sampled'
#print  y_resample

xx=x_resample #[0]
yy=y_resample #[1]

#ploting with some style

plot(p,a,label='Time Signal') #to plot amplitud vs time
#plot(p,numpy.abs(a),label='Time signal')
plot(xx,yy,label='Resampled time signal Fourier technique Gauss window 1.5 ms ', linewidth=3)
#plot(ww,label='Window', linewidth=3)
#plot(p,y_resample2,label='Hilbert transformed sime signal', linewidth=3)

grid(True)
pylab.xlabel("time [s]")
pylab.ylabel("Amplitde")

legend()
show()

Here I tried two things, the first is use the resample function from scipy to get the envelope, but I have some problem with the signal amplitude that I don't understand yet (I uploaded the image obtained with the fourier technique but system does not allow me):

The second is to use the hilbert transform to get the envelope (now I uploaded the image with the hilbert transform again the system does not allow me) It is possible to run my code and obtain the two images. But ill put the with this link http://ceciliajarne.web.unq.edu.ar/?page_id=92&preview=true

Now the "envelope" fails again. I tried filtering the signal as i saw in some examples, but my signal is attenuated and i am not able to obtain the envelope. Could anybody help my with my code or with a better idea to get the envelope? It is possible to use as example any bird song (I can give you mine), but i need to see what happen with complex sounds not simple signals, because it is very different (with simple sounds both techniques are ok).

I also tried to adap the code that I found in: http://nipy.org/nitime/examples/mtm_baseband_power.html

But I am not able to get the proper parameters for my signal, and i don't understand the modulation part. I already ask to the code developers, and til waiting the answer.


Solution

  • Since with a bird song the "modulation frequency" probably will be much lower than the "carrier frequency" even with a rapidly varying amplitude, an approximation to the envelope could be obtained by taking the absolute value of your signal and then applying a moving average filter with say 20 ms length.

    Still, wouldn't you be interested in frequency variations as well, to adequately characterize the song? In that case, taking the Fourier transform over a moving window would give you far more information, namely the approximate frequency content as a function of time. Which is what we humans hear and helps us discriminate between bird species.

    If you don't want the attenuation, you should neither apply a Butterworth filter nor take the moving average, but apply peak detection instead.

    Moving average: Each output sample is the average of the absolute value of e.g. 50 preceding input samples. The output will be attenuated.

    Peak detection: Each output sample is the maximum of the absolute value of e.g. 50 preceding input samples. The output will not be attenuated. You can lowpass filter afterward to get rid of the remaining staircase "riple".

    You wonder why e.g. a Butterworth filter will attenuate your signal. It hardly does if your cutoff frequency is high enough, but it just SEEMS to be strongly attenuated. Your input signal is not the sum of the carrier (whistle) and the modulation (envelope) but the product. Filtering will limit the frequency content. What remains are frequency components (terms) rather than factors. You see an attenuated modulation (envelope) because that frequency component is indeed present in your signal MUCH weaker than the original envelope, since it was not added to your carrier but multiplied with it. Since the carrier sinusoid that your envelope is multiplied with, is not always at its maximum value, the envelope will be "attenuated" by the modulation process, not by your filtering analysis.

    In short: If you directly want the (multiplicative) envelope rather than the (additive) frequency component due to modulation (multiplication) with the envelope, take the peak detection approach.

    Peak detection algorithm in "Pythonish" pseudocode, just to get the idea.

    # Untested, but apart from typos this should work fine
    # No attention paid to speed, just to clarify the algorithm
    # Input signal and output signal are Python lists
    # Listcomprehensions will be a bit faster
    # Numpy will be a lot faster
    
    def getEnvelope (inputSignal):
        
        # Taking the absolute value
        
        absoluteSignal = []
        for sample in inputSignal:
            absoluteSignal.append (abs (sample))
        
        # Peak detection
        
        intervalLength = 50 # Experiment with this number, it depends on your sample frequency and highest "whistle" frequency
        outputSignal = []
        
        for baseIndex in range (intervalLength, len (absoluteSignal)):
            maximum = 0
            for lookbackIndex in range (intervalLength):
                maximum = max (absoluteSignal [baseIndex - lookbackIndex], maximum)
            outputSignal.append (maximum)
        
        return outputSignal