python scipy signal-processing lowpass-filter

How to make a low pass filter in scipy.signal?

I have several questions on making a lowpass filter in python/scipy.signal. Would appreciate any answers.

I'm trying to understand how scipy.signal.lfilter works. Does it take the filter coefficients and multiply them by the data values, so that for data[500], it'll do

for b in range(0,len(coeff)):
    filtered = filtered + data[500-b]*coeff[b]

What's the difference between doing that and what it's doing?

I also don't understand how the number of taps affects when it starts filtering. For different number of taps I see it starting at different values on the data. I assumed it'll start only after it has the necessary coefficients. In my example I have 683 coefficients from scipy.signal.firwin, but the filter starts at 300-400, as you can see in the image Firwin lowpass filter (filter is in blue; sinewave is in red; x goes from 0-1000)

from scipy import signal

a = signal.firwin(683, cutoff = 1/30, window = "hamming")
t = signal.lfilter(a,1,sig)

With an fs=1, cutoff = fs/30, I get a lowpass filter with firwin that is delayed a lot as shown by the image above. What can I do to improve the delay?
How would changing the sampling rate affect the filter?
I've found two methods online to approximate the number of taps:

import math

(2/3) * math.log10(1/(10*ripple*attenuation)) * fs/transition_width
((-10*math.log10(ripple*attenuation) - 13)/(14.6)) * fs/transition_width

Which is a better approximation?

Any clarification would be greatly appreciated.

Solution

I'm trying to understand how scipy.signal.lfilter works.

scipy.signal.lfilter(b, a, x) implements "infinite impulse response" (IIR), aka "recursive", filtering, in which b and a represent the IIR filter and x is the input signal.

The b arg is an array of M+1 numerator (feedforward) filter coefficients, and a is an array of N+1 denominator (feedback) filter coefficients. Conventionally, a[0] = 1 (otherwise the filter can be normalized to make it so), so I'll assume a[0] = 1. The nth output sample y[n] is computed as

  y[n] = b[0] * x[n] + b[1] * x[n-1] + ... + b[M] * x[n-M]
                     - a[1] * y[n-1] - ... - a[N] * y[n-N].

The peculiar thing with IIR filtering is that this formula for y[n] depends on the previous N output values y[n-1], ..., y[n-N]; the formula is a recursive. So in order to start this process, conventionally the filter is "zero initialized" by assuming y[n] and x[n] are zero for n < 0. This is what scipy.signal.lfilter does by default.

You can also use scipy.signal.lfilter to apply a "finite impulse response" (FIR) by setting a = [1], as you have done in question 2. Then there are no recursive feedback terms in the filtering formula, so filtering becomes simply the convolution of b and x.

I also don't understand how the number of taps affects when it starts filtering. For different number of taps I see it starting at different values on the data. I assumed it'll start only after it has the necessary coefficients. In my example I have 683 coefficients from scipy.signal.firwin, but the filter starts at 300-400, as you can see in the image Firwin lowpass filter (filter is in blue; sinewave is in red; x goes from 0-1000)

scipy.signal.lfilter starts filtering immediately. As I mentioned above, it does this (by default) assuming y[n] and x[n] are zero for n < 0, meaning it computes the first output sample to be y[0] is computed as

  y[0] = b[0] * x[0].

However, depending on your filter, it could be that b[0] is near zero, which could explain why nothing appears to happen at the beginning.

A good way to inspect the behavior of a filter is to compute its "impulse response", that is look the output resulting from passing the unit impulse [1, 0, 0, 0, ...] as input:

plot(scipy.signal.lfilter(b, a, [1] + [0] * 800))

Here is what I get for b = firwin(683, cutoff = 1/30, window = "hamming"), a = [1]:

We can see several things from this plot: the impulse response is very small at first, then rises and oscillates, with its peak at sample index 341, then symmetrically decays again to zero. The filter has a delay of 341 = 683 // 2, that is, half the number of taps that you specified to firwin when designing the filter.

What can I do to improve the delay?

Try reducing the number of taps 683 to something smaller. Or if you don't require filtering to be causal, try scipy.ndimage.convolve1d, which shifts the computation so that the filter is centered:

scipy.ndimage.convolve1d(sig, firwin_filter, mode='constant')

How would changing the sampling rate affect the filter?

For most filter designs, provided the cutoffs are less than 1/4th the sample rate, the exact sample rate has little effect. Or in other words, it's usually not a problem unless the cutoffs are moderately close to the Nyquist frequency.

I've found two methods online to approximate the number of taps.

I'm not familiar with these formulas. Beware that the number of taps needed to achieve target characteristics depends on the particular design method, so pay attention to what context those formulas assume.

Within scipy.signal, I'd suggest using kaiserord together with firwin or firwin2 for a target amount of ripple and transition width. Here is their example, in which the 65 is the stopband ripple in dB and width is the transition width in Hz:

Use kaiserord to determine the length of the filter and the parameter for the Kaiser window.
>>> numtaps, beta = kaiserord(65, width/(0.5*fs))
>>> numtaps
167
>>> beta
6.20426
Use firwin to create the FIR filter.
>>> taps = firwin(numtaps, cutoff, window=('kaiser', beta),
                  scale=False, fs=fs)

Edit: With other designs, kaiserord might get in the right ballpark, but don't rely on that necessarily achieving a target amount of ripple or transition width. So a possible general strategy might be an iterative process like this:

Use kaiserord to get an initial estimate of the number of taps.
Design the filter with that many taps.
Use scipy.signal.freqz to get the filter's frequency response.
Evaluate how close the response magnitude is to the desired response. Namely, over the passband, compute the max absolute difference from one, and over the stopband, compute the max absolute difference from zero.
If the response is not close enough, increase the number of taps and return to step 2. Otherwise, see if you can get away with decreasing the number of taps.