I have a signal like the orange one in the following plot that can only have integer values:
As you can see, the orange signal in a bit noisy and "waffles" between levels sometimes when its about to change to a new steady state. I'd like to "smooth" this effect and achieve the blue signal. The blue signal is the orange one filtered such that the transitions don't occur until 3 samples in a row have made the jump to the next step. This is pretty easy if I loop through each sample manually and use a couple state variables to track how many times in a row I've jumped to a new step, but its also slow. I'd like to find a way to vectorize this in numpy. Any ideas?
Here's an example of the non-vectorized way that seems to do what I want:
up_count = 0
dn_count = 0
out = x.copy()
for i in range(len(out)-1):
if out[i+1] > out[i]:
up_count += 1
dn_count = 0
if up_count == 3:
up_count = 0
out[i+1] = out[i+1]
else:
out[i+1] = out[i]
elif out[i+1] < out[i]:
up_count = 0
dn_count += 1
if dn_count == 3:
dn_count = 0
out[i+1] = out[i+1]
else:
out[i+1] = out[i]
else:
dn_count = 0
up_count = 0
EDIT:
Thanks to @Bogdan Shevchenko for this solution. I already have numpy and scipy available, so rather than get pandas involved here's my numpy/scipy version of his answer:
def ffill(arr, mask):
idx = np.where(~mask, np.arange(mask.shape[0])[:, None], 0)
np.maximum.accumulate(idx, axis=0, out=idx)
return arr[idx, np.arange(idx.shape[1])]
x_max = scipy.ndimage.maximum_filter1d(x, 3, axis=0, origin=1, mode="nearest")
x_min = scipy.ndimage.minimum_filter1d(x, 3, axis=0, origin=1, mode="nearest")
x_smooth = ffill(x, x_max!=x_min)
There could be a lot of different strategies, based on DataFrame.rolling
processing. In example:
t = pd.Series([
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,
1, 2, 1, 2, 1, 1, 2, 2, 3, 1, 2,
2, 1, 1, 2, 1, 1, 1, 1, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
min_at_row = 3
t_smoothed = t.copy()
t_smoothed[
t.rolling(min_at_row, min_periods=1).max() !=
t.rolling(min_at_row, min_periods=1).min()
] = None
t_smoothed = t_smoothed.ffill()
Result looks like what you want to obtain:
If you want to stay with numpy only, without pandas, use np.lib.stride_tricks.sliding_window_view(t, min_at_row).max(axis=1)
instead of t.rolling(min_at_row).max()
(and analogously with min
), but be aware that it will return array without first (min_at_row - 1
) values
You can also use t_smoothed.shift(-min_at_row + 1)
to remove delay of smoothened signal.