I have some time series data in Pandas where I need to extract specific local minima from a column so I can use them as features in an LSTM model. To visualize what I'm looking for, I've attached a picture in which the circled points are the values I wish to locate.
The other red dots at the bottom of the graph are my failed attempt at using "argrelextrema" with the following code:
import numpy as np
import seaborn as sns
from scipy.signal import argrelextrema
# Trying to locate minimum values
df['HKL Min'] = df.iloc[argrelextrema(df.hkla.values, np.less_equal, order=50)[0]]['hkla']
# Plotting a range of values from the dataset:
sns.lineplot(x=df.index[0:3000], y='hkla', data=df[0:3000], label='Hookload');
sns.scatterplot(x=df.index[0:3000], y='HKL Min', data=df[0:3000], s=50, color='red', label='HKL Min');
As you may notice, my column data has a repetitive pattern, and the points I wish to locate are the minima found between two "peak pairs". Are there existing functions in Python that can help me locate these specific points? Any form of help would be highly appreciated. I am also open to other suggestions that can solve my issue here.
You could do something like this with your data:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.signal import argrelextrema
np.random.seed(1234)
rs = np.random.randn(500)
xs = [0]
for r in rs:
    xs.append(xs[-1] * 0.999 + r)
df = pd.DataFrame(xs, columns=['point'])
which gives this data
point
0 0.000000
1 0.471435
2 -0.720012
3 0.713415
4 0.400050
.. ...
496 3.176240
497 3.007734
498 3.123841
499 1.045736
500 0.041935
[501 rows x 1 columns]
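You can choose how often you want to mark a local max or min by playing with the parameter n, which is passed as order (the number of points on each side that a candidate is compared against):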
n = 10
df['min'] = df.iloc[argrelextrema(df.point.values, np.less_equal,
order=n)[0]]['point']
df['max'] = df.iloc[argrelextrema(df.point.values, np.greater_equal,
order=n)[0]]['point']
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
plt.plot(df.index, df['point'])
plt.show()
Which gives:
Another choice for n might be (and it all depends on what you want):
n = 40
df['min'] = df.iloc[argrelextrema(df.point.values, np.less_equal,
order=n)[0]]['point']
df['max'] = df.iloc[argrelextrema(df.point.values, np.greater_equal,
order=n)[0]]['point']
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
plt.plot(df.index, df['point'])
plt.show()
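To see the effect of n without looking at the plots, one option is to count how many minima are reported for several values of order. This is only a sketch: it rebuilds a random walk like the one above, and the values 5, 10 and 40 as well as the names values and counts are chosen purely for illustration.

```python
import numpy as np
from scipy.signal import argrelextrema

# Rebuild a random-walk series like the one above (seed and length are assumptions).
rng = np.random.RandomState(1234)
xs = [0.0]
for r in rng.randn(500):
    xs.append(xs[-1] * 0.999 + r)
values = np.asarray(xs)

# Count the local minima reported for a few window sizes.
counts = {}
for n in (5, 10, 40):
    idx = argrelextrema(values, np.less_equal, order=n)[0]
    counts[n] = len(idx)
    print(f"order={n}: {len(idx)} minima")
```

A larger order compares each point against more neighbours, so it can only keep the same minima or discard some of them; it never adds new ones.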
To get a marking for which points actually were max and min, you can make a new df:
new_df = pd.DataFrame(np.where(df.T == df.T.max(), 1, 0),index=df.columns).T
which gives the information about which row in df is a maximum or a minimum. Otherwise, the original df already contains that information in the created min and max columns: the rows where those columns are not NaN are the extrema.
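As a small sketch of reading the extrema straight from the min column, the non-NaN rows can be selected with notna(). The tiny frame and the names min_rows and is_min below are made up for illustration; with real data you would use your own column and a larger order.

```python
import numpy as np
import pandas as pd
from scipy.signal import argrelextrema

# Toy data (hypothetical), small enough to check by eye.
df = pd.DataFrame({'point': [3.0, 1.0, 2.0, 5.0, 2.0, 0.5, 4.0]})

# Mark local minima the same way as above; all other rows become NaN.
idx = argrelextrema(df['point'].values, np.less_equal, order=1)[0]
df['min'] = df.iloc[idx]['point']

# Rows that are local minima, and a 0/1 flag that could be used as a feature.
min_rows = df[df['min'].notna()]
is_min = df['min'].notna().astype(int)

print(min_rows.index.tolist())
print(is_min.tolist())
```

The 0/1 flag has the same length as the original column, which may be convenient if the marks are meant to be fed to a model alongside the raw series.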
EDIT: Finding peaks above threshold
If you are interested in peaks above a certain value, then you should use find_peaks in the following way:
from scipy.signal import find_peaks
peaks, _ = find_peaks(df['point'], height = 15)
plt.plot(df['point'])
plt.plot(peaks, df['point'][peaks], "x")
plt.show()
which will produce:
peaks,_
(array([304, 309, 314, 317, 324, 329, 333, 337, 343, 349, 352, 363, 366,
369, 372, 374, 377, 379, 381, 383, 385, 387, 391, 394, 397, 400,
403, 410, 413, 418, 424, 427, 430, 433, 436, 439, 442, 444, 448],
dtype=int64),
{'peak_heights': array([15.68868141, 15.97184882, 15.04790966, 15.6146908 , 16.49191501,
18.0852033 , 18.11467247, 19.48469432, 21.32391722, 19.90407526,
19.93683051, 24.40980129, 28.00319793, 26.1080406 , 24.44322213,
23.16993982, 22.27505873, 21.47500832, 22.3236231 , 24.02484906,
23.83727054, 24.32609486, 21.25365717, 21.10295203, 20.03162979,
20.64021444, 19.78510855, 21.62624829, 22.34904425, 21.60431638,
18.41968769, 18.24153961, 18.00747871, 18.02793964, 16.72552016,
17.58573207, 16.90982675, 16.9905686 , 16.30563852])})
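Since the question is about minima rather than maxima, the same function can be applied to the negated signal, because a peak of -x is a valley of x. The sketch below uses a made-up repeating pattern (two peaks with a shallow dip between them, separated by deep dips) as a stand-in for the hookload data, which I don't have; the prominence threshold of 5 is an assumption that would need tuning on the real series.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical repeating pattern: peak pairs (9, 7, 9) separated by deep dips.
x = np.array([2, 9, 7, 9, 1, 9, 7, 9, 0, 9, 7, 9, 2], dtype=float)

# Peaks of the negated signal are the local minima of x.
# prominence filters out the shallow dips inside each peak pair.
minima, props = find_peaks(-x, prominence=5)

print(minima)     # positions of the deep minima
print(x[minima])  # their values
```

Here the shallow dips at value 7 have a prominence of only 2 in the negated signal, so they are dropped and only the deep minima between the peak pairs remain. Passing height to find_peaks(-x, ...) instead would keep every minimum whose value lies below a fixed level, which may also fit depending on how stable the baseline is.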