My dataset consists of time series of sensor measurements (accelerometer, gyroscope, magnetometer). I need to split them into windows so that I can extract features and build feature vectors. The problem is that the sampling rate is irregular: for instance, the sensors may stop recording for a minute and then resume. Here is an example of my dataset:
| Timestamp | x | y | z |
|---|---|---|---|
| 2022-12-25 08:55:31 | 0.462288 | -0.747311 | -0.049593 |
| 2022-12-25 08:55:31 | 0.792116 | -1.437709 | 0.702323 |
| 2022-12-25 08:55:31 | 0.880261 | -0.185562 | 1.129537 |
| 2022-12-25 08:55:32 | -0.084058 | 0.441366 | 0.955718 |
| 2022-12-25 08:55:32 | -0.107756 | 0.319304 | 1.090497 |
| 2022-12-25 08:55:32 | -0.091866 | 0.373503 | 1.034103 |
| 2022-12-25 08:56:59 | 0.341448 | 0.085186 | 1.297256 |
| 2022-12-25 08:56:59 | 0.426420 | 0.233355 | 1.137589 |
| 2022-12-25 08:57:00 | 1.150247 | -0.665053 | 0.202337 |
So far I have created 2-second windows based on the timestamp. The problem is that my code does not recognize the gap between second 32 and second 59 (08:55:32 and 08:56:59).

What I need is to split the DataFrame at that point and keep creating windows starting from second 59.
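One way to express "split at the gap, then keep windowing from 08:56:59" in pandas is to label contiguous recording segments and then number fixed-length windows relative to each segment's first sample. This is a minimal sketch under some assumptions: the rows are sorted by `Timestamp`, any jump of more than one second (`MAX_GAP`, a hypothetical threshold) starts a new segment, and the helper columns `segment` and `window` are illustrative names. The windows produced this way do not overlap.

```python
import pandas as pd

# Timestamps and x values from the example above (assumed sorted by time)
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2022-12-25 08:55:31", "2022-12-25 08:55:31", "2022-12-25 08:55:31",
        "2022-12-25 08:55:32", "2022-12-25 08:55:32", "2022-12-25 08:55:32",
        "2022-12-25 08:56:59", "2022-12-25 08:56:59", "2022-12-25 08:57:00",
    ]),
    "x": [0.462288, 0.792116, 0.880261, -0.084058, -0.107756,
          -0.091866, 0.341448, 0.426420, 1.150247],
})

MAX_GAP = pd.Timedelta(seconds=1)  # hypothetical: larger jumps start a new segment
WINDOW = pd.Timedelta(seconds=2)   # window length from the question

# diff() is NaT on the first row and NaT > MAX_GAP is False, so row 0 is in segment 0;
# every jump larger than MAX_GAP increments the segment id.
df["segment"] = (df["Timestamp"].diff() > MAX_GAP).cumsum()

# Time elapsed since the first sample of each row's segment
seg_start = df.groupby("segment")["Timestamp"].transform("min")

# Window number inside the segment: 0 for [start, start + 2 s), 1 for the next 2 s, ...
df["window"] = (df["Timestamp"] - seg_start) // WINDOW

windows = [w for _, w in df.groupby(["segment", "window"])]
for w in windows:
    print(w["Timestamp"].iloc[0], "to", w["Timestamp"].iloc[-1], f"({len(w)} rows)")
```

Because the window counter restarts at each segment, the first window after the gap starts exactly at 08:56:59 rather than at some offset inherited from the previous recording.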
Here is my code:
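```python
import pandas as pd

def create_windows(df):
    # One group per distinct value of the 'Seconds' column
    grouped = df.groupby('Seconds')
    dfs = [grouped.get_group(x) for x in grouped.groups]
    ls = []
    # Join each second with the next one to form a 2-second window
    # (consecutive groups are joined even when a gap separates them)
    for i in range(len(dfs)-1):
        a = pd.concat([dfs[i], dfs[i+1]], axis=0)
        ls.append(a)
    return ls
```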
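Here is a minimal sketch of one possible fix to the function above. It assumes `Seconds` is a datetime64 column already truncated to whole seconds (convert it with `pd.to_datetime` first if it holds strings). Two consecutive one-second groups are joined only when the second group starts exactly one second after the first, so the pair spanning 08:55:32 and 08:56:59 is skipped. The `col` parameter is an addition for illustration.

```python
import pandas as pd

def create_windows(df, col="Seconds"):
    """Return 2-second windows built from pairs of consecutive one-second
    groups, skipping pairs that are separated by a recording gap."""
    one_second = pd.Timedelta(seconds=1)
    # groupby sorts the keys, so the groups come out in chronological order
    groups = list(df.groupby(col))
    windows = []
    for (key_a, grp_a), (key_b, grp_b) in zip(groups, groups[1:]):
        # Join the two groups only if they are adjacent seconds
        if key_b - key_a == one_second:
            windows.append(pd.concat([grp_a, grp_b], axis=0))
    return windows

# Usage with the timestamps from the question
df = pd.DataFrame({
    "Seconds": pd.to_datetime([
        "2022-12-25 08:55:31", "2022-12-25 08:55:31", "2022-12-25 08:55:31",
        "2022-12-25 08:55:32", "2022-12-25 08:55:32", "2022-12-25 08:55:32",
        "2022-12-25 08:56:59", "2022-12-25 08:56:59", "2022-12-25 08:57:00",
    ]),
    "x": [0.462288, 0.792116, 0.880261, -0.084058, -0.107756,
          -0.091866, 0.341448, 0.426420, 1.150247],
})
windows = create_windows(df)
print([len(w) for w in windows])  # [6, 3]
```

Windows still overlap by one second, as in the original function. The only change is that a pair is dropped when the two seconds are not adjacent.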
My results are:
| Seconds | x | y | z |
|---|---|---|---|
| 2022-12-25 08:55:24 | 1.000126 | -1.102270 | 0.227957 |
| 2022-12-25 08:55:24 | 0.872452 | -0.747067 | -0.476837 |
| 2022-12-25 08:55:24 | 0.734745 | -0.864248 | -0.090860 |
| 2022-12-25 08:55:24 | 1.083604 | -1.301008 | 0.451095 |
| 2022-12-25 08:55:25 | 0.459849 | -1.184315 | 0.344436 |
| 2022-12-25 08:55:25 | -0.028884 | -0.918935 | 0.478209 |
| 2022-12-25 08:55:25 | 0.355386 | -0.998021 | -0.362340 |

| Seconds | x | y | z |
|---|---|---|---|
| 2022-12-25 08:55:25 | 0.938607 | -0.928207 | 0.069052 |
| 2022-12-25 08:55:25 | 1.156865 | -0.720959 | 0.349072 |
| 2022-12-25 08:55:25 | 0.931287 | -1.360330 | 0.592462 |
| 2022-12-25 08:55:25 | 0.362462 | -0.977769 | 0.517280 |
| 2022-12-25 08:55:26 | 1.638277 | -1.305400 | 0.142283 |
| 2022-12-25 08:55:26 | 0.679326 | -0.734867 | -0.002257 |
| 2022-12-25 08:55:26 | 0.738405 | -0.601064 | -0.321806 |
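What I am trying to fix is a window like this one, which spans the gap between 08:55:32 and 08:56:59:

| Seconds | x | y | z |
|---|---|---|---|
| 2022-12-25 08:55:32 | -0.107756 | 0.319304 | 1.090497 |
| 2022-12-25 08:55:32 | -0.091866 | 0.373503 | 1.034103 |
| 2022-12-25 08:56:59 | 0.341448 | 0.085186 | 1.297256 |
| 2022-12-25 08:56:59 | 0.426420 | 0.233355 | 1.137589 |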
With the DataFrame you provided:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Timestamp": [
            "2022-12-25 08:55:31", "2022-12-25 08:55:31", "2022-12-25 08:55:31",
            "2022-12-25 08:55:32", "2022-12-25 08:55:32", "2022-12-25 08:55:32",
            "2022-12-25 08:56:59", "2022-12-25 08:56:59", "2022-12-25 08:57:00",
        ],
        "x": [0.462288, 0.792116, 0.880261, -0.084058, -0.107756,
              -0.091866, 0.341448, 0.42642, 1.150247],
        "y": [-0.747311, -1.437709, -0.185562, 0.441366, 0.319304,
              0.373503, 0.085186, 0.233355, -0.665053],
        "z": [-0.049593, 0.702323, 1.129537, 0.955718, 1.090497,
              1.034103, 1.297256, 1.137589, 0.202337],
    }
)
```
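Here is one way to do it with pandas `Timedelta` and `unique`. For each distinct timestamp `v`, select the rows whose timestamp lies between `v` and `v` + 1 second. Because the timestamps have one-second resolution, each window covers two seconds, and a window never reaches across a gap longer than one second:

```python
# infer_datetime_format is deprecated since pandas 2.0 (the format is now inferred by default)
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
dfs = [
    df.loc[
        (df["Timestamp"] >= v)
        & (df["Timestamp"] <= v + pd.Timedelta(value=1, unit="second")),
        :,
    ]
    for v in df["Timestamp"].unique()
]
```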
Then:

```python
for df_ in dfs:
    print(df_)
```

Output:

```
            Timestamp         x         y         z
0 2022-12-25 08:55:31  0.462288 -0.747311 -0.049593
1 2022-12-25 08:55:31  0.792116 -1.437709  0.702323
2 2022-12-25 08:55:31  0.880261 -0.185562  1.129537
3 2022-12-25 08:55:32 -0.084058  0.441366  0.955718
4 2022-12-25 08:55:32 -0.107756  0.319304  1.090497
5 2022-12-25 08:55:32 -0.091866  0.373503  1.034103
            Timestamp         x         y         z
3 2022-12-25 08:55:32 -0.084058  0.441366  0.955718
4 2022-12-25 08:55:32 -0.107756  0.319304  1.090497
5 2022-12-25 08:55:32 -0.091866  0.373503  1.034103
            Timestamp         x         y         z
6 2022-12-25 08:56:59  0.341448  0.085186  1.297256
7 2022-12-25 08:56:59  0.426420  0.233355  1.137589
8 2022-12-25 08:57:00  1.150247 -0.665053  0.202337
            Timestamp         x         y         z
8 2022-12-25 08:57:00  1.150247 -0.665053  0.202337
```
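The question's end goal is a feature vector per window. The following sketch shows that last step, applied to windows built the same way as in this answer. The feature set (mean and sample standard deviation of each axis) and the `window_features` helper are hypothetical choices for illustration, not something the question specifies. With this choice, the one-row window starting at 08:57:00 gets `NaN` for its standard deviations, so you may want to drop windows below some minimum number of samples.

```python
import pandas as pd

# Same sample data as in the answer
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2022-12-25 08:55:31", "2022-12-25 08:55:31", "2022-12-25 08:55:31",
        "2022-12-25 08:55:32", "2022-12-25 08:55:32", "2022-12-25 08:55:32",
        "2022-12-25 08:56:59", "2022-12-25 08:56:59", "2022-12-25 08:57:00",
    ]),
    "x": [0.462288, 0.792116, 0.880261, -0.084058, -0.107756,
          -0.091866, 0.341448, 0.426420, 1.150247],
    "y": [-0.747311, -1.437709, -0.185562, 0.441366, 0.319304,
          0.373503, 0.085186, 0.233355, -0.665053],
    "z": [-0.049593, 0.702323, 1.129537, 0.955718, 1.090497,
          1.034103, 1.297256, 1.137589, 0.202337],
})

# Windows built as in the answer: [v, v + 1 s] for every distinct timestamp v
one_second = pd.Timedelta(seconds=1)
windows = [
    df[(df["Timestamp"] >= v) & (df["Timestamp"] <= v + one_second)]
    for v in df["Timestamp"].unique()
]

AXES = ["x", "y", "z"]

def window_features(win):
    """Hypothetical feature set: mean and sample standard deviation per axis."""
    stats = win[AXES].agg(["mean", "std"])  # rows: mean, std; columns: x, y, z
    return {f"{axis}_{stat}": stats.loc[stat, axis]
            for axis in AXES for stat in ("mean", "std")}

# One feature vector per window, indexed by the window's start time
features = pd.DataFrame(
    [window_features(w) for w in windows],
    index=pd.DatetimeIndex([w["Timestamp"].iloc[0] for w in windows]),
)
print(features)
```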