python timestamp feature-extraction

How to create time-based windows when I have an irregular sample rate?


My dataset consists of time series of sensor measurements (accelerometer, gyroscope, magnetometer). I need to create windows in order to extract features and build feature vectors. The problem is that the sample rate is irregular. For instance, the sensors may stop recording for 1 minute and then continue again. Example of my dataset:

**Timestamp                x         y         z**

2022-12-25 08:55:31  0.462288 -0.747311 -0.049593
2022-12-25 08:55:31  0.792116 -1.437709  0.702323
2022-12-25 08:55:31  0.880261 -0.185562  1.129537
2022-12-25 08:55:32 -0.084058  0.441366  0.955718
2022-12-25 08:55:32 -0.107756  0.319304  1.090497
2022-12-25 08:55:32 -0.091866  0.373503  1.034103
2022-12-25 08:56:59  0.341448  0.085186  1.297256
2022-12-25 08:56:59  0.426420  0.233355  1.137589
2022-12-25 08:57:00  1.150247 -0.665053  0.202337

So far I have created 2-second windows based on the timestamp. The problem is that my code does not recognize the gap between 08:55:32 and 08:56:59. What I need is to split the dataframe at that point and continue creating windows starting from 08:56:59. Here is my code:

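import pandas as pd

def create_windows(df):
  # Group rows by their timestamp (one group per distinct second)
  grouped = df.groupby('Seconds')
  dfs = [grouped.get_group(x) for x in grouped.groups]
  ls = []
  # Pair each group with the next one to form a 2-second window
  for i in range(len(dfs)-1):
    a = pd.concat([dfs[i], dfs[i+1]], axis=0)
    ls.append(a)
  return ls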

My results are:

**Seconds                   x       y         z**       

2022-12-25 08:55:24  1.000126 -1.102270  0.227957
2022-12-25 08:55:24  0.872452 -0.747067 -0.476837
2022-12-25 08:55:24  0.734745 -0.864248 -0.090860
2022-12-25 08:55:24  1.083604 -1.301008  0.451095
2022-12-25 08:55:25  0.459849 -1.184315  0.344436
2022-12-25 08:55:25 -0.028884 -0.918935  0.478209
2022-12-25 08:55:25  0.355386 -0.998021 -0.362340
        
                    
**Seconds                 x         y          z**
                        
2022-12-25 08:55:25  0.938607 -0.928207  0.069052
2022-12-25 08:55:25  1.156865 -0.720959  0.349072
2022-12-25 08:55:25  0.931287 -1.360330  0.592462
2022-12-25 08:55:25  0.362462 -0.977769  0.517280
2022-12-25 08:55:26  1.638277 -1.305400  0.142283
2022-12-25 08:55:26  0.679326 -0.734867 -0.002257
2022-12-25 08:55:26  0.738405 -0.601064 -0.321806

What I am trying to fix (a window that spans the gap):

**Seconds                   x       y         z**       

2022-12-25 08:55:32 -0.107756  0.319304  1.090497
2022-12-25 08:55:32 -0.091866  0.373503  1.034103
2022-12-25 08:56:59  0.341448  0.085186  1.297256
2022-12-25 08:56:59  0.426420  0.233355  1.137589

Solution

  • With the dataframe you provided:

    import pandas as pd
    
    df = pd.DataFrame(
        {
            "Timestamp": [
                "2022-12-25 08:55:31",
                "2022-12-25 08:55:31",
                "2022-12-25 08:55:31",
                "2022-12-25 08:55:32",
                "2022-12-25 08:55:32",
                "2022-12-25 08:55:32",
                "2022-12-25 08:56:59",
                "2022-12-25 08:56:59",
                "2022-12-25 08:57:00",
            ],
            "x": [
                0.462288,
                0.792116,
                0.880261,
                -0.084058,
                -0.107756,
                -0.091866,
                0.341448,
                0.42642,
                1.150247,
            ],
            "y": [
                -0.747311,
                -1.437709,
                -0.185562,
                0.441366,
                0.319304,
                0.373503,
                0.085186,
                0.233355,
                -0.665053,
            ],
            "z": [
                -0.049593,
                0.702323,
                1.129537,
                0.955718,
                1.090497,
                1.034103,
                1.297256,
                1.137589,
                0.202337,
            ],
        }
    )
    

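    Here is one way to do it with Pandas Timedelta and unique. Each window starts at a unique timestamp v and keeps the rows up to v + 1 second inclusive, so rows on opposite sides of a longer gap never share a window: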

    # infer_datetime_format is deprecated in pandas 2.x; ISO strings parse without it
    df["Timestamp"] = pd.to_datetime(df["Timestamp"])
    
    dfs = [
        df.loc[
            (df["Timestamp"] >= v)
            & (df["Timestamp"] <= v + pd.Timedelta(value=1, unit="second")),
            :,
        ]
        for v in df["Timestamp"].unique()
    ]
    

    Then:

    for df_ in dfs:
        print(df_)
    # Output
    
                Timestamp         x         y         z
    0 2022-12-25 08:55:31  0.462288 -0.747311 -0.049593
    1 2022-12-25 08:55:31  0.792116 -1.437709  0.702323
    2 2022-12-25 08:55:31  0.880261 -0.185562  1.129537
    3 2022-12-25 08:55:32 -0.084058  0.441366  0.955718
    4 2022-12-25 08:55:32 -0.107756  0.319304  1.090497
    5 2022-12-25 08:55:32 -0.091866  0.373503  1.034103
                Timestamp         x         y         z
    3 2022-12-25 08:55:32 -0.084058  0.441366  0.955718
    4 2022-12-25 08:55:32 -0.107756  0.319304  1.090497
    5 2022-12-25 08:55:32 -0.091866  0.373503  1.034103
                Timestamp         x         y         z
    6 2022-12-25 08:56:59  0.341448  0.085186  1.297256
    7 2022-12-25 08:56:59  0.426420  0.233355  1.137589
    8 2022-12-25 08:57:00  1.150247 -0.665053  0.202337
                Timestamp         x         y         z
    8 2022-12-25 08:57:00  1.150247 -0.665053  0.202337
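
If you would rather split the dataframe at each gap first and then cut non-overlapping windows inside every contiguous stretch, something like the following might work. It is only a sketch under a few assumptions: the helper names (split_on_gaps, fixed_windows, window_features), the max_gap threshold of 2 seconds, and the choice of mean and standard deviation as features are all mine, not something from your code.

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2022-12-25 08:55:31", "2022-12-25 08:55:31", "2022-12-25 08:55:31",
        "2022-12-25 08:55:32", "2022-12-25 08:55:32", "2022-12-25 08:55:32",
        "2022-12-25 08:56:59", "2022-12-25 08:56:59", "2022-12-25 08:57:00",
    ]),
    "x": [0.462288, 0.792116, 0.880261, -0.084058, -0.107756,
          -0.091866, 0.341448, 0.426420, 1.150247],
    "y": [-0.747311, -1.437709, -0.185562, 0.441366, 0.319304,
          0.373503, 0.085186, 0.233355, -0.665053],
    "z": [-0.049593, 0.702323, 1.129537, 0.955718, 1.090497,
          1.034103, 1.297256, 1.137589, 0.202337],
})


def split_on_gaps(df, max_gap="2s", time_col="Timestamp"):
    """Split df wherever two consecutive samples are more than max_gap apart."""
    df = df.sort_values(time_col, kind="stable")
    # True on every row whose distance to the previous row exceeds max_gap
    starts_new_segment = df[time_col].diff() > pd.Timedelta(max_gap)
    # Running count of gaps seen so far gives one id per contiguous segment
    segment_id = starts_new_segment.cumsum()
    return [segment for _, segment in df.groupby(segment_id)]


def fixed_windows(segment, length="2s", time_col="Timestamp"):
    """Cut one segment into consecutive, non-overlapping windows of `length`."""
    # Time elapsed since the first sample of this segment
    elapsed = segment[time_col] - segment[time_col].iloc[0]
    window_id = elapsed // pd.Timedelta(length)
    return [window for _, window in segment.groupby(window_id)]


def window_features(window, axes=("x", "y", "z")):
    """Illustrative feature vector: mean and std of each axis."""
    stats = window[list(axes)].agg(["mean", "std"])
    return {f"{axis}_{stat}": stats.loc[stat, axis]
            for axis in stats.columns for stat in stats.index}


segments = split_on_gaps(df)
windows = [w for segment in segments for w in fixed_windows(segment)]
features = pd.DataFrame([window_features(w) for w in windows])

print([len(s) for s in segments])  # → [6, 3]
print([len(w) for w in windows])   # → [6, 3]
```

With the sample data, the 87-second jump between 08:55:32 and 08:56:59 starts a new segment, so windowing restarts at 08:56:59 and no window mixes rows from both sides of the gap. Unlike the list comprehension above, these windows do not overlap; which behaviour is preferable depends on the features you plan to compute.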