python pandas correlation sensors temperature

How to subtract the effect of temperature on sensor data?


I want to analyze data from a crack meter (it measures the aperture of a crack in the ground over time). I also have temperature data from a nearby sensor. Both are stored as time-indexed pandas DataFrames.

When plotting the data, it is easy to see that the two are correlated, so the temperature appears to influence the aperture of the crack.

Plot: crack aperture vs. temperature

I have also compared the two series with a scatter plot (using only the 2023 data, because the correlation is clearer in those months).

Scatter comparative between data

The aim is to remove the part of the aperture fluctuation that is caused by temperature fluctuations. That would let us analyze the evolution of the aperture that is "independent" of temperature.

I share the January 2023 data. If more than one month of data is required, I can share more months.

Thank you in advance.

import pandas as pd
import numpy as np

df_crack = pd.DataFrame({'date': ['2023-01-01 00:00:00', '2023-01-02 00:00:00', 
                          '2023-01-03 00:00:00', '2023-01-04 00:00:00',
                          '2023-01-05 00:00:00', '2023-01-06 00:00:00',
                          '2023-01-07 00:00:00', '2023-01-08 00:00:00',
                          '2023-01-09 00:00:00', '2023-01-10 00:00:00',
                          '2023-01-11 00:00:00', '2023-01-12 00:00:00',
                          '2023-01-13 00:00:00', '2023-01-14 00:00:00',
                          '2023-01-15 00:00:00', '2023-01-16 00:00:00',
                          '2023-01-17 00:00:00', '2023-01-18 00:00:00',
                          '2023-01-19 00:00:00', '2023-01-20 00:00:00',
                          '2023-01-21 00:00:00', '2023-01-22 00:00:00',
                          '2023-01-23 00:00:00', '2023-01-24 00:00:00',
                          '2023-01-25 00:00:00', '2023-01-26 00:00:00',
                          '2023-01-27 00:00:00', '2023-01-28 00:00:00',
                          '2023-01-29 00:00:00', '2023-01-30 00:00:00',
                          ], 
               'aperture': [0.452762281,0.372262281,0.513928948,0.447762281,
                            0.377095615,0.355095615,0.271428948,0.291762281,
                            0.476762281,0.335928948,0.280428948,0.283762281,
                            0.322928948,0.287262281,0.316928948,0.209262281,
                            0.407928948,0.254262281,0.232095615,0.264262281,
                            0.076095615,-0.025237719,-0.042237719,-0.094904385,
                            0.017428948,-0.036071052,-0.094071052,-0.071404385,
                            0.008095615,-0.141571052]})

df_crack['date'] = pd.to_datetime(df_crack['date'])
df_crack = df_crack.set_index('date')

df_temp = pd.DataFrame({'date': ['2023-01-01 00:00:00', '2023-01-02 00:00:00', 
                          '2023-01-03 00:00:00', '2023-01-04 00:00:00',
                          '2023-01-05 00:00:00', '2023-01-06 00:00:00',
                          '2023-01-07 00:00:00', '2023-01-08 00:00:00',
                          '2023-01-09 00:00:00', '2023-01-10 00:00:00',
                          '2023-01-11 00:00:00', '2023-01-12 00:00:00',
                          '2023-01-13 00:00:00', '2023-01-14 00:00:00',
                          '2023-01-15 00:00:00', '2023-01-16 00:00:00',
                          '2023-01-17 00:00:00', '2023-01-18 00:00:00',
                          '2023-01-19 00:00:00', '2023-01-20 00:00:00',
                          '2023-01-21 00:00:00', '2023-01-22 00:00:00',
                          '2023-01-23 00:00:00', '2023-01-24 00:00:00',
                          '2023-01-25 00:00:00', '2023-01-26 00:00:00',
                          '2023-01-27 00:00:00', '2023-01-28 00:00:00',
                          '2023-01-29 00:00:00', '2023-01-30 00:00:00',
                          ], 
               'temperature': [9.6,8,8.4,6.2,6.2,6,3.9,8.5,8.3,5.3,5.6,5.3,
                               6.2,6.3,6.9,4.8,6.7,3.6,3,4.6,2.3,1.3,1,0.3,
                               1.6,0.4,1.5,1.4,2.2,1.2]})

df_temp['date'] = pd.to_datetime(df_temp['date'])
df_temp = df_temp.set_index('date')

January 2023 plot data


Solution

  • EDIT

    Now, I don't think there is an obvious relationship between aperture size and temperature. If we take a 15- or 30-day moving average and plot it, the aperture varies a lot at a roughly constant temperature (look at the average temperature of 8°C).

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Read data and fill nan (the method doesn't matter, there are 5 missing values)
    df = pd.read_csv('crackmeter.csv', index_col='date', parse_dates=['date'])
    df['aperture'] = df['aperture'].fillna(df.groupby(df['temperature'].round())['aperture'].transform('mean'))
    
    g = sns.scatterplot(df.rolling('30D').mean().to_period('M'), x='aperture', y='temperature', hue='date')
    g.axes.axhline(8)
    plt.show()
    



    This is not really a programming question. The temperature dataframe is probably not needed here, because the aperture already reflects the temperature. I'm not a specialist in time series analysis, but you should look into seasonal decomposition.

    If you model the aperture as the sum (additive model) of three components, Trend, Seasonal (temperature) and Residual, you can use seasonal_decompose from statsmodels:

    from statsmodels.tsa.seasonal import seasonal_decompose
    
    crack = seasonal_decompose(df_crack['aperture'])
    crack.plot()
    plt.show()
    out = pd.concat([df_crack, crack.trend, crack.seasonal, crack.resid], axis=1)
    

    Output:


    >>> out
                aperture     trend  seasonal     resid
    date                                              
    2023-01-01  0.452762       NaN -0.035330       NaN
    2023-01-02  0.372262       NaN  0.004456       NaN
    2023-01-03  0.513929       NaN  0.027567       NaN
    2023-01-04  0.447762  0.398619  0.021857  0.027286
    2023-01-05  0.377096  0.375619  0.001988 -0.000512
    2023-01-06  0.355096  0.390548  0.018172 -0.053625
    2023-01-07  0.271429  0.365119 -0.038711 -0.054980
    2023-01-08  0.291762  0.341215 -0.035330 -0.014123
    2023-01-09  0.476762  0.327881  0.004456  0.144425
    2023-01-10  0.335929  0.323286  0.027567 -0.014924
    2023-01-11  0.280429  0.325548  0.021857 -0.066976
    2023-01-12  0.283762  0.329143  0.001988 -0.047369
    2023-01-13  0.322929  0.290929  0.018172  0.013828
    2023-01-14  0.287262  0.301215 -0.038711  0.024758
    2023-01-15  0.316929  0.297477 -0.035330  0.054782
    2023-01-16  0.209262  0.290096  0.004456 -0.085289
    2023-01-17  0.407929  0.281715  0.027567  0.098647
    2023-01-18  0.254262  0.251548  0.021857 -0.019143
    2023-01-19  0.232096  0.202667  0.001988  0.027441
    2023-01-20  0.264262  0.166738  0.018172  0.079351
    2023-01-21  0.076096  0.094905 -0.038711  0.019901
    2023-01-22 -0.025238  0.061072 -0.035330 -0.050980
    2023-01-23 -0.042238  0.022762  0.004456 -0.069456
    2023-01-24 -0.094904 -0.028428  0.027567 -0.094043
    2023-01-25  0.017429 -0.049500  0.021857  0.045072
    2023-01-26 -0.036071 -0.044738  0.001988  0.006679
    2023-01-27 -0.094071 -0.058928  0.018172 -0.053315
    2023-01-28 -0.071404       NaN -0.038711       NaN
    2023-01-29  0.008096       NaN -0.035330       NaN
    

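    Maybe you can treat the resid component as the aperture without the temperature part? Note that the seasonal column in the output above repeats every 7 days, so it describes a weekly cycle in the aperture itself and is not computed from the measured temperature.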

    So, you should ask this question on the Cross Validated forum.
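
One way to make the idea of "subtracting the temperature effect" concrete is an ordinary least-squares fit of aperture against temperature. The sketch below is not part of the answer above and is only one possible approach: it assumes the relationship is linear and instantaneous (no lag between temperature and aperture), it uses the January 2023 values from the question, and the column name `aperture_corrected` is made up for illustration.

```python
import numpy as np
import pandas as pd

# January 2023 values copied from the question (daily samples).
dates = pd.date_range("2023-01-01", periods=30, freq="D")
aperture = [0.452762281, 0.372262281, 0.513928948, 0.447762281,
            0.377095615, 0.355095615, 0.271428948, 0.291762281,
            0.476762281, 0.335928948, 0.280428948, 0.283762281,
            0.322928948, 0.287262281, 0.316928948, 0.209262281,
            0.407928948, 0.254262281, 0.232095615, 0.264262281,
            0.076095615, -0.025237719, -0.042237719, -0.094904385,
            0.017428948, -0.036071052, -0.094071052, -0.071404385,
            0.008095615, -0.141571052]
temperature = [9.6, 8, 8.4, 6.2, 6.2, 6, 3.9, 8.5, 8.3, 5.3, 5.6, 5.3,
               6.2, 6.3, 6.9, 4.8, 6.7, 3.6, 3, 4.6, 2.3, 1.3, 1, 0.3,
               1.6, 0.4, 1.5, 1.4, 2.2, 1.2]

df = pd.DataFrame({"aperture": aperture, "temperature": temperature},
                  index=dates)

# Pearson correlation between the two series, to quantify what the plots show.
r = df["aperture"].corr(df["temperature"])

# Assumption: aperture ~ slope * temperature + intercept (linear, no lag).
slope, intercept = np.polyfit(df["temperature"], df["aperture"], deg=1)

# Subtract only the temperature-dependent term, measured relative to the mean
# temperature, so the corrected series keeps the original mean aperture.
# 'aperture_corrected' is a hypothetical column name.
temp_anomaly = df["temperature"] - df["temperature"].mean()
df["aperture_corrected"] = df["aperture"] - slope * temp_anomaly

print(f"correlation: {r:.3f}, slope: {slope:.4f} per degree C")
print(df[["aperture", "aperture_corrected"]].head())
```

By construction the corrected series is linearly uncorrelated with temperature. Because both series drift downward during January, a fit like this may also absorb part of a genuine trend in the crack, so the result should be read with care; fitting over a longer record would probably be more trustworthy.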