I want to analyze the data from a crack-meter, which measures the aperture of a crack in the ground over time. I also have temperature data from a nearby sensor. I have stored both as time-indexed pandas DataFrames.
When plotting the data, it is easy to see that the two series are correlated, which suggests that temperature influences the aperture of the crack.
[Figure: crack aperture vs. temperature]
I have also compared the two series with a scatter plot. I used only the 2023 data, because the correlation is clearer in those months.
[Figure: scatter plot comparing crack aperture and temperature]
The aim is to remove the fluctuations in the aperture that are caused by temperature fluctuations. This would let us analyze the evolution of the aperture that is "independent" of temperature.
Below is the January 2023 data. If more than one month of data is needed, I can share more months.
Thank you in advance.
import pandas as pd
import numpy as np
df_crack = pd.DataFrame({'date': ['2023-01-01 00:00:00', '2023-01-02 00:00:00',
'2023-01-03 00:00:00', '2023-01-04 00:00:00',
'2023-01-05 00:00:00', '2023-01-06 00:00:00',
'2023-01-07 00:00:00', '2023-01-08 00:00:00',
'2023-01-09 00:00:00', '2023-01-10 00:00:00',
'2023-01-11 00:00:00', '2023-01-12 00:00:00',
'2023-01-13 00:00:00', '2023-01-14 00:00:00',
'2023-01-15 00:00:00', '2023-01-16 00:00:00',
'2023-01-17 00:00:00', '2023-01-18 00:00:00',
'2023-01-19 00:00:00', '2023-01-20 00:00:00',
'2023-01-21 00:00:00', '2023-01-22 00:00:00',
'2023-01-23 00:00:00', '2023-01-24 00:00:00',
'2023-01-25 00:00:00', '2023-01-26 00:00:00',
'2023-01-27 00:00:00', '2023-01-28 00:00:00',
'2023-01-29 00:00:00', '2023-01-30 00:00:00',
],
'aperture': [0.452762281,0.372262281,0.513928948,0.447762281,
0.377095615,0.355095615,0.271428948,0.291762281,
0.476762281,0.335928948,0.280428948,0.283762281,
0.322928948,0.287262281,0.316928948,0.209262281,
0.407928948,0.254262281,0.232095615,0.264262281,
0.076095615,-0.025237719,-0.042237719,-0.094904385,
0.017428948,-0.036071052,-0.094071052,-0.071404385,
0.008095615,-0.141571052]})
df_crack['date'] = pd.to_datetime(df_crack['date'])
df_crack = df_crack.set_index('date')
df_temp = pd.DataFrame({'date': ['2023-01-01 00:00:00', '2023-01-02 00:00:00',
'2023-01-03 00:00:00', '2023-01-04 00:00:00',
'2023-01-05 00:00:00', '2023-01-06 00:00:00',
'2023-01-07 00:00:00', '2023-01-08 00:00:00',
'2023-01-09 00:00:00', '2023-01-10 00:00:00',
'2023-01-11 00:00:00', '2023-01-12 00:00:00',
'2023-01-13 00:00:00', '2023-01-14 00:00:00',
'2023-01-15 00:00:00', '2023-01-16 00:00:00',
'2023-01-17 00:00:00', '2023-01-18 00:00:00',
'2023-01-19 00:00:00', '2023-01-20 00:00:00',
'2023-01-21 00:00:00', '2023-01-22 00:00:00',
'2023-01-23 00:00:00', '2023-01-24 00:00:00',
'2023-01-25 00:00:00', '2023-01-26 00:00:00',
'2023-01-27 00:00:00', '2023-01-28 00:00:00',
'2023-01-29 00:00:00', '2023-01-30 00:00:00',
],
'temperature': [9.6,8,8.4,6.2,6.2,6,3.9,8.5,8.3,5.3,5.6,5.3,
6.2,6.3,6.9,4.8,6.7,3.6,3,4.6,2.3,1.3,1,0.3,
1.6,0.4,1.5,1.4,2.2,1.2]})
df_temp['date'] = pd.to_datetime(df_temp['date'])
df_temp = df_temp.set_index('date')
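A simple baseline for the stated aim is to regress aperture on temperature and then subtract the fitted temperature term. This is only a minimal sketch, not the method used in the answer. It assumes a linear, same-day response of the aperture to temperature, with no thermal lag. It fits that line by ordinary least squares with `np.polyfit`. The names `slope` and `corrected` are hypothetical.

```python
import numpy as np
import pandas as pd

# January 2023 daily data from the question
dates = pd.date_range('2023-01-01', '2023-01-30', freq='D')
aperture = pd.Series([0.452762281, 0.372262281, 0.513928948, 0.447762281,
                      0.377095615, 0.355095615, 0.271428948, 0.291762281,
                      0.476762281, 0.335928948, 0.280428948, 0.283762281,
                      0.322928948, 0.287262281, 0.316928948, 0.209262281,
                      0.407928948, 0.254262281, 0.232095615, 0.264262281,
                      0.076095615, -0.025237719, -0.042237719, -0.094904385,
                      0.017428948, -0.036071052, -0.094071052, -0.071404385,
                      0.008095615, -0.141571052], index=dates, name='aperture')
temperature = pd.Series([9.6, 8, 8.4, 6.2, 6.2, 6, 3.9, 8.5, 8.3, 5.3, 5.6, 5.3,
                         6.2, 6.3, 6.9, 4.8, 6.7, 3.6, 3, 4.6, 2.3, 1.3, 1, 0.3,
                         1.6, 0.4, 1.5, 1.4, 2.2, 1.2], index=dates, name='temperature')

# Pearson correlation between the two daily series
r = aperture.corr(temperature)

# Ordinary least squares fit: aperture ~ slope * temperature + intercept
# (assumption: linear, same-day response; any thermal lag is ignored)
slope, intercept = np.polyfit(temperature, aperture, 1)

# Hypothetical temperature-corrected series: remove only the part explained by
# deviations from the mean temperature, so the overall aperture level is kept
corrected = aperture - slope * (temperature - temperature.mean())

print(f"r = {r:.3f}, slope = {slope:.4f} aperture units per degree C")
print(corrected.round(4))
```

By construction, `corrected` is uncorrelated with temperature over the fitting window. Any drift left in it is what a linear, no-lag temperature model cannot explain.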
EDIT
Now, I don't think there is an obvious relationship between aperture size and temperature. If we take a 15- or 30-day moving average and plot it, the size of the opening varies a lot for the same temperature (look at the average temperature of 8 °C).
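import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Read data (aperture and temperature columns indexed by date) and fill NaNs
# (the method doesn't matter, there are only 5 missing values):
# each missing aperture gets the mean aperture seen at the same rounded temperature
df = pd.read_csv('crackmeter.csv', index_col='date', parse_dates=['date'])
df['aperture'] = df['aperture'].fillna(df.groupby(df['temperature'].round())['aperture'].transform('mean'))
# 30-day rolling mean; the index is converted to monthly periods so hue colours points by month
ax = sns.scatterplot(df.rolling('30D').mean().to_period('M'), x='aperture', y='temperature', hue='date')
ax.axhline(8)  # reference line at a mean temperature of 8 °C
plt.show()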
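crackmeter.csv is not included in the question. So, as an assumption, the sketch below rebuilds a comparable frame from the January 2023 data, which gives the same result as `df_crack.join(df_temp)`. It replaces the plot with a small table. The table shows a 15-day rolling mean and, for each rounded smoothed temperature, the range of smoothed aperture values. With a single month this is only an illustration; the effect described above needs the full record. The names `smooth` and `spread` are hypothetical.

```python
import pandas as pd

# January 2023 data from the question, combined into one frame
dates = pd.date_range('2023-01-01', '2023-01-30', freq='D', name='date')
df = pd.DataFrame({
    'aperture': [0.452762281, 0.372262281, 0.513928948, 0.447762281,
                 0.377095615, 0.355095615, 0.271428948, 0.291762281,
                 0.476762281, 0.335928948, 0.280428948, 0.283762281,
                 0.322928948, 0.287262281, 0.316928948, 0.209262281,
                 0.407928948, 0.254262281, 0.232095615, 0.264262281,
                 0.076095615, -0.025237719, -0.042237719, -0.094904385,
                 0.017428948, -0.036071052, -0.094071052, -0.071404385,
                 0.008095615, -0.141571052],
    'temperature': [9.6, 8, 8.4, 6.2, 6.2, 6, 3.9, 8.5, 8.3, 5.3, 5.6, 5.3,
                    6.2, 6.3, 6.9, 4.8, 6.7, 3.6, 3, 4.6, 2.3, 1.3, 1, 0.3,
                    1.6, 0.4, 1.5, 1.4, 2.2, 1.2],
}, index=dates)

# 15-day time-based rolling mean: each window is (t - 15 days, t], and
# offset-based windows default to min_periods=1, so early rows average fewer days
smooth = df.rolling('15D').mean()

# For each rounded smoothed temperature, how far apart are the smoothed apertures?
spread = (smooth.groupby(smooth['temperature'].round())['aperture']
          .agg(['min', 'max', 'count']))
spread['range'] = spread['max'] - spread['min']
print(spread)
```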
This is clearly not a programming question. The temperature dataframe is probably not relevant here, because the aperture series already contains the temperature effect. I'm not a specialist in time-series analysis, but you should look into seasonal decomposition.
If you consider your aperture to be the sum (additive model) of three components, Trend, Seasonal (temperature) and Residual, you can use seasonal_decompose from statsmodels:
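import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# Additive decomposition of the daily series from the question (df_crack);
# with a daily index the period is inferred as 7, written out explicitly here
crack = seasonal_decompose(df_crack['aperture'], model='additive', period=7)
crack.plot()
plt.show()
# Original series next to its trend, seasonal and residual components
out = pd.concat([df_crack, crack.trend, crack.seasonal, crack.resid], axis=1)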
Output:
>>> out
             aperture      trend   seasonal      resid
date
2023-01-01   0.452762        NaN  -0.035330        NaN
2023-01-02   0.372262        NaN   0.004456        NaN
2023-01-03   0.513929        NaN   0.027567        NaN
2023-01-04   0.447762   0.398619   0.021857   0.027286
2023-01-05   0.377096   0.375619   0.001988  -0.000512
2023-01-06   0.355096   0.390548   0.018172  -0.053625
2023-01-07   0.271429   0.365119  -0.038711  -0.054980
2023-01-08   0.291762   0.341215  -0.035330  -0.014123
2023-01-09   0.476762   0.327881   0.004456   0.144425
2023-01-10   0.335929   0.323286   0.027567  -0.014924
2023-01-11   0.280429   0.325548   0.021857  -0.066976
2023-01-12   0.283762   0.329143   0.001988  -0.047369
2023-01-13   0.322929   0.290929   0.018172   0.013828
2023-01-14   0.287262   0.301215  -0.038711   0.024758
2023-01-15   0.316929   0.297477  -0.035330   0.054782
2023-01-16   0.209262   0.290096   0.004456  -0.085289
2023-01-17   0.407929   0.281715   0.027567   0.098647
2023-01-18   0.254262   0.251548   0.021857  -0.019143
2023-01-19   0.232096   0.202667   0.001988   0.027441
2023-01-20   0.264262   0.166738   0.018172   0.079351
2023-01-21   0.076096   0.094905  -0.038711   0.019901
2023-01-22  -0.025238   0.061072  -0.035330  -0.050980
2023-01-23  -0.042238   0.022762   0.004456  -0.069456
2023-01-24  -0.094904  -0.028428   0.027567  -0.094043
2023-01-25   0.017429  -0.049500   0.021857   0.045072
2023-01-26  -0.036071  -0.044738   0.001988   0.006679
2023-01-27  -0.094071  -0.058928   0.018172  -0.053315
2023-01-28  -0.071404        NaN  -0.038711        NaN
2023-01-29   0.008096        NaN  -0.035330        NaN
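The sketch below reproduces the additive decomposition by hand, using pandas only, to make clear what seasonal_decompose computes here. For this data it matches the trend and seasonal columns shown above, up to rounding. It also shows something worth checking before treating the output as a temperature correction. With a daily index the period is 7, so the "seasonal" column is a repeating weekly pattern (its values repeat every 7 rows), not the temperature signal itself. Variable names such as `cycle` and `position` are illustrative.

```python
import numpy as np
import pandas as pd

# January 2023 aperture data from the question
dates = pd.date_range('2023-01-01', '2023-01-30', freq='D')
aperture = pd.Series([0.452762281, 0.372262281, 0.513928948, 0.447762281,
                      0.377095615, 0.355095615, 0.271428948, 0.291762281,
                      0.476762281, 0.335928948, 0.280428948, 0.283762281,
                      0.322928948, 0.287262281, 0.316928948, 0.209262281,
                      0.407928948, 0.254262281, 0.232095615, 0.264262281,
                      0.076095615, -0.025237719, -0.042237719, -0.094904385,
                      0.017428948, -0.036071052, -0.094071052, -0.071404385,
                      0.008095615, -0.141571052], index=dates)

period = 7  # what a daily index implies, so "seasonal" becomes a weekly cycle

# 1. Trend: centred moving average over one full period (NaN at both ends)
trend = aperture.rolling(window=period, center=True).mean()

# 2. Seasonal: mean detrended value for each position in the cycle,
#    shifted so the cycle sums to zero, then repeated along the series
detrended = aperture - trend
position = np.arange(len(aperture)) % period
cycle = detrended.groupby(position).mean()  # NaNs at the ends are skipped
cycle -= cycle.mean()
seasonal = pd.Series(cycle.to_numpy()[position], index=aperture.index)

# 3. Residual: whatever trend + seasonal do not explain
resid = aperture - trend - seasonal

print(pd.concat({'trend': trend, 'seasonal': seasonal, 'resid': resid}, axis=1).round(6))
```

Seen this way, resid is what remains after removing a 7-day moving-average trend and a weekly pattern. Whether it is free of the temperature effect is worth verifying, for example by correlating it with the temperature series.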
Maybe you could treat the resid component as the aperture with the temperature part removed?
So you should probably ask your question on the Cross Validated forum.