[SOLVED] How to detect the precise sampling interval from samples stored in a database?

How to detect the precise sampling interval from samples stored in a database?

A hardware sensor is sampled precisely (precise period of sampling) using a real-time unit. However, the time value is not sent to the database together with the sampled value. Instead, time of insertion of the record to the database is stored for the sample in the database. The DATETIME type is used, and the GETDATE() function is used to get current time (Microsoft SQL Server).

How can I reconstruct the precise sampling times?

As the sampling interval is (should be) 60 seconds exactly, there was no need earlier for more precise solution. (This is an old solution, third party, with a lot of historical samples. This way it is not possible to fix the design.)

For processing of the samples, I need to reconstruct the correct time instances for the samples. There is no problem with shifting the time of the whole sequence (that is, it does not matter whether the start time is rather off, not absolute). On the other hand, the sampling interval should be detected as precisely as possible. I also cannot be sure, that the sampling interval was exactly 60 seconds (as mentioned above). I also cannot be sure, that the sampling interval was really constant (say, slight differences based on temperature of the device).

When processing the samples, I want to get:

start time
the sampling interval
the sequence o the sample values

When reconstructing the samples, I need to convert it back to tuples:

time of the sample
value of the sample

Because of that, for the sequence with n samples, the time of the last sample should be equal to start_time + sampling_interval * (n - 1), and it should be reasonably close to the original end time stored in the database.

Think in terms of the stored sample times slightly oscillate with respect to the real sample-times (the constant delay between the sampling and the insertion into the database is not a problem here).

I was thinking about calculating the mean value and the corrected standard deviation for the interval calculated from the previous and current sample times.

Discontinuity detection: If the calculated interval is greater than 3 sigma off the mean value, I would consider it a discontinuity of the sampled curve (say, the machine is switched off, or any outer event lead to missing samples. In the case, I want to start with processing a new sequence. (The sampling frequency could also be changed.)

Is there any well known approach to the problem. If yes, can you point me to the article(s)? Or can you give me the name or acronym of the algorithm?

Solution

+1 to looking at the difference sequence. We can model the difference sequence as the sum of a low frequency truth (the true rate of the samples, slowly varying over time) and high frequency noise (the random delay to get the sample into the database). You want a low-pass filter to remove the latter.