I have some time series data that can be sampled at 1 Hz, 10 Hz, or 100 Hz. The file I load happens to be 1 Hz:
In [6]: data = pd.read_csv("ftp.csv")
In [7]: data.Time
Out[7]:
0 NaN
1 11:30:08 AM
2 11:30:09 AM
3 11:30:10 AM
4 11:30:11 AM
5 11:30:12 AM
6 11:30:13 AM
I convert it to datetime with:
In [8]: time = pd.to_datetime(data.Time)
In [9]: time
Out[9]:
0 NaT
1 2015-03-03 11:30:08
2 2015-03-03 11:30:09
3 2015-03-03 11:30:10
4 2015-03-03 11:30:11
5 2015-03-03 11:30:12
From here, how can I verify what the sampling frequency is? Do I have to do this manually, or can I use a built-in pandas method?
One method: after converting to datetime64, if the sampling rate is constant, you can call diff()
to calculate the difference between consecutive rows. These differences should all be equal, so you can compare
them with a np.timedelta64 value. Assuming the converted timestamps are stored in a column df.datetime, for your sample data this would be:
In [277]:
(df.datetime.diff()[1:] == np.timedelta64(1, 's')).all()
Out[277]:
True
In [278]:
df.datetime.diff()
Out[278]:
0
1 NaT
2 00:00:01
3 00:00:01
4 00:00:01
5 00:00:01
6 00:00:01
Name: datetime, dtype: timedelta64[ns]
In [279]:
df.datetime.diff()[1:] == np.timedelta64(1, 's')
Out[279]:
0
2 True
3 True
4 True
5 True
6 True
Name: datetime, dtype: bool
To check whether the frequency is 10 Hz or 100 Hz, just change the value and units passed to np.timedelta64:
for 10 Hz: np.timedelta64(100, 'ms')
and for 100 Hz: np.timedelta64(10, 'ms')
The np.timedelta64 units can be found here: http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html#datetime-and-timedelta-arithmetic
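If you would rather not test each candidate rate by hand, the same diff() idea can be wrapped in a small helper that reports the rate directly. This is only a sketch under some assumptions: the function name sampling_rate_hz is made up for illustration, the timestamps are synthetic stand-ins for your file, and missing values such as the leading NaT are simply dropped before differencing.

```python
import pandas as pd


def sampling_rate_hz(times):
    """Return the sampling rate in Hz if every step is equal, else None.

    Hypothetical helper, not a pandas built-in.
    """
    # Drop missing timestamps (e.g. a leading NaT) before differencing,
    # then drop the NaT that diff() produces for the first row.
    steps = pd.Series(times).dropna().diff().dropna()
    if steps.empty or steps.nunique() != 1:
        return None  # too few samples, or irregular spacing
    step_seconds = steps.iloc[0].total_seconds()
    if step_seconds <= 0:
        return None  # duplicate or decreasing timestamps
    return 1.0 / step_seconds


# Synthetic data for the three rates mentioned in the question.
for step in (pd.Timedelta(seconds=1),
             pd.Timedelta(milliseconds=100),
             pd.Timedelta(milliseconds=10)):
    stamps = pd.Series(pd.date_range("2015-03-03 11:30:08", periods=6, freq=step))
    print(step, sampling_rate_hz(stamps))
```

The helper returns None rather than raising when the spacing is not constant, so you can tell "irregular" apart from a valid rate; for real data with jitter you would probably want a tolerance instead of exact equality.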