I have data collected from sensors that looks like:
sec nanosec value
1001 1 0.2
1001 2 0.2
1001 3 0.2
1002 1 0.1
1002 2 0.2
1002 3 0.1
1003 1 0.2
1003 2 0.2
1003 3 0.1
1004 1 0.2
1004 2 0.2
1004 3 0.2
1004 4 0.1
I want to calculate average,std deviation
and some other stats like maximum, minimum for a column every 2 seconds.
so average for (1001, 1002)= 0.167, average of (1003,1004)=0.17
From the tutorials http://earthpy.org/pandas-basics.html, I think I should convert it to time series and the use rolling _means from pandas, but I am new to time series data so I am not sure if that is the correct way. Also how do I specify frequency here for conversion as observations for the first second have less observations. So for actual data I have less than 100 readings for 1001 second and then 100 observations for 1002 second onwards.
I could also do a simple groupby on seconds but it would group readings per second and not every 2 seconds, then how could i combine observations for 2 consecutive groups from groupby and then do analysis.
I think you can first convert column sec
to_timedelta
, set_index
and resample
by 2 seconds
(2S
):
df['sec'] = pd.to_timedelta(df.sec, unit='s')
df.set_index('sec', inplace=True)
print (df)
nanosec value
sec
00:16:41 1 0.2
00:16:41 2 0.2
00:16:41 3 0.2
00:16:42 1 0.1
00:16:42 2 0.2
00:16:42 3 0.1
00:16:43 1 0.2
00:16:43 2 0.2
00:16:43 3 0.1
00:16:44 1 0.2
00:16:44 2 0.2
00:16:44 3 0.2
00:16:44 4 0.1
print (df.value.resample('2S').mean())
sec
00:16:41 0.166667
00:16:43 0.171429
00:16:45 NaN
Freq: 2S, Name: value, dtype: float64
print (df.value.resample('2S').std())
sec
00:16:41 0.051640
00:16:43 0.048795
00:16:45 NaN
Freq: 2S, Name: value, dtype: float64
print (df.value.resample('2S').max())
sec
00:16:41 0.2
00:16:43 0.2
00:16:45 NaN
Freq: 2S, Name: value, dtype: float64
Maybe you need change base
in resample
:
print (df.value.resample('2S', base=1).mean())
sec
00:16:42 0.166667
00:16:44 0.171429
00:16:46 NaN
Freq: 2S, Name: value, dtype: float64
print (df.value.resample('2S', base=1).std())
sec
00:16:42 0.051640
00:16:44 0.048795
00:16:46 NaN
Freq: 2S, Name: value, dtype: float64
print (df.value.resample('2S', base=1).max())
sec
00:16:42 0.2
00:16:44 0.2
00:16:46 NaN
Freq: 2S, Name: value, dtype: float64
print (df.value.resample('2S', base=2).mean())
sec
00:16:43 0.166667
00:16:45 0.171429
00:16:47 NaN
Freq: 2S, Name: value, dtype: float64
print (df.value.resample('2S', base=2).std())
sec
00:16:43 0.051640
00:16:45 0.048795
00:16:47 NaN
Freq: 2S, Name: value, dtype: float64
print (df.value.resample('2S', base=2).max())
sec
00:16:43 0.2
00:16:45 0.2
00:16:47 NaN
Freq: 2S, Name: value, dtype: float64