I'm working with a large dataset of variables collected during the dives of elephant seals, and I would like to analyze it at a fine scale. I want to bin my data into 20 second intervals and take the mean of each interval, so I can run more analysis on the binned data. However, I need to group my data by dive # so that I'm not binning information from separate dives together.
There are three methods I've tried so far:
period.apply(), but I cannot group with this function.
split() to subset my data by dive #, but I can't seem to find a way to then calculate the mean of different columns over 20 second intervals within these subsets.
timeAverage() from openair, but I continue to get an error (see code below).
Below is what the data looks like, and the code I've tried. I would like the means of Depth, MSA, rate_s, and HR for each 20 second window, grouped by diveNum and ideally also by D_phase.
> head(seal_dives)
datetime seal_ID Depth MSA D_phase diveNum rate_s HR
1 2018-04-06 14:47:51 Congaree 4.5 0.20154042 D 1 NA 115.3846
2 2018-04-06 14:47:51 Congaree 4.5 0.20154042 D 1 NA 117.6471
3 2018-04-06 14:47:52 Congaree 4.5 0.11496760 D 1 NA 115.3846
4 2018-04-06 14:47:52 Congaree 4.5 0.11496760 D 1 NA 122.4490
5 2018-04-06 14:47:53 Congaree 4.5 0.05935992 D 1 NA 113.2075
6 2018-04-06 14:47:53 Congaree 4.5 0.05935992 D 1 NA 113.2075
#openair package using timeAverage(), results in error message
> library(openair)
> seal_20<-timeAverage(
seal_dives,
avg.time = "20 sec",
data.thresh = 0,
statistic = "mean",
type = c("diveNum","D_phase"),
percentile = NA,
start.date = NA,
end.date = NA,
vector.ws = FALSE,
fill = FALSE
)
Error in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
Can't find the variable(s) date
#converting to time series and using period.apply(), but can't find a way to group them by dive #, or use split() then convert to time series.
#create a time series data class from our data frame
> seal_dives$datetime<-as.POSIXct(seal_dives$datetime,tz="GMT")
> seal_xts <- xts(seal_dives, order.by=seal_dives[,1])
> seal_20<-period.apply(seal_xts$Depth, endpoints(seal_xts$datetime, "seconds", 20), mean)
#split data by dive # but don't know how to do averages over 20 seconds
> seal_split<-split(seal_dives, seal_dives$diveNum)
Maybe there is a magical way to do this that I haven't found on the internet yet, or maybe I'm just doing something wrong in one of my methods.
You can use the floor_date function from lubridate to bin the data every 20 seconds. Group by that binned time along with diveNum and D_phase, then get the average of the other columns using across.
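library(dplyr)
library(lubridate)
# datetime must be POSIXct for floor_date() to work
result <- seal_dives %>%
  mutate(datetime = as.POSIXct(datetime, tz = "GMT")) %>%
  group_by(diveNum, D_phase, datetime = floor_date(datetime, '20 sec')) %>%
  summarise(across(c(Depth, MSA, rate_s, HR), ~ mean(.x, na.rm = TRUE)), .groups = 'drop')
result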
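One thing to be aware of: flooring the timestamp gives bins aligned to the clock (:00, :20, :40), not to the start of each dive, so the first and last bin of a dive can hold fewer than 20 seconds of data. If you would rather have bins counted from the first sample of each dive, a rough base R sketch follows. The data frame toy, its values, and the column names bin_clock and bin_dive are made up for illustration, and I'm assuming one sample per second, so adapt as needed.

```r
# Toy data (hypothetical values): two 30-second dives, one sample per second
toy <- data.frame(
  datetime = as.POSIXct("2018-04-06 14:47:51", tz = "GMT") + 0:59,
  diveNum  = rep(c(1, 2), each = 30),
  Depth    = c(rep(4.5, 30), rep(10, 30)),
  HR       = c(rep(c(110, 120), 15), rep(NA, 10), rep(100, 20))
)

epoch <- as.numeric(toy$datetime)  # seconds since 1970-01-01

# Clock-aligned bins: floor epoch seconds down to a multiple of 20
toy$bin_clock <- as.POSIXct(epoch %/% 20 * 20, origin = "1970-01-01", tz = "GMT")

# Dive-relative bins: seconds since the first sample of the same dive,
# floored down to a multiple of 20 (0, 20, 40, ...)
dive_start <- ave(epoch, toy$diveNum, FUN = min)
toy$bin_dive <- (epoch - dive_start) %/% 20 * 20

# Mean of each variable per dive and dive-relative bin, ignoring NAs
means <- aggregate(
  toy[c("Depth", "HR")],
  by  = list(diveNum = toy$diveNum, bin_start_s = toy$bin_dive),
  FUN = function(v) mean(v, na.rm = TRUE)
)
means <- means[order(means$diveNum, means$bin_start_s), ]
means
```

The same idea should carry over to the dplyr version by grouping on diveNum first and computing the bin from datetime - min(datetime) inside a mutate(). Note that with na.rm = TRUE a bin in which every value is NA comes back as NaN rather than NA.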