The dataframe df1
summarizes water temperature at different depths (T5m
,T15m
,T25m
,T35m
) for every hour (Datetime
). As an example of dataframe:
df1<- data.frame(Datetime=c("2016-08-12 12:00:00","2016-08-12 13:00:00","2016-08-12 14:00:00","2016-08-12 15:00:00","2016-08-13 12:00:00","2016-08-13 13:00:00","2016-08-13 14:00:00","2016-08-13 15:00:00"),
T5m= c(10,20,20,10,10,20,20,10),
T15m=c(10,20,10,20,10,20,10,20),
T25m=c(20,20,20,30,20,20,20,30),
T35m=c(20,20,10,10,20,20,10,10))
df1$Datetime<- as.POSIXct(df1$Datetime, format="%Y-%m-%d %H")
df1
Datetime T5m T15m T25m T35m
1 2016-08-12 12:00:00 10 10 20 20
2 2016-08-12 13:00:00 20 20 20 20
3 2016-08-12 14:00:00 20 10 20 10
4 2016-08-12 15:00:00 10 20 30 10
5 2016-08-13 12:00:00 10 10 20 20
6 2016-08-13 13:00:00 20 20 20 20
7 2016-08-13 14:00:00 20 10 20 10
8 2016-08-13 15:00:00 10 20 30 10
I would like to create a new dataframe df2
in which I have the average water temperature per day for either each depth interval and for the whole water column and the standard error estimation. I would expect something like this (I did the calculations by hand so there might be some mistakes):
> df2
Date meanT5m meanT15m meanT25m meanT35m meanTotal seT5m seT15m seT25m seT35m seTotal
1 2016-08-12 15 15 22.5 15 16.875 2.88 2.88 2.5 2.88 1.29
2 2016-08-13 15 15 22.5 15 16.875 2.88 2.88 2.5 2.88 1.29
I am especially interested in knowing how to do it with data.table
since I will work with huge data.frames and I think data.table
is quite efficient.
For calculating the standard error I know the function std.error()
from the package plotrix
.
Update based on @chinsoon's comment
First transform your data frame into a data table:
library(data.table)
setDT(df1)
Create a total
column:
df1[, total := rowSums(.SD), .SDcols = grep("T[0-9]+m", names(df1))][]
# Datetime T5m T15m T25m T35m total
# 1: 2016-08-12 12:00:00 10 10 20 20 60
# 2: 2016-08-12 13:00:00 20 20 20 20 80
# 3: 2016-08-12 14:00:00 20 10 20 10 60
# 4: 2016-08-12 15:00:00 10 20 30 10 70
# 5: 2016-08-13 12:00:00 10 10 20 20 60
# 6: 2016-08-13 13:00:00 20 20 20 20 80
# 7: 2016-08-13 14:00:00 20 10 20 10 60
# 8: 2016-08-13 15:00:00 10 20 30 10 70
Apply the functions per day
:
library(lubridate)
(df3 <- df1[, as.list(unlist(lapply(.SD, function (x)
c(mean = mean(x), sem = sd(x) / sqrt(length(x)))))),
day(Datetime)])
# day T5m.mean T5m.sem T15m.mean T15m.sem T25m.mean T25m.sem T35m.mean
# 1: 12 15 2.886751 15 2.886751 22.5 2.5 15
# 2: 13 15 2.886751 15 2.886751 22.5 2.5 15
# T35m.sem total.mean total.sem
# 1: 2.886751 67.5 4.787136
# 2: 2.886751 67.5 4.787136