In my R data.table mydt below, I calculate differences between dates by simple subtraction and store them as days_elapsed1 and days_elapsed2. Both columns have class "difftime", expressed in days. I then take the mean of days_elapsed1 and days_elapsed2 by group. The mean is also a difftime expressed in days, but I cannot figure out how to round it. As simple as this seems, could anyone please provide an answer in base R or data.table? I'm relatively new to R.
Please note I'm avoiding the use of dplyr as I am trying to keep everything in data.table language.
Below is a reproducible example. I know it contains a few more things than needed, but it was far simpler for me to edit an existing example (which took me a long time to produce) than to write another. My code uses brute force because I am new to R.
# Create a sample data set
library(data.table)
tests <- c("test1", "test2")
prob_bucket <- c(0.5, 0.3)
person_id <- sample(101:110, size=30, replace=TRUE)
grp <- sample(0:1, size = 30, replace = TRUE)
month <- sample(1:12, size=30, replace=TRUE)
day <- sample(1:28, size=30, replace=TRUE)
year <- sample(2018:2022, size=30, replace=TRUE)
test <- sample(tests, 30, replace=TRUE, prob = prob_bucket)
mydt <- data.table(person_id, grp, month, day, year, test)
mydt[, test_date:= as.Date(paste(month, day, year, sep = "-"),
format="%m-%d-%Y")]
mydt[test_date < as.Date("02-01-2019", format="%m-%d-%Y"), grp:=0]
mydt[test_date >= as.Date("02-01-2019", format="%m-%d-%Y"), grp:=1]
mydt[, grp := as.factor(grp)]
ss_a <- mydt[test=="test1"]
ss_b <- mydt[test=="test2"]
a <- round(runif(nrow(ss_a), 5.5, 8),1)
b <- sample(100:250, size=nrow(ss_b), replace=TRUE)
mydt[test=="test1", ind_test1:=1][test=="test1", test_value:=a]
mydt[test=="test2", ind_test2:=1][test=="test2", test_value:=b]
setorderv(mydt, cols = c("person_id", "grp"))
# Identify critical values test1
mydt[ind_test1==1, critical_date_test1:= max(test_date), by=.(person_id, grp)]
mydt[, critical_date_test1:=mean(critical_date_test1, na.rm=TRUE), by=.(person_id, grp)]
mydt[test_date==critical_date_test1, critical_value_test1:= test_value]
mydt[, critical_value_test1:=mean(critical_value_test1, na.rm=TRUE), by=.(person_id, grp)]
# Identify critical values test2
mydt[ind_test2==1, critical_date_test2:= max(test_date), by=.(person_id, grp)]
mydt[, critical_date_test2:=mean(critical_date_test2, na.rm=TRUE), by=.(person_id, grp)]
mydt[test_date==critical_date_test2, critical_value_test2:= test_value]
mydt[, critical_value_test2:=mean(critical_value_test2, na.rm=TRUE), by=.(person_id, grp)]
# Days elapsed between fixed date and critical test dates
fixed_date <-as.Date("2-1-2019", format="%m-%d-%Y")
mydt[grp==0, days_elapsed1 := fixed_date-critical_date_test1]
mydt[grp==1, days_elapsed1 := critical_date_test1-fixed_date]
mydt[grp==0, days_elapsed2 := fixed_date-critical_date_test2]
mydt[grp==1, days_elapsed2 := critical_date_test2-fixed_date]
# Collapse data
dt_summ1_collapsed <- mydt[, .SD[1], by=.(person_id, grp)]
days_elapsed <- c("days_elapsed1", "days_elapsed2")
# Take the mean across people in the collapsed data set, by grp
dt_summ_collapsed2 <- dt_summ1_collapsed[, lapply(.SD, mean, na.rm=TRUE),
by=grp, .SDcols=days_elapsed]
# Two different attempts to round the result
dt_summ_collapsed2[, .SD:=round(.SD, 0), .SDcols=days_elapsed]
dt_summ_collapsed2[, .SD:=round_date(.SD, "day"), .SDcols=days_elapsed]
The function round_date() (from lubridate) is designed for dates and date-times. However, the columns days_elapsed1 and days_elapsed2 hold numbers of days (difftime values), not dates, so round() is the function to use.
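It should work if you replace the last line with the following. The columns are named through the days_elapsed vector defined in the question, and round() is applied column by column with lapply(), since round() applied to .SD as a whole can fail on difftime columns:
dt_summ_collapsed2[, (days_elapsed) := lapply(.SD, round), .SDcols = days_elapsed]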
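To see why plain round() is enough here, a small base-R sketch with made-up dates (not the question's data) may help. round() has a method for difftime, so the result keeps its class and units; as.numeric() is only needed if a bare number is wanted.

```r
# Made-up dates, for illustration only
d <- as.Date(c("2019-03-01", "2019-04-15", "2019-05-02")) - as.Date("2019-02-01")

m <- mean(d)    # mean of a difftime is still a difftime, in days
r <- round(m)   # rounding keeps the difftime class and its units

class(r)
units(r)

# Drop the class if a plain number of days is preferred
n <- as.numeric(r, units = "days")
n
```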
Update on 18 April 2023, based on updated question featuring example
Method 1: Round inside the function that calculates the mean
# Mean with NAs dropped, rounded to whole days
RoundMean <- function(x){
  Results <- round(mean(x, na.rm = TRUE), 0)
  return(Results)
}
dt_summ_collapsed2 <- dt_summ1_collapsed[, lapply(.SD, RoundMean),
by=grp, .SDcols=days_elapsed]
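As a quick check of Method 1, here is a sketch on a toy table. The table toy and its values are invented for illustration, and RoundMean is redefined so the snippet runs on its own.

```r
library(data.table)

# Same idea as Method 1: mean with NAs dropped, rounded to whole days
RoundMean <- function(x) round(mean(x, na.rm = TRUE), 0)

# Toy data (hypothetical values), including one missing entry
toy <- data.table(
  grp = c(0, 0, 1, 1),
  days_elapsed1 = as.difftime(c(10.2, 11.1, 30.7, NA), units = "days")
)

# One rounded mean per group
res <- toy[, lapply(.SD, RoundMean), by = grp, .SDcols = "days_elapsed1"]
print(res)
```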
Method 2: Round the columns one by one
dt_summ_collapsed2[, days_elapsed1 := round(days_elapsed1, 0)]
dt_summ_collapsed2[, days_elapsed2 := round(days_elapsed2, 0)]
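If there are many such columns, the same one-by-one update could be written as a loop with data.table's set(). This is only a sketch: toy stands in for dt_summ_collapsed2 and its values are made up.

```r
library(data.table)

# Toy stand-in for dt_summ_collapsed2 (hypothetical values)
toy <- data.table(
  grp = factor(c(0, 1)),
  days_elapsed1 = as.difftime(c(63.667, 120.25), units = "days"),
  days_elapsed2 = as.difftime(c(45.4, 200.8), units = "days")
)
days_elapsed <- c("days_elapsed1", "days_elapsed2")

# Round each listed column in place; round() keeps the difftime class
for (col in days_elapsed) {
  set(toy, j = col, value = round(toy[[col]], 0))
}
print(toy)
```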