In R, I am trying to calculate the geometric mean (`exp(mean(log(x, na.rm=T)))`) of every column in a data frame, grouped by participant ID. The data frame is in long format. Below is comparable code that I have so far, but it isn't working. I have also tried data.table, without success. Any help is appreciated.
```r
library(dplyr)

mtcars_sub <- mtcars[, 1:2]
mtcars_sub_gm <- mtcars_sub %>%
  group_by(cyl) %>%
  summarise_all(function(x) exp(mean(log(x, na.rm = TRUE))))
```

With data.table:

```r
library(data.table)

gm_vars <- names(mtcars_sub)[1] # this is very simplistic, but in my actual program there are 80+ columns
setDT(mtcars_sub) # the [, by =, .SDcols =] syntax requires a data.table, not a data.frame
mtcars_sub_gm <- mtcars_sub[, lapply(.SD, function(x) exp(mean(log(x, na.rm = TRUE)))),
                            by = cyl, .SDcols = gm_vars]
```
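I think the issue is the placement of `na.rm = TRUE`: it should be an argument of `mean()`, but it was placed inside the `log()` parentheses. `log()` has no `na.rm` argument (its signature is `log(x, base = exp(1))`), so the missing-value handling never reaches `mean()`.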
```r
library(dplyr)

mtcars[, 1:5] %>%
  group_by(cyl) %>%
  summarize(across(everything(), ~ exp(mean(log(.x), na.rm = TRUE))))
```

```
# A tibble: 3 × 5
    cyl   mpg  disp    hp  drat
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     4  26.3  102.  80.1  4.06
2     6  19.7  180. 121.   3.56
3     8  14.9  347. 204.   3.21
```
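The same correction applies to the data.table attempt from the question. Note that the `[, lapply(.SD, ...), by = ..., .SDcols = ...]` syntax only works on a data.table, so a plain data frame has to be converted first (here with `as.data.table()`). The sketch below assumes the columns to summarise are listed in a character vector such as `gm_vars`, which scales to 80+ columns; the column names chosen here are illustrative. The dplyr equivalent selects the same columns with `all_of()`.

```r
library(data.table)
library(dplyr)

# Character vector of columns to summarise (could hold 80+ names in practice)
gm_vars <- c("mpg", "disp", "hp", "drat")

# data.table: convert, then apply the geometric mean to each .SDcols column
# within each cyl group; keyby sorts the groups (by would keep first-appearance order)
mtcars_dt <- as.data.table(mtcars)
gm_dt <- mtcars_dt[, lapply(.SD, function(x) exp(mean(log(x), na.rm = TRUE))),
                   keyby = cyl, .SDcols = gm_vars]

# dplyr: all_of() selects the same columns from the character vector
gm_dplyr <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(all_of(gm_vars), ~ exp(mean(log(.x), na.rm = TRUE))))
```

With `by = cyl` instead of `keyby = cyl`, data.table returns the groups in the order they first appear in the data (6, 4, 8 for `mtcars`), whereas `group_by()` + `summarise()` sorts them.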