I read a lot similar questions before asking a new one but here I am. I have a long data table that consist of plot, dbh, etc. An example of my data like this:
structure(list(plot = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), dbh = c(18L, 14L,
13L, 20L, 20L, 15L, 9L, 12L, 22L, 21L, 14L, 14L, 13L, 18L, 24L,
19L, 13L, 15L, 17L, 22L, 11L)), class = "data.frame", row.names = c(NA,
-21L))
What I want to do is find the average of 5 largest values by group (plot) and add this values as a new column to the same data table. I'm expecting to get the following result.
structure(list(plot = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), dbh = c(18L, 14L,
13L, 20L, 20L, 15L, 9L, 12L, 22L, 21L, 14L, 14L, 13L, 18L, 24L,
19L, 13L, 15L, 17L, 22L, 11L), dom = c(17.4, 17.4, 17.4, 17.4,
17.4, 17.4, 17.4, 17.4, 21.6, 21.6, 21.6, 21.6, 21.6, 21.6, 21.6,
21.6, 21.6, 21.6, 21.6, 21.6, 21.6)), class = "data.frame", row.names = c(NA,
-21L))
I will be appreciate for your help. Thanks.
PS: I tried many different codes within different packages (data.table, dplyr, etc..) however couldn't able to make it so I won't give any mwe that doesn't work.
You can use head
/tail
to get top 5 values :
df$dom <- with(df, ave(dbh, plot, FUN = function(x) mean(tail(sort(x), 5))))
#same as doing 1:5
#df$dom <- with(df, ave(dbh, plot, FUN = function(x)
mean(sort(x, decreasing = TRUE)[1:5])))
Or using dplyr
:
library(dplyr)
df %>% group_by(plot) %>% mutate(dom = mean(tail(sort(dbh), 5)))
and data.table
:
library(data.table)
setDT(df)[, dom := mean(tail(sort(dbh), 5)), plot]
df
# plot dbh dom
# 1: 1 18 17.4
# 2: 1 14 17.4
# 3: 1 13 17.4
# 4: 1 20 17.4
# 5: 1 20 17.4
# 6: 1 15 17.4
# 7: 1 9 17.4
# 8: 1 12 17.4
# 9: 2 22 21.6
#10: 2 21 21.6
#11: 2 14 21.6
#12: 2 14 21.6
#13: 2 13 21.6
#14: 2 18 21.6
#15: 2 24 21.6
#16: 2 19 21.6
#17: 2 13 21.6
#18: 2 15 21.6
#19: 2 17 21.6
#20: 2 22 21.6
#21: 2 11 21.6
# plot dbh dom
dplyr
also has slice_max
function (previously top_n
) to get top n
values in each group.
df %>%
group_by(plot) %>%
slice_max(dbh, n = 5) %>%
summarise(dom = mean(dbh)) %>%
left_join(df, by = 'plot')