I have a large data frame: percentage_activity
# A tibble: 4,437 x 3
# Groups: DATETIME [87]
DATETIME ID COUNT
<dttm> <chr> <int>
1 2020-06-07 00:00:00 Bagheera NA
2 2020-06-07 00:00:00 Bagheera2 0
3 2020-06-07 00:00:00 Baloo img 0
4 2020-06-07 00:00:00 Banna NA
5 2020-06-07 00:00:00 Blair 158
6 2020-06-07 00:00:00 Carol NA
in which I would like to calculate the mean of the top 5 COUNT values for a specific ID and then, in a for loop, express every COUNT value as a percentage of that mean, so that the mean calculated for an ID counts as 100% for that ID. To do that, I would prefer to get the mean not as a data frame covering all individuals but as a single number for the desired ID, which I can then use as a variable inside the for loop.
I'm trying to adapt a loop that worked for the same data organized with a separate column for each ID, but after melting the data into a single ID column it needs adjustments:
max_activity <- readline(prompt="enter a number: ")
for(i in 2:length(percentage_activity)) {
  percentage_activity[[i]] <-
    as.numeric(percentage_activity[[i]]*100/mean(sort(percentage_activity[[i]], T)[1:max_activity]))
}
I also tried this, but I'm not sure how to proceed from here:
for (i in unique(percentage_activity$ID)){
  individual <- percentage_activity$ID == i
  mean(percentage_activity[individual,"COUNT"], na.rm=TRUE)
}
This may help:
library(dplyr)
# Small example data set: two observations per ID
df <- tibble(
  DATETIME = as.Date(rep("2020-06-07", 12)),
  ID = rep(c("Bagheera", "Bagheera2", "Baloo img", "Banna", "Blair", "Carol"), 2),
  COUNT = c(NA, 0, 0, NA, 158, NA, 10, 20, 30, 40, 50, 60)
)
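# Mean of the (up to) 5 largest COUNT values per ID; NA values are ignored.
# slice_max() needs dplyr >= 1.0.0 and replaces the superseded top_n().
mean_val <- df %>%
  group_by(ID) %>%
  slice_max(COUNT, n = 5, with_ties = FALSE) %>%
  summarise(mean = mean(COUNT, na.rm = TRUE))
# Express each COUNT as a percentage of the top-5 mean of its ID
df %>%
  left_join(mean_val, by = "ID") %>%
  mutate(percentage_activity = COUNT * 100 / mean)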
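If you do want the mean as a single number per ID, as the question asks, here is a minimal sketch of one way to do it. The helper `top_mean()` and the column names `PERCENT` and `PERCENT2` are made up for this illustration and are not part of the original code; the example data is the same small set as above, not the real `percentage_activity` tibble.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original post):
# mean of the n largest non-missing values of x, returned as one number.
top_mean <- function(x, n = 5) {
  x <- sort(x, decreasing = TRUE)  # sort() drops NA values by default
  mean(head(x, n))                 # head() copes with fewer than n values
}

df <- tibble(
  ID    = rep(c("Bagheera", "Bagheera2", "Baloo img", "Banna", "Blair", "Carol"), 2),
  COUNT = c(NA, 0, 0, NA, 158, NA, 10, 20, 30, 40, 50, 60)
)

# A single number for one ID, usable as an ordinary variable
blair_mean <- top_mean(df$COUNT[df$ID == "Blair"], n = 5)

# Loop version: each ID's top-n mean is treated as 100% for that ID
df$PERCENT <- NA_real_
for (id in unique(df$ID)) {
  rows <- df$ID == id
  ref  <- top_mean(df$COUNT[rows], n = 5)
  df$PERCENT[rows] <- df$COUNT[rows] * 100 / ref
}

# The same calculation without an explicit loop or join
df2 <- df %>%
  group_by(ID) %>%
  mutate(PERCENT2 = COUNT * 100 / top_mean(COUNT, n = 5)) %>%
  ungroup()
```

With the real data, `n = 5` could be replaced by the value read with `readline()`, converted with `as.numeric()` first. Note that an ID whose COUNT values are all NA gets a NaN reference mean in this sketch.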