The first four columns are the starting data; the Onset column is the desired output:
| Day | Hr | Min | Cnts/min | Onset |
|---|---|---|---|---|
| 1 | 6 | 0 | 9.7 | FALSE |
| 1 | 6 | 10 | 0.7 | FALSE |
| 1 | 6 | 20 | 0.5 | FALSE |
| 1 | 6 | 30 | 32.9 | FALSE |
| 1 | 6 | 40 | 2.9 | FALSE |
| 1 | 6 | 50 | 0.1 | FALSE |
| 1 | 7 | 10 | 12.3 | FALSE |
| 1 | 7 | 20 | 0.1 | FALSE |
| 1 | 7 | 30 | 34.3 | TRUE |
| 1 | 7 | 40 | 23.3 | FALSE |
| 1 | 7 | 50 | 26.3 | FALSE |
| 1 | 8 | 10 | 2.3 | FALSE |
And so on; there are 2,790 rows covering roughly 20 days' worth of data.
The goal: find the activity onset on each day.
The following code appends a column called `result`, tested on the `x2` column (day 2), but it returns all NAs. What am I doing wrong?
```r
library(zoo)
library(janitor)
library(tidyverse)

# load bin10 file
bin10 <- read_csv("./data/bin10.csv", skip = 3)

bin10_wide <- bin10 %>%
  group_by(Day, Hr, Min) %>%
  summarize(`Cnts/min` = mean(`Cnts/min`, na.rm = TRUE)) %>%
  pivot_wider(names_from = Day, values_from = `Cnts/min`) %>%
  clean_names()

custom_function <- function(x, max_val) {
  # Check if the current cell is >20% of the max
  if (x[1] > 0.2 * max_val) {
    # Check if there are at least 3 cells in the next 6 that satisfy the criteria
    count <- sum(x[2:7] > 0.2 * max_val)
    return(count >= 3)
  } else {
    return(FALSE)
  }
}

data <- bin10_wide %>%
  mutate(result = rollapply(x2, width = 7,
                            FUN = function(x) custom_function(x, max(x2, na.rm = TRUE)),
                            by.column = FALSE, align = "right", fill = NA))
```
To clarify: I want to find the first time in the day where activity is above a certain threshold and that activity is sustained for at least 3 of the next 6 bins.
"Next" contradicts right-alignment. Please clarify. I suspect you are looking for right-alignment (a look-behind), which means you are using the term "next" incorrectly.
Hints from the comments: use .2 * max(Cnts.min) for each day as the threshold; rle() + cumsum(); the leading NAs (cf. the partial argument of zoo::rollapply); width should be 6, not 7.
Answer, based on base R and zoo::rollapplyr:
We use with() so that we do not have to write xyzzy$ in front of every column. ave() applies a function to groups, here to each day (Day). As the FUN argument of ave() we pass zoo::rollapplyr(), where the suffix r stands for right-alignment. For the first five observations of a day there is no complete window of width 6, so nothing can be computed; we usually fill those positions with NA (cf. fill = NA).
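A small sketch of the two mechanisms just described, on hypothetical data (`cnts`, `day` and the toy vector `x` are made up for illustration). It shows that ave() returns a vector of the same type as its input, which is why the logical window test prints as 0/1 in the console output, and how zoo's `partial` argument (mentioned in the comments) differs from `fill = NA`:

```r
library(zoo)

# ave(): apply FUN separately within each group (hypothetical two-day data)
cnts <- c(9.7, 0.7, 40, 10)
day  <- c(1, 1, 2, 2)
day_max <- ave(cnts, day, FUN = max)                  # 9.7 9.7 40 40
flags   <- ave(cnts, day, FUN = \(x) x > max(x) / 5)  # 1 0 1 1 (numeric, not logical)

# Incomplete leading windows: pad with NA, or evaluate a shortened window
x <- c(1, 2, 3, 4)
padded <- rollapplyr(x, width = 3, FUN = sum, fill = NA)       # NA NA 6 9
shrunk <- rollapplyr(x, width = 3, FUN = sum, partial = TRUE)  # 1 3 6 9
```

With `partial = TRUE`, a shortened window at the start of a day could already satisfy the "3 of 6" rule, so `fill = NA` is the more conservative choice for onset detection.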
```r
> with(xyzzy, ave(Cnts.min, Day, FUN = \(x) {
+   zoo::rollapplyr(data = x > max(x, na.rm = TRUE) / 5,
+                   width = 6, FUN = sum, fill = NA) > 2 }))
[1] NA NA NA NA NA 0 0 0 1 1 1 1
```
As expected, the first five values are NA. After that, there are 4 windows of width 6 (a look-behind) in which the number of bins meeting the condition is greater than 2. As you are only interested in the first occurrence, we can use which.max(): if there are several maximum values, it returns the index of the first one, and it discards NAs.
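To illustrate the look-behind used here, and the literal look-ahead reading of "next" discussed in the comments, here is a sketch on the example day (values copied from the table in the question; `above`, `behind`, `next6` and `ahead` are illustrative names). Under these assumptions the look-ahead rule would already flag 6:30 rather than 7:30, which is why the alignment question matters:

```r
library(zoo)

# Cnts/min for day 1, copied from the example table
cnts  <- c(9.7, 0.7, 0.5, 32.9, 2.9, 0.1, 12.3, 0.1, 34.3, 23.3, 26.3, 2.3)
above <- cnts > 0.2 * max(cnts)  # threshold: 20% of the day's maximum

# Look-behind: window = current bin plus the 5 bins before it
behind <- rollapplyr(above, width = 6, FUN = sum, fill = NA) > 2

# Look-ahead: current bin above threshold AND >= 3 of the 6 following bins
next6 <- rollapply(above, width = 7, FUN = function(w) sum(w[-1]),
                   align = "left", fill = NA)
ahead <- above & next6 >= 3

which.max(behind)  # 9 (7:30): NAs are dropped, the first TRUE wins
which.max(ahead)   # 4 (6:30)

# Pitfall: with no TRUE at all, which.max() returns the first FALSE
none <- c(NA, NA, FALSE, FALSE)
which.max(none)    # 3
```

Because of that last case, a day without any qualifying window needs a guard such as `any(..., na.rm = TRUE)` before `which.max()` is used to mark the onset.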
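```r
xyzzy$Onset =
  with(xyzzy, ave(Cnts.min, Day, FUN = \(x) {
    # TRUE where at least 3 of the last 6 bins exceed 20% of the day's maximum
    ires = zoo::rollapplyr(data = x > max(x, na.rm = TRUE) / 5,
                           width = 6, FUN = sum, fill = NA) > 2
    # just the first occurrence is relevant; skip days without any onset,
    # otherwise which.max() would mark the first FALSE
    res = rep(0, length(ires))
    if (any(ires, na.rm = TRUE)) res[which.max(ires)] = 1
    res
  }))
```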
```r
> xyzzy
   Day Hr Min Cnts.min Onset
1    1  6   0      9.7     0
2    1  6  10      0.7     0
3    1  6  20      0.5     0
4    1  6  30     32.9     0
5    1  6  40      2.9     0
6    1  6  50      0.1     0
7    1  7  10     12.3     0
8    1  7  20      0.1     0
9    1  7  30     34.3     1
10   1  7  40     23.3     0
11   1  7  50     26.3     0
12   1  8  10      2.3     0
```
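If zoo is not available, the same right-aligned count can be sketched in base R with `stats::filter()` (a swapped-in technique, not the answer's method). `onset_flag()` is a hypothetical helper; it also skips days on which no window qualifies. `stats::filter` is called with its namespace because `dplyr::filter` masks it when the tidyverse is loaded:

```r
# Hypothetical helper: flag (1) the first bin of a day at which at least
# `min_hits` of the last `width` bins exceed 20% of that day's maximum
onset_flag <- function(cnts, width = 6, min_hits = 3) {
  above <- as.numeric(cnts > 0.2 * max(cnts, na.rm = TRUE))
  # sides = 1: sum of the current value and the width - 1 values before it;
  # the first width - 1 positions are NA
  hits <- as.vector(stats::filter(above, rep(1, width), sides = 1))
  ok <- hits >= min_hits
  res <- numeric(length(cnts))
  if (any(ok, na.rm = TRUE)) res[which.max(ok)] <- 1
  res
}

# Day 1 from the example table
day1 <- data.frame(
  Day = 1,
  Cnts.min = c(9.7, 0.7, 0.5, 32.9, 2.9, 0.1, 12.3, 0.1, 34.3, 23.3, 26.3, 2.3)
)
day1$Onset <- ave(day1$Cnts.min, day1$Day, FUN = onset_flag)
which(day1$Onset == 1)  # 9, i.e. 7:30
```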
Since you mentioned that you would like to return "Hr:Min" for the onset, I am not using logicals. For example, to replace the 1 (TRUE) values:
```r
> with(xyzzy, ifelse(Onset == 1, paste0(Hr, ":", Min), Onset))
 [1] "0"    "0"    "0"    "0"    "0"    "0"    "0"    "0"    "7:30" "0"
[11] "0"    "0"
```
Note that ifelse() is considered slow but easy to understand. Where speed matters, operations based on vector indices and logicals are much faster in R. However, speed is usually not a concern for medium-sized data (~2,700 observations).
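A sketch of the index-based alternative to ifelse(), on a three-row excerpt of the result above (rows 8 to 10). `sprintf()` with `%02d` also zero-pads the minutes, whereas `paste0(Hr, ":", Min)` would turn 6:00 into "6:0":

```r
# Three-row excerpt of the result (rows 8 to 10)
res <- data.frame(Hr = c(7L, 7L, 7L), Min = c(20L, 30L, 40L), Onset = c(0, 1, 0))

is_onset <- res$Onset == 1
onset_time <- rep("0", nrow(res))  # same "0" placeholder as the ifelse() call
# logical indexing: only the onset rows are formatted
onset_time[is_onset] <- sprintf("%d:%02d", res$Hr[is_onset], res$Min[is_onset])
onset_time                  # "0" "7:30" "0"

sprintf("%d:%02d", 6L, 0L)  # "6:00"
```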
Data:
```r
xyzzy = read.table(text = "Day Hr Min Cnts/min
1 6 0 9.7
1 6 10 0.7
1 6 20 0.5
1 6 30 32.9
1 6 40 2.9
1 6 50 0.1
1 7 10 12.3
1 7 20 0.1
1 7 30 34.3
1 7 40 23.3
1 7 50 26.3
1 8 10 2.3", header = TRUE)
```