First four columns are the starting data, Onset column is desired output:
Day | Hr | Min | Cnts/min | Onset |
---|---|---|---|---|
1 | 6 | 0 | 9.7 | False |
1 | 6 | 10 | 0.7 | False |
1 | 6 | 20 | 0.5 | False |
1 | 6 | 30 | 32.9 | False |
1 | 6 | 40 | 2.9 | False |
1 | 6 | 50 | 0.1 | False |
1 | 7 | 10 | 12.3 | False |
1 | 7 | 20 | 0.1 | False |
1 | 7 | 30 | 34.3 | TRUE |
1 | 7 | 40 | 23.3 | False |
1 | 7 | 50 | 26.3 | False |
1 | 8 | 10 | 2.3 | False |
and so on... There are 2,790 rows, roughly 20 days' worth of data.
The Goal: Find activity onset on each day.
The following appends a column called result, based on a test case using the x2 column, but it returns all NAs. What am I doing wrong?
library(zoo)
library(janitor)
library(tidyverse)
# load bin10 file
bin10 <- read_csv("./data/bin10.csv", skip=3)
bin10_wide <- bin10 %>%
  group_by(Day, Hr, Min) %>%
  summarize(`Cnts/min` = mean(`Cnts/min`, na.rm = TRUE)) %>%
  pivot_wider(names_from = Day, values_from = `Cnts/min`) %>%
  clean_names()
custom_function <- function(x, max_val) {
  # Check if the current cell is >20% of the max
  if (x[1] > 0.2 * max_val) {
    # Check if there are at least 3 cells in the next 6 that satisfy the criteria
    count <- sum(x[2:7] > 0.2 * max_val)
    return(count >= 3)
  } else {
    return(FALSE)
  }
}
data <- bin10_wide %>%
  mutate(result = rollapply(x2, width = 7, FUN = function(x) custom_function(x, max(x2, na.rm = TRUE)),
                            by.column = FALSE, align = "right", fill = NA))
Look for the first time in the day where there is an amount of activity above a certain threshold, and that activity is sustained for at least 3 of the next 6 bins.
"Next" contradicts right-alignment. Please clarify. I suspect you are looking for right-alignment (a look behind), which means you are using the term "next" incorrectly.
.2 * max(Cnts.min) for each day; rle() + cumsum(); NA; partial argument of zoo::rollapply; width should be 6, not 7.
Based on base R and zoo::rollapplyr.
We use with() to write xyzzy less often. ave() allows us to do an operation on groups, here on days (Day). To the FUN argument of ave() we give zoo::rollapplyr(), where the suffix r stands for right-alignment: for the first five observations we cannot compute anything, since the window is of width 6. Normally, we fill those values with NA (cp. fill = NA).
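To make the look-behind versus look-ahead distinction concrete, here is a small sketch on the Day 1 sample values. The names cnts, above, behind and ahead are illustrative only, and align = "left" is just one possible reading of "next 6" (it counts the current bin as part of the window).

```r
library(zoo)

# Day 1 sample values from the question (Cnts/min)
cnts <- c(9.7, 0.7, 0.5, 32.9, 2.9, 0.1, 12.3, 0.1, 34.3, 23.3, 26.3, 2.3)

# TRUE where a bin exceeds 20% of the day's maximum
above <- cnts > 0.2 * max(cnts, na.rm = TRUE)

# Look behind: each window ends at the current bin
behind <- rollapply(above, width = 6, FUN = sum, align = "right", fill = NA)

# Look ahead: each window starts at the current bin
ahead <- rollapply(above, width = 6, FUN = sum, align = "left", fill = NA)

which(behind >= 3)[1]  # first row that qualifies when looking behind
which(ahead >= 3)[1]   # first row that qualifies when looking ahead
```

On this sample the right-aligned version first qualifies at row 9 (7:30), which matches the desired Onset column, while the left-aligned version already qualifies at row 4 (6:30).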
> with(xyzzy, ave(Cnts.min, Day, FUN = \(x) {
+ zoo::rollapplyr(data = x > max(x, na.rm = TRUE) / 5,
+ width = 6, FUN = sum, fill = NA) > 2 }))
[1] NA NA NA NA NA 0 0 0 1 1 1 1
As expected, the first five values are NA. Then, within a window of width 6 (a look behind), we find 4 occurrences where the number of bins meeting the condition is greater than 2. As you are only interested in the first occurrence, we can use which.max(). If there are several maximum values, which.max() returns the index of the first one. Note that this assumes each day has at least one qualifying window; if none qualifies, which.max() points at the first non-NA FALSE instead.
xyzzy$Onset =
  with(xyzzy, ave(Cnts.min, Day, FUN = \(x) {
    ires = zoo::rollapplyr(data = x > max(x, na.rm = TRUE) / 5,
                           width = 6, FUN = sum, fill = NA) > 2
    # only the first occurrence is relevant:
    res = rep(0, length(ires)); res[which.max(ires)] = 1; res
  }))
> xyzzy
Day Hr Min Cnts.min Onset
1 1 6 0 9.7 0
2 1 6 10 0.7 0
3 1 6 20 0.5 0
4 1 6 30 32.9 0
5 1 6 40 2.9 0
6 1 6 50 0.1 0
7 1 7 10 12.3 0
8 1 7 20 0.1 0
9 1 7 30 34.3 1
10 1 7 40 23.3 0
11 1 7 50 26.3 0
12 1 8 10 2.3 0
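If some days might have no qualifying window at all, a guarded variant could look like the sketch below. The helper name first_onset, its arguments width and min_bins, and the made-up quiet vector are assumptions for illustration and are not part of the original data.

```r
library(zoo)

# Hypothetical helper: flags the first bin whose trailing window qualifies,
# and flags nothing when no window in the day qualifies.
first_onset <- function(x, width = 6, min_bins = 3) {
  above <- x > max(x, na.rm = TRUE) / 5
  ires <- rollapplyr(above, width = width, FUN = sum, fill = NA) >= min_bins
  res <- rep(0, length(x))
  hit <- which(ires)  # which() ignores NA and FALSE entries
  if (length(hit) > 0) res[hit[1]] <- 1
  res
}

day1 <- c(9.7, 0.7, 0.5, 32.9, 2.9, 0.1, 12.3, 0.1, 34.3, 23.3, 26.3, 2.3)
quiet <- c(rep(0, 11), 10)  # made-up day with a single active bin

which(first_onset(day1) == 1)  # row flagged as onset on the sample day
sum(first_onset(quiet))        # number of rows flagged on the quiet day
```

It should drop into the same pattern as before, e.g. with(xyzzy, ave(Cnts.min, Day, FUN = first_onset)).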
Since you mentioned that you would like to return "Hr:Min" for some values, I am not using logical. E.g. if you would like to replace the TRUE (1) values:
> with(xyzzy, ifelse(Onset == 1, paste0(Hr, ":", Min), Onset))
[1] "0" "0" "0" "0" "0" "0" "0" "0" "7:30" "0"
[11] "0" "0"
Note that ifelse() is considered slow, but easy to understand. In R, we can do very fast operations based on vector indices and logicals (if necessary). However, speed is usually not a concern for medium-sized data (~2,700 observations).
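As a rough illustration of the index-based alternative to ifelse(), the sketch below replaces only the flagged entries through a logical index. The data frame onset_df and the variables label and hit are made up for this example.

```r
# Small made-up excerpt with the same column layout as the result above
onset_df <- data.frame(Hr = c(7, 7, 7), Min = c(20, 30, 40), Onset = c(0, 1, 0))

# Start from the 0/1 flags as character, then overwrite only the hits
label <- as.character(onset_df$Onset)
hit <- onset_df$Onset == 1
label[hit] <- paste0(onset_df$Hr[hit], ":", onset_df$Min[hit])

label
```

Only the rows where hit is TRUE are touched, so paste0() runs on the onset rows rather than on every row.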
xyzzy = read.table(text = "Day Hr Min Cnts/min
1 6 0 9.7
1 6 10 0.7
1 6 20 0.5
1 6 30 32.9
1 6 40 2.9
1 6 50 0.1
1 7 10 12.3
1 7 20 0.1
1 7 30 34.3
1 7 40 23.3
1 7 50 26.3
1 8 10 2.3", header = TRUE)