I have a function that is meant to operate on a vector of numbers, e.g. a vector of temperatures. I want to detect heatwaves (in a very simplified way). Let's say a heatwave starts with three consecutive days above 30 °C.
So I need some state that tracks how long the current heatwave already is. I wrote a function that uses a for loop internally. In pseudo-code it looks roughly like this:
is_heatwave = function(vals){
  length_heatwave = 0
  # returns a vector with the length of the input vals
  day_in_heatwave = vector(length=length(vals))
  days_in_current_heatwave = c()
  for(i in seq_along(vals)){
    val = vals[[i]]
    if(val > 30){
      length_heatwave = length_heatwave + 1
      days_in_current_heatwave = c(days_in_current_heatwave, i)
    }else{
      # streak broken: reset both the counter and the collected days
      length_heatwave = 0
      days_in_current_heatwave = c()
    }
    # ... some more code
  }
  return(day_in_heatwave)
}
This code might be wrong, but the idea is that the function takes a vector with as many elements as the data.frame has rows and returns a vector of the same length.
My idea is to have a function that I can use like this:
library(dplyr)

df = data.frame(
  temps = c(30,30,32,30,24)
)
df %>% mutate(is_heatwave = is_heatwave(temps))
Is this generally a good idea, or are there better approaches?
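For reference, the "how long is the current streak" bookkeeping described above does not strictly need an explicit loop. The following is only a minimal base R sketch: the name `streak_length` and the `threshold` argument are made up for illustration, and it assumes the input has no missing values and that "hot" means strictly above the threshold, as in the pseudo-code.

```r
# Hypothetical helper: for each day, count how many consecutive hot days
# (strictly above `threshold`) have occurred up to and including that day.
streak_length <- function(vals, threshold = 30) {
  hot <- vals > threshold
  # Every non-hot day starts a new group, so the counter restarts there
  group <- cumsum(!hot)
  # Running count of hot days within each group
  ave(as.integer(hot), group, FUN = cumsum)
}

temps <- c(31, 32, 29, 33, 34, 35)
streak_length(temps)
#> [1] 1 2 0 1 2 3
```

Comparing the result with the required length (e.g. `streak_length(temps) >= 3`) flags the third and later days of each streak, but not the first two.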
There are already good answers, so let me add some nuance.
This solution gives each streak a unique streak_id; a streak may or may not be a heat_wave. hot_days_acc is the number of hot days accumulated within a streak.
The code:
library(tidyverse)  # needs dplyr >= 1.1.0 for consecutive_id() and .by
# -------------------
# Number of days in a heat wave
heat_wave_days <- 3
# Temperature threshold
hot_day <- 30
# Some toy data
set.seed(100)
aux_df <- tibble(temp = sample(-2:2 + hot_day, 50, replace = TRUE))

aux_df <- aux_df %>%
  mutate(
    # TRUE for days at or above the threshold
    hot_days_acc = temp >= hot_day,
    # One id per run of consecutive hot (or non-hot) days
    streak_id = consecutive_id(hot_days_acc)) %>%
  # Length of each run, stored temporarily in heat_wave
  add_count(streak_id, name = "heat_wave") %>%
  mutate(
    .by = streak_id,
    # A run is a heat wave if it is hot and long enough
    heat_wave = all(hot_days_acc) & heat_wave >= heat_wave_days) %>%
  # Re-number streaks so ids change whenever the heat wave status changes
  mutate(streak_id = consecutive_id(heat_wave)) %>%
  # Running count of hot days within each streak
  mutate(.by = streak_id, hot_days_acc = cumsum(hot_days_acc)) %>%
  relocate(temp, streak_id, heat_wave, hot_days_acc)
The output:
> print(aux_df, n = nrow(aux_df))
# A tibble: 50 × 4
    temp streak_id heat_wave hot_days_acc
   <dbl>     <int> <lgl>            <int>
 1    29         1 FALSE                0
 2    30         1 FALSE                1
 3    28         1 FALSE                1
 4    29         1 FALSE                1
 5    31         1 FALSE                2
 6    31         1 FALSE                3
 7    29         1 FALSE                3
 8    30         1 FALSE                4
 9    29         1 FALSE                4
10    32         2 TRUE                 1
11    31         2 TRUE                 2
12    30         2 TRUE                 3
13    30         2 TRUE                 4
14    29         3 FALSE                0
15    28         3 FALSE                0
16    29         3 FALSE                0
17    30         4 TRUE                 1
18    31         4 TRUE                 2
19    31         4 TRUE                 3
20    31         4 TRUE                 4
21    32         4 TRUE                 5
22    30         4 TRUE                 6
23    28         5 FALSE                0
24    30         5 FALSE                1
25    31         5 FALSE                2
26    29         5 FALSE                2
27    32         6 TRUE                 1
28    32         6 TRUE                 2
29    32         6 TRUE                 3
30    28         7 FALSE                0
31    32         8 TRUE                 1
32    31         8 TRUE                 2
33    30         8 TRUE                 3
34    28         9 FALSE                0
35    28         9 FALSE                0
36    28         9 FALSE                0
37    30         9 FALSE                1
38    28         9 FALSE                1
39    28         9 FALSE                1
40    31        10 TRUE                 1
41    30        10 TRUE                 2
42    32        10 TRUE                 3
43    30        10 TRUE                 4
44    31        10 TRUE                 5
45    30        10 TRUE                 6
46    30        10 TRUE                 7
47    30        10 TRUE                 8
48    31        10 TRUE                 9
49    30        10 TRUE                10
50    32        10 TRUE                11
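To tie this back to the interface asked for in the question (a function that takes a vector and returns a vector of the same length), the same run-based idea can be sketched in base R with `rle()`. This is a rough sketch rather than a drop-in replacement for the pipeline above: the arguments `threshold` and `min_days` are names I made up, it uses `>=` like the code above (the question's pseudo-code uses a strict `>`), and it assumes there are no missing values.

```r
# Hypothetical helper: flags every day that belongs to a run of at least
# `min_days` consecutive days with temperature >= `threshold`.
is_heatwave <- function(temps, threshold = 30, min_days = 3) {
  hot  <- temps >= threshold
  runs <- rle(hot)
  # A run counts as a heat wave if it is hot and long enough
  keep <- runs$values & runs$lengths >= min_days
  # Expand the per-run flag back to one value per day
  rep(keep, times = runs$lengths)
}

df <- data.frame(temps = c(30, 30, 32, 30, 24))
df$is_heatwave <- is_heatwave(df$temps)
df$is_heatwave
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE
```

Because the function is vectorised over its input, it should also work inside `mutate()`, e.g. `df %>% mutate(is_heatwave = is_heatwave(temps))`, including with `.by` or `group_by()` if the data contain several stations.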