How can I populate the column ‘Night’ with a sequence of numbers, each repeated 3 times, with the sequence restarting for each value of column ‘Site’? The table below shows what I want to achieve. This is a simplified version of my problem; I need to be able to use the code on a much larger data frame.
Site_date_time | Site | Night |
---|---|---|
1_01012023_2200 | 1 | 1 |
1_01012023_2300 | 1 | 1 |
1_02012023_0000 | 1 | 1 |
1_02012023_2200 | 1 | 2 |
1_02012023_2300 | 1 | 2 |
1_03012023_0000 | 1 | 2 |
2_01012023_2100 | 2 | 1 |
2_01012023_2200 | 2 | 1 |
2_01012023_2300 | 2 | 1 |
2_02012023_2200 | 2 | 2 |
2_02012023_2300 | 2 | 2 |
2_03012023_0000 | 2 | 2 |
2_03012023_2200 | 2 | 3 |
2_03012023_2300 | 2 | 3 |
2_04012023_0000 | 2 | 3 |
#Code to create basic data frame of Site
site <- c(rep(1,times=6), rep(2,times=9))
df <- data.frame(site)
My main issue is that the number of records per site varies, so the length of the sequence before it restarts also varies. I could use the following if every site had the same number of rows.
library("dplyr")
library("data.table")
# Create data frame of the site vector, with the number of observations per site of equal length
site <- c(rep(1,times=6), rep(2,times=6))
df <- data.frame(site)
# Create sequence with repeated numbers
# (each site has 6 rows here, so the sequence is 1:2, each value repeated 3 times)
group_by(df, site) %>% mutate(night = rep(1:2, each = 3))
But I need a function that allows me to create a sequence with repeated numbers based on the length of my grouped vector, rather than a defined length. I've tried to find a way of combining rep() with seq_along() or rowid(), but have had no luck.
You can use the length.out argument of rep(). From the docs:
length.out: non-negative integer. The desired length of the output vector. Other inputs will be coerced to a double vector and the first element taken. Ignored if NA or invalid.
The length of your grouped vector can be calculated with dplyr::n().
library(dplyr)
# The .by argument requires dplyr >= 1.1.0
df |>
  mutate(night = rep(seq_len(n()), each = 3, length.out = n()), .by = site)
# site night
# 1 1 1
# 2 1 1
# 3 1 1
# 4 1 2
# 5 1 2
# 6 1 2
# 7 2 1
# 8 2 1
# 9 2 1
# 10 2 2
# 11 2 2
# 12 2 2
# 13 2 3
# 14 2 3
# 15 2 3
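To see why this works for any group size, it may help to look at the rep() call on its own. The sketch below wraps it in a helper called night_index, which is a hypothetical name used only for illustration and is not part of dplyr or base R. The each argument repeats every value of the sequence, and length.out then truncates the result to the group length:

```r
# Hypothetical helper for illustration only
night_index <- function(n, block = 3) {
  # seq_len(n) supplies more values than needed; each = block repeats
  # every value, and length.out = n cuts the result to exactly n elements
  rep(seq_len(n), each = block, length.out = n)
}

night_index(6)  # 1 1 1 2 2 2
night_index(9)  # 1 1 1 2 2 2 3 3 3
night_index(7)  # 1 1 1 2 2 2 3 (the last block is incomplete)
```

As the last call shows, a group whose row count is not a multiple of 3 simply ends with a shorter final block rather than raising an error.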
Also, as you included library(data.table) in your question, if df is a data.table you can use the same approach with the data.table syntax, using .N rather than n():
df[, night := rep(seq_len(.N), each = 3, length.out = .N), site]
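If df is still a plain data.frame, it needs converting first. A minimal self-contained sketch, assuming the example data from the question and using setDT() to convert by reference:

```r
library(data.table)

# Same example data as in the question: 6 rows for site 1, 9 rows for site 2
df <- data.frame(site = c(rep(1, times = 6), rep(2, times = 9)))

setDT(df)  # converts df to a data.table in place

# .N is the number of rows in the current group
df[, night := rep(seq_len(.N), each = 3, length.out = .N), by = site]
```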