r dplyr sequence rep

Create sequence of repeated values, with length based on a vector


How can I populate column ‘Night’ with a sequence of numbers, each repeating 3 times, and with the sequence restarting based on column ‘Site’? I’ve created a table showing what I want to achieve. This is a simplified version of my issue; I need to be able to use the code on a much larger data frame.


Site_date_time   Site  Night
1_01012023_2200  1     1
1_01012023_2300  1     1
1_02012023_0000  1     1
1_02012023_2200  1     2
1_02012023_2300  1     2
1_03012023_0000  1     2
2_01012023_2100  2     1
2_01012023_2200  2     1
2_01012023_2300  2     1
2_02012023_2200  2     2
2_02012023_2300  2     2
2_03012023_0000  2     2
2_03012023_2200  2     3
2_03012023_2300  2     3
2_04012023_0000  2     3
# Code to create a basic data frame of Site
site <- c(rep(1, times = 6), rep(2, times = 9))
df <- data.frame(site)

My main issue is that the number of records per site varies, so the length of the sequence before it restarts varies too. If every site had the same number of rows, I could use the following.

library("dplyr")
library("data.table")

# Create a data frame of the site vector, with the same number of observations per site
site <- c(rep(1, times = 6), rep(2, times = 6))
df <- data.frame(site)
# Create a sequence with repeated numbers (6 rows per site = 2 nights of 3 records each)
group_by(df, site) %>% mutate(night = rep(1:2, each = 3))

But I need a function that allows me to create a sequence with repeated numbers based on the length of my grouped vector, rather than a defined length. I've tried to find a way of combining rep() with seq_along() or rowid(), but have had no luck.


Solution

  • You can use the length.out argument of rep(). From the docs:

    length.out: non-negative integer. The desired length of the output vector. Other inputs will be coerced to a double vector and the first element taken. Ignored if NA or invalid.

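    The length of your grouped vector can be calculated with dplyr::n(), which returns the number of rows in the current group. Note that the .by argument used here requires dplyr 1.1.0 or later; with older versions, use group_by(site) before mutate().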

    library(dplyr)
    
    df |>
        mutate(night = rep(seq_len(n()), each = 3, length.out = n()), .by = site)
    #    site night
    # 1     1     1
    # 2     1     1
    # 3     1     1
    # 4     1     2
    # 5     1     2
    # 6     1     2
    # 7     2     1
    # 8     2     1
    # 9     2     1
    # 10    2     2
    # 11    2     2
    # 12    2     2
    # 13    2     3
    # 14    2     3
    # 15    2     3
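
    To see why this works, it can help to run rep() on its own, outside of mutate(). The sketch below uses base R only; n_rows is a hypothetical group size chosen for illustration (deliberately not a multiple of 3). rep() first repeats each element 3 times and then truncates the result to length.out. An integer-division formula, which is a different technique from the one above, is included for comparison and should give the same values.

    ```r
    # Hypothetical group size, not a multiple of 3, to show the truncation
    n_rows <- 7L

    # Each element of 1..n_rows is repeated 3 times, then the result is cut to n_rows values
    night <- rep(seq_len(n_rows), each = 3, length.out = n_rows)
    print(night)
    # [1] 1 1 1 2 2 2 3

    # Alternative for comparison: integer division of the zero-based row index by 3
    night_alt <- (seq_len(n_rows) - 1L) %/% 3L + 1L
    print(identical(night, night_alt))
    # [1] TRUE
    ```

    Because the result is simply cut off at length.out, a group whose row count is not a multiple of 3 ends with a shorter final night rather than raising an error.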
    

    Also, as you included library(data.table) in your question, if df is a data.table you can use the same approach with the data.table syntax, using .N rather than n():

    df[, night := rep(seq_len(.N), each = 3, length.out = .N), site]
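
    For completeness, here is a self-contained sketch of the data.table version, assuming the data.table package is installed. The object name dt is an illustrative choice, not taken from the question; the data are rebuilt from the question's first example (6 rows for site 1, 9 rows for site 2).

    ```r
    library(data.table)

    # Rebuild the question's example data directly as a data.table
    dt <- data.table(site = c(rep(1, times = 6), rep(2, times = 9)))

    # := adds the column by reference; .N is the number of rows in the current group
    dt[, night := rep(seq_len(.N), each = 3, length.out = .N), by = site]

    print(dt$night)
    # [1] 1 1 1 2 2 2 1 1 1 2 2 2 3 3 3
    ```

    If you already have a plain data.frame, setDT(df) should convert it to a data.table by reference before the := step.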