r dataframe seq rep

Repeating the value in a df column by a specified amount, and concatenating integer count to repeated values


I would like to use R to create an expanded_df from a template_df. Each row of template_df should be repeated the number of times given in its Count column, and a running integer should be appended to the ID in expanded_df, indicating which repeat of that ID the row is.

I would like this count to start at 600 for each ID class.

E.g., template_df:

Initial_ID  Count
a           2
b           3
c           1
d           4

expanded_df:

Expanded_ID
a-600
a-601
b-600
b-601
b-602
c-600
d-600
d-601
d-602
d-603

Anyone have any ideas? Thanks!


Solution

  • We can use uncount() from tidyr to expand the rows, then use rowid() from data.table on 'Initial_ID' to get a counter that restarts for each ID, add 599 to it, and paste the result onto the ID

    library(dplyr)
    library(tidyr)       # uncount()
    library(data.table)  # rowid()
    library(stringr)     # str_c()
    template_df %>% 
       uncount(Count) %>%   # repeat each row Count times
       transmute(Expanded_ID = str_c(Initial_ID, 599 + rowid(Initial_ID), sep = '-'))
    

    -output

       Expanded_ID
    1        a-600
    2        a-601
    3        b-600
    4        b-601
    5        b-602
    6        c-600
    7        d-600
    8        d-601
    9        d-602
    10       d-603
    

    Or using base R with rep, sequence, and paste0

    data.frame(Expanded_ID = with(template_df, paste0(rep(Initial_ID, Count), "-", 
           599 + sequence(Count))))
    

    -output

       Expanded_ID
    1        a-600
    2        a-601
    3        b-600
    4        b-601
    5        b-602
    6        c-600
    7        d-600
    8        d-601
    9        d-602
    10       d-603
    

    data

    template_df <- structure(list(Initial_ID = c("a", "b", "c", "d"), Count = c(2L, 
    3L, 1L, 4L)), class = "data.frame", row.names = c(NA, -4L))
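
To see why the base R version works, it may help to look at the two intermediate vectors on their own. The sketch below uses the same template_df as above. The helper name expand_ids and its start argument are hypothetical, added only so the starting number is not hard-coded as 599 + 1.

```r
# Same data as in the answer above
template_df <- data.frame(Initial_ID = c("a", "b", "c", "d"),
                          Count = c(2L, 3L, 1L, 4L))

# rep() repeats each ID as many times as its Count
ids <- rep(template_df$Initial_ID, template_df$Count)

# sequence() builds 1..n for each Count, restarting at 1 every time
idx <- sequence(template_df$Count)

# Hypothetical helper (not part of the original answer):
# 'start' is the first number used for every ID
expand_ids <- function(df, start = 600L) {
  data.frame(Expanded_ID = paste0(rep(df$Initial_ID, df$Count), "-",
                                  start - 1L + sequence(df$Count)))
}

expanded_df <- expand_ids(template_df)
print(expanded_df)
```

Calling expand_ids(template_df, start = 1L) would give a-1, a-2, b-1, and so on, if a different starting number is ever needed.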