r longitudinal

Remove single observations then duplicate remaining observations


For the data below, I'd like to:

[1] remove any id that has only one observation (a single repeated measure)

[2] if a remaining id has fewer than 5 repeated measures, repeat its observations until there are at least 5 (for example, if there are 2 rows for an id, duplicate them 3 times)

Data:

df <- structure(list(id = c("0101", "0102", "0102", "0103", "0103", 
"0103", "0104", "0104", "0104", "0104", "0104", "0105", "0105", 
"0105", "0105", "0105", "0106", "0106", "0106", "0106", "0106", 
"0107", "0107", "0107", "0107", "0107", "0108", "0108", "0108", 
"0108"), date = c("10/01/91", "12/03/91", "05/05/92", "06/22/92", 
"12/17/92", "07/14/93", "07/28/92", "01/14/93", "08/11/93", "02/03/94", 
"08/23/94", "09/24/92", "03/05/93", "10/18/93", "04/14/94", "05/31/94", 
"01/13/93", "07/27/93", "03/10/94", "09/01/94", "03/09/95", "01/15/93", 
"07/23/93", "02/07/94", "07/28/94", "02/07/95", "03/19/93", "10/04/93", 
"05/17/94", "11/15/94"), y = c(0, 0, 9, 0, -11, -11, 0, 10, 9, 
4, 5, 0, -7, -17, -13, -17, 0, 6, 6, 1, 3, 0, -9, -13, -18, -17, 
0, -8, -8, -10)), row.names = c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L), class = "data.frame")

Solution

  • Using the data.table package. I've made the duplication explicit so that each short group is always repeated a whole number of times, even if that takes it above exactly 5 rows per group:

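    library(data.table)
    setDT(df)  # df is the question's data frame; converted to a data.table by reference
    # Collect the row indices to keep for each id, then subset df with them
    df[
        df[, if (.N == 1) NULL                        # drop ids with a single row
             else if (.N < 5) rep(.I, (5 %/% .N) + 1) # repeat short groups in whole blocks
             else .I,                                 # keep groups that already have 5+ rows
           by = id]$V1
    ]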
    
    #        id     date     y
    #    <char>   <char> <num>
    # 1:   0102 12/03/91     0
    # 2:   0102 05/05/92     9
    # 3:   0102 12/03/91     0
    # 4:   0102 05/05/92     9
    # 5:   0102 12/03/91     0
    # 6:   0102 05/05/92     9
    # 7:   0103 06/22/92     0
    # 8:   0103 12/17/92   -11
    # 9:   0103 07/14/93   -11
    #10:   0103 06/22/92     0
    #11:   0103 12/17/92   -11
    #12:   0103 07/14/93   -11
    # ...
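
If exactly 5 rows per id are wanted rather than a whole multiple, one possible variation is to recycle the row indices with `rep_len()`. This is only a sketch and not part of the answer above: the table `dt` (ids a to d) is made-up toy data and `min_n` is an assumed name for the target count. It also writes the whole-multiple version with `ceiling()` so the two can be compared side by side.

```r
library(data.table)

# Toy data (hypothetical, not the question's data): group sizes 1, 2, 3 and 5
dt <- data.table(
  id = rep(c("a", "b", "c", "d"), times = c(1, 2, 3, 5)),
  y  = 1:11
)

min_n <- 5  # assumed minimum number of rows per id

# Whole-multiple duplication, as in the answer, written with ceiling()
idx_multiple <- dt[, if (.N == 1) NULL
                     else rep(.I, ceiling(min_n / .N)),
                   by = id]$V1

# Variation: recycle row indices until each short group has exactly min_n rows
idx_exact <- dt[, if (.N == 1) NULL
                  else rep_len(.I, max(.N, min_n)),
                by = id]$V1

counts_multiple <- dt[idx_multiple][, .N, by = id]  # b: 6, c: 6, d: 5
counts_exact    <- dt[idx_exact][, .N, by = id]     # b: 5, c: 5, d: 5
```

With `rep_len()` the last pass through a group can be partial, so the earliest rows of a short group end up with one more copy than the later ones. Whether that imbalance matters depends on what the duplicated rows are used for.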