r dplyr long-integer widechar

Is there R code for the following data wrangling and transformation?


I have the following data set:

    id<-c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4)
    s02<-c(001,002,003,004,001,002,003,004,005,001,002,003,004,005,006,007,001,002,003,004,005,006,007,008,009,010,011,012,013,014,015,016,017,018,019,020,021,022,023,024,025,026,027,028,029)
    dat1<-data.frame(id,s02)

I would like to create a new data set from dat1. The R code should automatically create n columns named s02__0, s02__1, s02__2, s02__3, s02__4 (so n == 5 here). Then, for each id in dat1, it should place the s02 values into s02__0 to s02__4 in order. Each output row is uniquely identified, within its id, by a second identifier id_2 that counts the rows created for that id. If a row holds fewer than n values, the remaining cells should be filled with ##N/A##. If an id has more than n values, a new row with id_2 incremented by one should be created to hold the extra values, and any blank cells in it should also be filled with ##N/A##. From the data set above, I would like the following output:

    # NA marks the cells that should be shown as ##N/A##
    id<-c(1,2,3,3,4,4,4,4,4,4)
    id_2<-c(1,1,1,2,1,2,3,4,5,6)
    s02__0<-c(1,1,1,6,1,6,11,16,21,26)
    s02__1<-c(2,2,2,7,2,7,12,17,22,27)
    s02__2<-c(3,3,3,NA,3,8,13,18,23,28)
    s02__3<-c(4,4,4,NA,4,9,14,19,24,29)
    s02__4<-c(NA,5,5,NA,5,10,15,20,25,NA)

    dat2<-data.frame(id,id_2,s02__0,s02__1,s02__2,s02__3,s02__4)

Solution

  • This can produce what you want:

    library(tidyverse)
    #Data
    id<-c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3)
    s02<-c(001,002,003,004,001,002,003,004,005,001,002,003,004,005,006,007)
    dat1<-data.frame(id,s02)
    #Code
    # Chunk index within each id: rows 1-5 -> 1, rows 6-10 -> 2, and so on
    dat2 <- dat1 %>% group_by(id) %>% mutate(id2 = (row_number() - 1) %/% 5 + 1) %>% ungroup() %>%
      group_by(id,id2) %>% mutate(val=1:n()-1,nid = cur_group_id()) %>% ungroup() %>%
      select(-id2) %>% mutate(id=paste0(id,'.',nid),val=paste0('s02','.',val)) %>% select(-nid) %>%
      pivot_wider(names_from = c(val),values_from = s02) %>%
      mutate(id=gsub("\\..*","", id)) %>% group_by(id) %>%
      mutate(id2=1:n()) %>% select(order(colnames(.)))
    dat2
    
    # A tibble: 4 x 7
    # Groups:   id [3]
      id      id2 s02.0 s02.1 s02.2 s02.3 s02.4
      <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 1         1     1     2     3     4    NA
    2 2         1     1     2     3     4     5
    3 3         1     1     2     3     4     5
    4 3         2     6     7    NA    NA    NA
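
The question also asks for an arbitrary n, the column names s02__0 to s02__4, and the literal ##N/A## filler. A minimal sketch of one way to get there is below. It is not part of the original answer: the helper name widen_in_chunks and its arguments n_cols and fill are hypothetical, and it assumes the rows of each id are already in the order in which they should be spread. Because the filler is text, the s02 columns come out as character rather than numeric.

```r
library(dplyr)
library(tidyr)

# Full data from the question: ids 1 to 4 with 4, 5, 7 and 29 values each
dat1 <- data.frame(
  id  = rep(1:4, times = c(4, 5, 7, 29)),
  s02 = sequence(c(4, 5, 7, 29))
)

# Hypothetical helper: start a new output row after every `n_cols` values
# within an id, and fill the unused cells with `fill`.
widen_in_chunks <- function(dat, n_cols = 5, fill = "##N/A##") {
  dat %>%
    group_by(id) %>%
    mutate(
      # Row counter within id: positions 1..n_cols -> 1, the next n_cols -> 2, ...
      id_2 = (row_number() - 1) %/% n_cols + 1,
      # Target column: position within the chunk, counted from 0
      col  = paste0("s02__", (row_number() - 1) %% n_cols)
    ) %>%
    ungroup() %>%
    # Values must be character so that the text filler can share the column
    mutate(s02 = as.character(s02)) %>%
    pivot_wider(names_from = col, values_from = s02, values_fill = fill)
}

out <- widen_in_chunks(dat1, n_cols = 5)
print(out, n = Inf)
```

With the question's data this gives one row each for ids 1 and 2, two rows for id 3, and six rows for id 4. Columns are created in order of first appearance, so if no id has at least n_cols values, the trailing columns will be missing and would need to be added separately.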