rdataframe

One entity in more than one category, count half in each category


I have stumbled onto a problem I haven't been able to solve. I have a person in two different categories, and I need to count them as 0.5 in each category.

Here is the sample data:

enter image description here

Notice that ID Number 18 has a slash indicating that the two departments. I need the output to read as follows.

enter image description here

My first thought was to split the DEPT column, but I am not sure where to go from there since the two rows are identical. My code is as follows. Any suggestions?

Grad_Applied_Formatted1 <-  Grad_AppAccMat %>%
separate_wider_delim(DEPT,"/", names= c("DEPT1","DEPT2"), 
too_few = "align_start",cols_remove = F)

Solution

  • Using strsplit with row_number

    library(dplyr)
    
    df %>% 
      mutate(DEPT = strsplit(DEPT, "/")[[1]][row_number()], .by = ID)
       ID DEPT COUNT
    1   1  BIO   1.0
    2   2  EDU   1.0
    3   3 PHYS   1.0
    4   4  MAR   1.0
    5   5  SPA   1.0
    6   6  FRE   1.0
    7   7  KWL   1.0
    8   8  QED   1.0
    9   9  XYZ   1.0
    10 10  UNI   1.0
    11 11  RED   1.0
    12 12  KJH   1.0
    13 13  LMS   1.0
    14 14  OPU   1.0
    15 15  RTY   1.0
    16 16  GHF   1.0
    17 17  VZX   1.0
    18 18  EPI   0.5
    19 18  ENH   0.5
    

    Or a base R approach

    transform(df, DEPT = unlist(sapply(split(df, df$ID), \(x) {
                                  res <- strsplit(x$DEPT, "/")
                                  unlist(res)[seq_len(length(res))]})))
       ID DEPT COUNT
    1   1  BIO   1.0
    2   2  EDU   1.0
    3   3 PHYS   1.0
    4   4  MAR   1.0
    5   5  SPA   1.0
    6   6  FRE   1.0
    7   7  KWL   1.0
    8   8  QED   1.0
    9   9  XYZ   1.0
    10 10  UNI   1.0
    11 11  RED   1.0
    12 12  KJH   1.0
    13 13  LMS   1.0
    14 14  OPU   1.0
    15 15  RTY   1.0
    16 16  GHF   1.0
    17 17  VZX   1.0
    18 18  EPI   0.5
    19 18  ENH   0.5
    
    Data
    df <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
    11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 18L), DEPT = c("BIO",
    "EDU", "PHYS", "MAR", "SPA", "FRE", "KWL", "QED", "XYZ", "UNI",
    "RED", "KJH", "LMS", "OPU", "RTY", "GHF", "VZX", "EPI/ENH", "EPI/ENH"
    ), COUNT = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 0.5, 0.5)), class = "data.frame", row.names = c(NA, -19L))