I have stumbled onto a problem I haven't been able to solve. I have a person in two different categories, and I need to count them as 0.5 in each category.
Here is the sample data:
Notice that ID Number 18 has a slash indicating that the two departments. I need the output to read as follows.
My first thought was to split the DEPT column, but I am not sure where to go from there since the two rows are identical. My code is as follows. Any suggestions?
Grad_Applied_Formatted1 <- Grad_AppAccMat %>%
separate_wider_delim(DEPT,"/", names= c("DEPT1","DEPT2"),
too_few = "align_start",cols_remove = F)
Using strsplit
with row_number
library(dplyr)
df %>%
mutate(DEPT = strsplit(DEPT, "/")[[1]][row_number()], .by = ID)
ID DEPT COUNT
1 1 BIO 1.0
2 2 EDU 1.0
3 3 PHYS 1.0
4 4 MAR 1.0
5 5 SPA 1.0
6 6 FRE 1.0
7 7 KWL 1.0
8 8 QED 1.0
9 9 XYZ 1.0
10 10 UNI 1.0
11 11 RED 1.0
12 12 KJH 1.0
13 13 LMS 1.0
14 14 OPU 1.0
15 15 RTY 1.0
16 16 GHF 1.0
17 17 VZX 1.0
18 18 EPI 0.5
19 18 ENH 0.5
Or a base R approach
transform(df, DEPT = unlist(sapply(split(df, df$ID), \(x) {
res <- strsplit(x$DEPT, "/")
unlist(res)[seq_len(length(res))]})))
ID DEPT COUNT
1 1 BIO 1.0
2 2 EDU 1.0
3 3 PHYS 1.0
4 4 MAR 1.0
5 5 SPA 1.0
6 6 FRE 1.0
7 7 KWL 1.0
8 8 QED 1.0
9 9 XYZ 1.0
10 10 UNI 1.0
11 11 RED 1.0
12 12 KJH 1.0
13 13 LMS 1.0
14 14 OPU 1.0
15 15 RTY 1.0
16 16 GHF 1.0
17 17 VZX 1.0
18 18 EPI 0.5
19 18 ENH 0.5
df <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 18L), DEPT = c("BIO",
"EDU", "PHYS", "MAR", "SPA", "FRE", "KWL", "QED", "XYZ", "UNI",
"RED", "KJH", "LMS", "OPU", "RTY", "GHF", "VZX", "EPI/ENH", "EPI/ENH"
), COUNT = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 0.5, 0.5)), class = "data.frame", row.names = c(NA, -19L))