I have a dataset like this:
ICD_10 | diagnosis |
---|---|
A00 | Cholera |
A01-A03 | Other Intestinal infectious diseases |
A15 | Respiratory tuberculosis |
A17-A19 | Other tuberculosis |
...
Rows 2 and 4 each cover a range of ICD-10 codes. I want to expand each range into one row per code, like this:
ICD_10 | diagnosis |
---|---|
A00 | Cholera |
A01 | Other Intestinal infectious diseases |
A02 | Other Intestinal infectious diseases |
A03 | Other Intestinal infectious diseases |
A15 | Respiratory tuberculosis |
A17 | Other tuberculosis |
A18 | Other tuberculosis |
A19 | Other tuberculosis |
How can I accomplish this in R using tidyverse?
Thanks for your help!
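library(dplyr)  # provides %>% and mutate()

# Expand entries such as "A01-A03" into c("A01", "A02", "A03").
fun <- function(vec) {
  ltr <- substring(vec, 1, 1)  # leading letter of each code
  # Keep only digits and hyphens, then split each entry into its integer bounds.
  L <- lapply(strsplit(gsub("[^-0-9]", "", vec), "-"), as.integer)
  # SIMPLIFY = FALSE always returns a list, even if every entry expands to the same length.
  mapply(function(ltr, z) sprintf("%s%02i", ltr, if (length(z) > 1) seq(z[1], z[2]) else z),
         ltr, L, SIMPLIFY = FALSE, USE.NAMES = FALSE)
}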
quux %>%
  mutate(ICD_10 = fun(ICD_10)) %>%
  tidyr::unnest(ICD_10)
# # A tibble: 8 x 2
#   ICD_10 diagnosis
#   <chr>  <chr>
# 1 A00    Cholera
# 2 A01    Other Intestinal infectious diseases
# 3 A02    Other Intestinal infectious diseases
# 4 A03    Other Intestinal infectious diseases
# 5 A15    Respiratory tuberculosis
# 6 A17    Other tuberculosis
# 7 A18    Other tuberculosis
# 8 A19    Other tuberculosis
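To see what the helper is doing at each step, here is a minimal base R walkthrough on two sample entries. The intermediate names (`digits`, `bounds`, `expanded`) are mine and only for illustration, and the sketch assumes every code is a single letter followed by two digits.

```r
# Base R only: the helper's steps applied to two sample entries.
vec <- c("A00", "A01-A03")

ltr <- substring(vec, 1, 1)          # leading letter of each entry
digits <- gsub("[^-0-9]", "", vec)   # drop everything except digits and hyphens
bounds <- lapply(strsplit(digits, "-"), as.integer)  # one or two integers per entry

# Build the full sequence for ranges; single codes pass through unchanged.
expanded <- Map(function(l, z) {
  nums <- if (length(z) > 1) seq(z[1], z[2]) else z
  sprintf("%s%02i", l, nums)         # re-attach the letter, zero-padded to 2 digits
}, ltr, bounds)
expanded <- unname(expanded)         # drop the names that Map() takes from `ltr`
```

Each element of `expanded` is a character vector, which is why the result can sit in a list column and be flattened with `tidyr::unnest()`.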
Data
quux <- structure(
  list(
    ICD_10 = c("A00", "A01-A03", "A15", "A17-A19"),
    diagnosis = c("Cholera", "Other Intestinal infectious diseases",
                  "Respiratory tuberculosis", "Other tuberculosis")
  ),
  class = "data.frame", row.names = c(NA, -4L)
)
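If you would rather stay closer to tidyverse verbs, the following is a sketch of one possible alternative, not the only way to do it. It splits each entry into hypothetical `from` and `to` columns with `tidyr::separate()`, builds the sequence with `purrr::map2()`, and then unnests. It assumes every code is one letter followed by two digits and that both ends of a range share the same letter.

```r
library(dplyr)
library(tidyr)
library(purrr)

quux <- data.frame(
  ICD_10 = c("A00", "A01-A03", "A15", "A17-A19"),
  diagnosis = c("Cholera", "Other Intestinal infectious diseases",
                "Respiratory tuberculosis", "Other tuberculosis")
)

result <- quux %>%
  # "A01-A03" becomes from = "A01", to = "A03"; single codes get to = NA.
  separate(ICD_10, into = c("from", "to"), sep = "-", fill = "right") %>%
  mutate(
    to = coalesce(to, from),  # a single code is a range of length one
    ICD_10 = map2(from, to, function(lo, hi) {
      # Assumption: the numeric part starts at the second character.
      nums <- seq(as.integer(substring(lo, 2)), as.integer(substring(hi, 2)))
      sprintf("%s%02d", substring(lo, 1, 1), nums)
    })
  ) %>%
  select(ICD_10, diagnosis) %>%
  unnest(ICD_10)
```

Ranges that cross letters (for example a hypothetical "A98-B02") would need extra handling in either approach.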