r dplyr tidyr data-cleaning icd

Expand ICD-10 code ranges from one row into multiple rows


I have a dataset like this:

ICD_10 diagnosis
A00 Cholera
A01-A03 Other Intestinal infectious diseases
A15 Respiratory tuberculosis
A17-A19 Other tuberculosis

...

Rows 2 and 4 each contain a range of ICD-10 codes, and I want to expand each range into one row per code, like below:

ICD_10 diagnosis
A00 Cholera
A01 Other Intestinal infectious diseases
A02 Other Intestinal infectious diseases
A03 Other Intestinal infectious diseases
A15 Respiratory tuberculosis
A17 Other tuberculosis
A18 Other tuberculosis
A19 Other tuberculosis

How can I accomplish this in R using the tidyverse?

Thanks for your help!


Solution

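  • library(dplyr)  # provides %>% and mutate()

    # Expand each code or range (e.g. "A01-A03") into a character vector of codes.
    # Assumes a range stays within one letter and codes have two digits.
    fun <- function(vec) {
      ltr <- substring(vec, 1, 1)  # leading letter of each entry
      # Drop everything but digits and "-", then split into integer endpoints.
      L <- lapply(strsplit(gsub("[^-0-9]", "", vec), "-"), as.integer)
      # SIMPLIFY = FALSE always returns a list, even when all ranges have equal length.
      mapply(function(ltr, z) sprintf("%s%02i", ltr, if (length(z) > 1) seq(z[1], z[2]) else z),
             ltr, L, SIMPLIFY = FALSE, USE.NAMES = FALSE)
    }
    quux %>%
      mutate(ICD_10 = fun(ICD_10)) %>%
      tidyr::unnest(ICD_10)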
    # # A tibble: 8 x 2
    #   ICD_10 diagnosis                           
    #   <chr>  <chr>                               
    # 1 A00    Cholera                             
    # 2 A01    Other Intestinal infectious diseases
    # 3 A02    Other Intestinal infectious diseases
    # 4 A03    Other Intestinal infectious diseases
    # 5 A15    Respiratory tuberculosis            
    # 6 A17    Other tuberculosis                  
    # 7 A18    Other tuberculosis                  
    # 8 A19    Other tuberculosis                  
    

    Data

    quux <- structure(list(ICD_10 = c("A00", "A01-A03", "A15", "A17-A19"), diagnosis = c("Cholera", "Other Intestinal infectious diseases", "Respiratory tuberculosis", "Other tuberculosis")), class = "data.frame", row.names = c(NA, -4L))
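
Since the question asks for a tidyverse approach, here is a rough sketch of the same expansion written with `separate()`, `pmap()`, and `unnest()` instead of a base R helper. It is only one possible way to do it, and it makes two assumptions that the sample data satisfies but the full dataset may not: every code is one letter followed by two digits, and a range never crosses letters (no "A98-B02"). The column names `start`, `end`, `ltr`, `from`, and `to` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same sample data as in the answer above.
quux <- data.frame(
  ICD_10 = c("A00", "A01-A03", "A15", "A17-A19"),
  diagnosis = c("Cholera", "Other Intestinal infectious diseases",
                "Respiratory tuberculosis", "Other tuberculosis")
)

expanded <- quux %>%
  # "A01-A03" becomes start = "A01", end = "A03"; a single code gets end = NA.
  separate(ICD_10, into = c("start", "end"), sep = "-", fill = "right") %>%
  mutate(
    end  = coalesce(end, start),             # single code: range of length one
    ltr  = substr(start, 1, 1),              # assumed: one leading letter
    from = as.integer(substr(start, 2, 3)),  # assumed: two-digit numeric part
    to   = as.integer(substr(end, 2, 3)),
    # Build one character vector of codes per row (a list-column).
    ICD_10 = pmap(list(ltr, from, to),
                  function(l, f, t) sprintf("%s%02d", l, seq(f, t)))
  ) %>%
  select(ICD_10, diagnosis) %>%
  unnest(ICD_10)  # one row per code

expanded
```

For the four sample rows this should give the same eight rows as the answer above. If the real data contains codes with decimals (such as "A01.0") or ranges that span letters, the `substr()` positions and the `seq()` call would need to be adapted.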