r icd

How to filter by a range of alphanumeric codes?


I need to create dummy variables from ICD-10 codes. For example, chapter 2 starts at C00 and ends at D48X. The data look like this:

data <- data.frame(LINHAA1 = c("B342", "C000", "D450", "O985"),
                   LINHAA2 = c("U071", "C99", "D68X", "J061"),
                   LINHAA3 = c("D48X", "Y098", "X223", "D640"))

Then I need to create a column that is 1 if any code in the row falls within the C00-D48X range and 0 otherwise. The result I want:

LINHAA1   LINHAA2   LINHAA3  CHAPTER2
B342      U071      D48X         1
C000      C99       Y098         1
D450      D68X      X223         1
O985      J061      D640         0

The check needs to cover every column from LINHAA1 to LINHAA3. Thanks in advance!


Solution

  • This should do it:

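    # Inner apply: test each code, row by row (one result column per row of data)
    in_range <- apply(data, 1, function(x) x >= "C00" & x <= "D48X")
    # Outer apply: is any code in each original row within the range?
    as.numeric(apply(in_range, 2, any))
    [1] 1 1 1 0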
    

    A little explanation: comparison operators such as >= and <= order strings alphabetically, so they are enough to test whether a code falls in the range. The inner apply tests every code row by row and returns a matrix of logical values in which each column corresponds to one row of data. The outer apply therefore works over columns (margin 2) and uses any to check whether at least one of the three logical values is TRUE. Finally, as.numeric converts the result from TRUE/FALSE to 1/0.
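
The same idea can also be sketched without nested apply calls, storing the result directly in a CHAPTER2 column. This is only one possible variant, not part of the answer above: in_range is a hypothetical helper name, the fourth LINHAA1 value is assumed to be "O985" as in the desired output, and the comparison relies on the collation order of the current locale, which is assumed to sort digits before letters and A to Z in the usual order.

```r
# Example data from the question (fourth LINHAA1 value taken as "O985").
data <- data.frame(LINHAA1 = c("B342", "C000", "D450", "O985"),
                   LINHAA2 = c("U071", "C99", "D68X", "J061"),
                   LINHAA3 = c("D48X", "Y098", "X223", "D640"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where a code sorts between lower and upper.
in_range <- function(codes, lower = "C00", upper = "D48X") {
  codes >= lower & codes <= upper
}

cols <- c("LINHAA1", "LINHAA2", "LINHAA3")

# sapply() returns a logical matrix with one column per code column;
# rowSums() > 0 is TRUE when at least one code in the row is in range.
hits <- sapply(data[cols], in_range)
data$CHAPTER2 <- as.integer(rowSums(hits) > 0)

print(data)
```

Because the bounds are arguments of the helper, other chapters could be flagged the same way by passing a different lower and upper code.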