rexcelperformancerowliterate-programming

Identify equal values in rows


I am trying to replicate this excel formula:

enter image description here

Excel

 =IF(AND(var1[i+1]=var1[i],var2[i+1]=var2[i],var3[i+1]=var3[i]),0,1)

Basically I need to identify if row[i+1] is equal to row[i] using three (3) variables: "periodo_ocurr", "tipo_riesgo" and "paciente". If row[i+1] is equal to row[i] then 0 else 1

The code in R:

for(i in 1 : nrow(data_ocurr_2019))

{data_ocurr_2019$Flag_Pac[i+1] <- ifelse(data_ocurr_2019$periodo_ocurr[i+1] == data_ocurr_2019$periodo_ocurr[i] & 
                                       data_ocurr_2019$tipo_riesgo[i+1] == data_ocurr_2019$tipo_riesgo[i] &
                                       data_ocurr_2019$paciente[i+1] == data_ocurr_2019$paciente[i], 0, 1)}

What is the issue? my data is over 1 million records and it takes a long time to execute it

Is there another way to do it? more efficient?

Any help will be appreciated


Solution

  • Try:

    data_ocurr_2019$Flag_Pac = ifelse(lead(data_ocurr_2019$periodo_ocurr) == data_ocurr_2019$periodo_ocurr & 
                                      lead(data_ocurr_2019$tipo_riesgo) == data_ocurr_2019$tipo_riesgo &
                                      lead(data_ocurr_2019$paciente) == data_ocurr_2019$paciente, 0, 1)
    

    This should be much faster, since it relies on a vectorized calculation, so R can compile the code once and apply it to all rows at once.

    More explanation of vectorization in R: https://www.noamross.net/archives/2014-04-16-vectorization-in-r-why/