I am trying to replicate this excel formula:
Excel
=IF(AND(var1[i+1]=var1[i],var2[i+1]=var2[i],var3[i+1]=var3[i]),0,1)
Basically I need to identify if row[i+1] is equal to row[i] using three (3) variables: "periodo_ocurr", "tipo_riesgo" and "paciente". If row[i+1] is equal to row[i] then 0 else 1
The code in R:
for(i in 1 : nrow(data_ocurr_2019))
{data_ocurr_2019$Flag_Pac[i+1] <- ifelse(data_ocurr_2019$periodo_ocurr[i+1] == data_ocurr_2019$periodo_ocurr[i] &
data_ocurr_2019$tipo_riesgo[i+1] == data_ocurr_2019$tipo_riesgo[i] &
data_ocurr_2019$paciente[i+1] == data_ocurr_2019$paciente[i], 0, 1)}
What is the issue? my data is over 1 million records and it takes a long time to execute it
Is there another way to do it? more efficient?
Any help will be appreciated
Try:
data_ocurr_2019$Flag_Pac = ifelse(lead(data_ocurr_2019$periodo_ocurr) == data_ocurr_2019$periodo_ocurr &
lead(data_ocurr_2019$tipo_riesgo) == data_ocurr_2019$tipo_riesgo &
lead(data_ocurr_2019$paciente) == data_ocurr_2019$paciente, 0, 1)
This should be much faster, since it relies on a vectorized calculation, so R can compile the code once and apply it to all rows at once.
More explanation of vectorization in R: https://www.noamross.net/archives/2014-04-16-vectorization-in-r-why/