r loops cpu-time

For loops are for beginners - how does it work better in R?


I have a data set with about 75,000 observations that I would like to preprocess as a first step.

For example, I want to set a variable when a certain condition holds.

My usual approach would be to iterate over the whole data set row by row, check the condition in each row, and set the variable if the condition is met.

Is this the right approach, especially with regard to computing time?

[Image: initial data]

for (row in 1:nrow(kader_test)) {
  if (kader_test[row,]$saison <= kader_test[row,]$jahr_im_team_seit) {
    kader_test[row,]$gespielt_von = kader_test[row,]$im_team_seit
  }
}

After the for loop, you can see that something has changed in rows 1 and 6. Is there a more elegant way to do this?

[Image: result]

Thank you.


Solution

  • I guess a good solution would be the dplyr package:

    library(dplyr)
    
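    # Vectorised: the condition is evaluated for all rows at once.
    # Rows that do not match keep their existing gespielt_von value,
    # as in the loop; the result is assigned back to kader_test.
    kader_test <- kader_test %>%
      dplyr::mutate(gespielt_von = ifelse(saison <= jahr_im_team_seit, im_team_seit, gespielt_von))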
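
For comparison, here is a minimal base R sketch of the same conditional update using logical indexing, which needs no extra package. The column names come from the question; the sample values and the helper variable `idx` are made up for illustration, so treat this as a sketch rather than a drop-in solution.

```r
# Hypothetical sample data: column names are from the question,
# the values are invented for illustration only.
kader_test <- data.frame(
  saison            = c(2010, 2012, 2015, 2011),
  jahr_im_team_seit = c(2011, 2011, 2015, 2009),
  im_team_seit      = c("2011-07-01", "2011-07-01", "2015-01-15", "2009-08-01"),
  gespielt_von      = c("2010-08-01", "2012-08-01", "2015-08-01", "2011-08-01"),
  stringsAsFactors  = FALSE
)

# Evaluate the condition for all rows in one vectorised comparison.
# which() returns the positions where it is TRUE and skips NA results,
# so rows with missing values are left unchanged.
idx <- which(kader_test$saison <= kader_test$jahr_im_team_seit)

# Overwrite only the matching rows; all other rows keep their value.
kader_test$gespielt_von[idx] <- kader_test$im_team_seit[idx]
```

Both this and the `mutate()` version avoid the per-row data frame subsetting that makes the loop slow on 75,000 rows.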