I have a data set with about 75,000 observations that I would like to clean up a bit as a first step.
For example, I want to set a variable when a certain condition is met.
My usual approach would be to iterate over the complete data set row by row, check the condition in each row, and then set the variable.
Is this the right approach, especially with regard to computing time?
for (row in 1:nrow(kader_test)) {
  if (kader_test[row,]$saison <= kader_test[row,]$jahr_im_team_seit) {
    kader_test[row,]$gespielt_von = kader_test[row,]$im_team_seit
  }
}
After the for loop, you can see that something has changed in rows 1 and 6. Is there a more elegant way to do this?
Thank you.
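I think a good solution would be the dplyr package. mutate() evaluates the condition for the whole column at once instead of row by row, which should be considerably faster than the loop on 75,000 rows. Assuming gespielt_von already exists in the data, use it as the "no" value of ifelse() so that rows which do not meet the condition keep their current value, and assign the result back to kader_test:
library(dplyr)
kader_test <- kader_test %>%
  dplyr::mutate(gespielt_von = ifelse(saison <= jahr_im_team_seit, im_team_seit, gespielt_von))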
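If you would rather not load a package, the same update can be sketched in base R with an index vector. The data frame below is purely hypothetical: the column names come from the question, but the values are invented so that rows 1 and 6 meet the condition, and gespielt_von is assumed to exist already.

```r
# Hypothetical sample data; column names follow the question,
# the values are invented for illustration only.
kader_test <- data.frame(
  saison            = c(2015, 2018, 2019, 2020, 2021, 2016),
  jahr_im_team_seit = c(2016, 2017, 2018, 2019, 2020, 2017),
  im_team_seit      = c("2016-07-01", "2017-07-01", "2018-07-01",
                        "2019-07-01", "2020-07-01", "2017-01-15"),
  gespielt_von      = c("2015-08-01", "2018-08-01", "2019-08-01",
                        "2020-08-01", "2021-08-01", "2016-08-01"),
  stringsAsFactors  = FALSE
)

# Row numbers for which the condition holds. which() drops NA
# comparisons, so rows with missing values are left untouched.
idx <- which(kader_test$saison <= kader_test$jahr_im_team_seit)

# Overwrite only those rows; all other rows keep their current value.
kader_test$gespielt_von[idx] <- kader_test$im_team_seit[idx]
```

Because the comparison and the assignment each run once over the whole column, there is no per-row data frame subsetting, which is typically the slow part of the loop version.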