
How to use subset() in a for loop in R


I need to select each level of Species in the iris dataset (available in R) with the function subset() and calculate the mean of the column Petal.Length for each level, all inside a for loop. I know that I can do this calculation with the function tapply, but the task requires using a for loop.

I tried writing a vector in which I would put the results:

medie <- rep(NA,3)
names(medie) <- levels(iris$Species)

and then this as the loop:

  for (i in 1:length(medie)){
    medie[i] <- mean(subset(iris, Species==levels(Species))$Petal.Length)
  }

but these are the results I get:

> medie
    setosa versicolor  virginica 
     3.796      3.796      3.796

Any help?


Solution

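  • You need to index the levels with i, as in levels(Species)[i]. Without the index, Species is compared against all three levels at once; the levels are recycled along the column, so every iteration selects the same rows and returns the same mean.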

    for (i in 1:length(medie)){
      medie[i] <- mean(subset(iris, Species==levels(Species)[i])$Petal.Length)
    }
    
    > medie
        setosa versicolor  virginica 
         1.462      4.260      5.552 
    

    subset() also has a select argument for picking the target column. It still returns a data frame, so extract the column before calling mean():

    medie[i] <- mean(subset(iris, Species==levels(Species)[i], select = Petal.Length)[[1]])
    

    Here's a dplyr approach if you ever want to avoid the for loop.

    library(dplyr)
    iris %>% 
      group_by(Species) %>% 
      summarise(medie = mean(Petal.Length))
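
For reference, the loop-based fix can be put together as one self-contained script. This is only a sketch: the names species_levels, rows and check are made up for illustration and do not appear in the question, and the tapply() call at the end is there purely as a cross-check, not as part of the required solution.

```r
# Sketch: mean Petal.Length per species, using subset() inside a for loop.
# species_levels, rows and check are illustrative names, not from the question.
species_levels <- levels(iris$Species)

# Pre-allocate a named result vector, one slot per species.
medie <- setNames(rep(NA_real_, length(species_levels)), species_levels)

for (i in seq_along(species_levels)) {
  # Compare Species against ONE level per iteration (note the [i]).
  rows <- subset(iris, Species == species_levels[i])
  medie[i] <- mean(rows$Petal.Length)
}

print(round(medie, 3))

# Cross-check against tapply(), which computes the same group means.
check <- tapply(iris$Petal.Length, iris$Species, mean)
stopifnot(isTRUE(all.equal(as.numeric(medie), as.numeric(check))))
```

Using seq_along() instead of 1:length() is a small safety choice: it yields an empty sequence when there are no levels, whereas 1:0 would run the loop body twice.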