I saw in a tutorial about regression modeling the following command:
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
What exactly does this command do, and what is the role of ~
(tilde) in the command?
The thing on the right of <-
is a formula
object. It is often used to denote a statistical model, where the thing on the left of the ~
is the response and the things on the right of the ~
are the explanatory variables. So in English you'd say something like "Species depends on Sepal Length, Sepal Width, Petal Length and Petal Width".
The myFormula <-
part of that line stores the formula in an object called myFormula
so you can use it in other parts of your R code.
Other common uses of formula objects in R
The lattice
package uses them to specify the variables to plot.
The ggplot2
package uses them to specify panels for plotting.
The dplyr
package uses them for non-standard evaulation.