I would like to know if there is an "easy/quick" way to convert character variables to factor.
I am aware, that one could make a vector with the column names and then use lapply
. However, I am working with a large data frame with more than 200 variables, so it would be preferable not having to write the 200+ names in the vector.
I am also aware that I can coerce the entire data frame by using lapply
, type.convert
and sapply
, but as I am working with time series data where some is categorical, and some is numerical, I am not interested in that either.
Is there any way to use the column number in this? I.e. [ ,2:200]
? I tried the following, but without any luck:
df[ ,2:30] <- lapply(df[ ,2:30], type.convert)
sapply(df, factor)
With the solution above, I would still have to do multiple of them, but it would still be quicker than writing all the variable names.
I also have a feeling a loop might be usable here, but I would not be sure of how to write it out, or if it is even a way to do it.
As you write, that you need to convert (all?) character variables to factors, you could use mutate_if
from dplyr
library(dplyr)
mutate_if(df, is.character, as.factor)
With this you only operate on columns for which is.character
returns TRUE
, so you don't need to worry about the column positions or names.