r machine-learning regression data-analysis regression-testing

Converting a categorical numerical variable into continuous form for a regression problem


I have a dataset in which all columns are numerical. Some of the columns hold categories coded as numbers, with two or more levels. Do I need to convert those numerically coded categorical columns into factors for regression analysis? Please suggest a better approach in R, if there is one.


Solution

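  • Yes, you do, whenever the numbers are category labels rather than quantities. Otherwise lm() treats the column as a continuous predictor and fits a single slope across the numeric codes, instead of a separate effect for each category. You can prove it to yourself: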

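    set.seed(1)            # make the simulated data reproducible
    x <- rep(1:5, 20)      # a categorical variable coded as the integers 1 to 5
    y <- rnorm(100)        # response, unrelated to x in this simulation

    # not converting to a factor: x is treated as continuous
    m1 <- lm(y ~ x)

    # converting to a factor: x is treated as categorical
    m2 <- lm(y ~ as.factor(x))

    summary(m1) # two coefficients: intercept and a single slope for x
    summary(m2) # five coefficients: intercept plus one for each of levels 2 to 5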
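
With a real dataset it is usually tidier to convert the categorical columns in the data frame once, before fitting, rather than wrapping each term in as.factor() inside the formula. The following is only a minimal sketch under assumptions: the data frame `df` and its columns `region`, `grade` and `age` are hypothetical names invented for illustration, and the data are simulated.

```r
set.seed(42)  # reproducible simulated data

# Hypothetical data: `region` and `grade` are categories stored as integers,
# `age` is a genuinely continuous column.
df <- data.frame(
  region = rep(1:4, times = 25),
  grade  = rep(1:5, each = 20),
  age    = rnorm(100, mean = 40, sd = 10)
)
df$y <- rnorm(100)

# Convert only the categorical columns; leave continuous ones untouched.
cat_cols <- c("region", "grade")
df[cat_cols] <- lapply(df[cat_cols], factor)

fit <- lm(y ~ region + grade + age, data = df)

str(df)            # region and grade are now factors
length(coef(fit))  # intercept + 3 (region) + 4 (grade) + 1 (age)
```

Listing the columns explicitly in `cat_cols` is a deliberate choice in this sketch: converting every numeric column automatically would also turn continuous predictors such as `age` into factors.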