rregressionlinear-regressionlmmlm

What if I want a single linear regression model rather than an "mlm"?


I have shared the top 9 rows of the data I am working on in the image below (y0 to y6 are outputs, rest are inputs):

My objective is to get fitted output data for y0 to y6.

I tried lm function in R using the commands:

lm1 <- lm(cbind(y0, y1, y2, y3, y4, y5, y6) ~ tt + tcb + s + l + b, data = table3)
summary(lm1)

And it has returned 7 sets of coefficients like "Response y0", "Response y1", etc.

What I really want is just 1 set of coefficients which can predict values for outputs y0 to y6.

Could you please help in this?


Solution

  • By cbind(y0, y1, y2, y3, y4, y5, y6) we fit 7 independent models (which is be a better idea).

    For what you are looking for, stack your y* variables, replicate other independent variables and do a single regression.

    Y <- c(y0, y1, y2, y3, y4, y5, y6)
    tt. <- rep(tt, times = 7)
    tcb. <- rep(tcb, times = 7)
    s. <- rep(s, times = 7)
    l. <- rep(l, times = 7)
    b. <- rep(b, times = 7)
    
    fit <- lm(Y ~ tt. + tcb. + s. + l. + b.)
    

    Predicted values for y* are

    matrix(fitted(fit), ncol = 7)
    

    For other readers than OP

    I hereby prepare a tiny reproducible example (with only one covariate x and two replicates y1, y2) to help you digest the issue.

    set.seed(0)
    dat_wide <- data.frame(x = round(runif(4), 2),
                           y1 = round(runif(4), 2),
                           y2 = round(runif(4), 2))
    #     x   y1   y2
    #1 0.90 0.91 0.66
    #2 0.27 0.20 0.63
    #3 0.37 0.90 0.06
    #4 0.57 0.94 0.21
    
    ## The original "mlm"
    fit_mlm <- lm(cbind(y1, y2) ~ x, data = dat_wide)
    

    Instead of doing c(y1, y2) and rep(x, times = 2), I would use the reshape function from R base package stats, as such operation is essentially a "wide" to "long" dataset reshaping.

    dat_long <- stats::reshape(dat_wide,  ## wide dataset
                               varying = 2:3,  ## columns 2:3 are replicates
                               v.names = "y",  ## the stacked variable is called "y"
                               direction = "long"  ## reshape to "long" format
                               )
    #       x time    y id
    #1.1 0.90    1 0.91  1
    #2.1 0.27    1 0.20  2
    #3.1 0.37    1 0.90  3
    #4.1 0.57    1 0.94  4
    #1.2 0.90    2 0.66  1
    #2.2 0.27    2 0.63  2
    #3.2 0.37    2 0.06  3
    #4.2 0.57    2 0.21  4
    

    Extra variables time and id are created. The former tells which replicate a case comes from; the latter tells which record that case is within a replicate.

    To fit the same model for all replicates, we do

    fit1 <- lm(y ~ x, data = dat_long)
    #(Intercept)            x  
    #     0.2578       0.5801  
    
    matrix(fitted(fit1), ncol = 2)  ## there are two replicates
    #          [,1]      [,2]
    #[1,] 0.7798257 0.7798257
    #[2,] 0.4143822 0.4143822
    #[3,] 0.4723891 0.4723891
    #[4,] 0.5884029 0.5884029
    

    Don't be surprised that two columns are identical; there is only a single set of regression coefficients for both replicates after all.

    If you think carefully, we can do the following instead:

    dat_wide$ymean <- rowMeans(dat_wide[2:3])  ## average all replicates
    fit2 <- lm(ymean ~ x, data = dat_wide)
    #(Intercept)            x  
    #     0.2578       0.5801  
    

    and we will get the same point estimates. Standard errors and other summary statistics would differ as two models have different sample size.

    coef(summary(fit1))
    #             Estimate Std. Error   t value  Pr(>|t|)
    #(Intercept) 0.2577636  0.2998382 0.8596755 0.4229808
    #x           0.5800691  0.5171354 1.1216967 0.3048657
    
    coef(summary(fit2))
    #             Estimate Std. Error  t value    Pr(>|t|)
    #(Intercept) 0.2577636 0.01385864 18.59949 0.002878193
    #x           0.5800691 0.02390220 24.26844 0.001693604