rmulti-levelmultilevel-analysis

Grouping Variables in Multilevel Linear Models


I am trying to learn hierarchical models in R and I have generated some sample data for myself. I am having trouble with the correct syntax for coding a multilevel regression problem.

I generated some data for salaries in a Business school. I made the salaries depend linearly on the number of years of employment and the total number of publications by the faculty member. The faculty are in various departments and I made the base salary(intercept) different for each department and also the yearly hike(slopes) different for each department. This way, I have the intercept (base salary) and slope(w.r.t experience in number of years) of the salary depend on the nested level (department) and slope w.r.t another explanatory variable (Publications) not depend on the nested level. What would be the correct syntax to model this in R?

here's my data

    Data <-data.frame(Sl_No = c(1:40),
    +                   Dept = as.factor(sample(c("Mark","IT","Fin"),40,replace = TRUE)),
    +                   Years = round(runif(40,1,10)))

    pubs <-round(Data$Years*runif(40,1,3))
    Data$Pubs <- pubs

    lookup_table<-data.frame(Dept = c("Mark","IT","Fin","Strat","Ops"),
    +                          base = c(100000,140000,150000,150000,120000),
    +                          slope = c(6000,5000,3000,2000,4000))

  Data <- merge(Data,lookup_table,by = 'Dept')
  salary <-Data$base+Data$slope*Data$Years+Data$Pubs*10000+rnorm(length(Data$Dept))*10000

    Data$base<-NULL
    Data$slope<-NULL

I have tried the following:

1)

 multilevel_model<-lmer(Salary~1|Dept+Pubs+Years|Dept, data = Data)

Error in model.matrix.default(eval(substitute(~foo, list(foo = x[[2]]))), : model frame and formula mismatch in model.matrix()

2)

multilevel_model<-lmer(`Salary`~ Dept + `Pubs`+`Years`|Dept , data = Data)

boundary (singular) fit: see ?isSingular

I want to see the estimates of the salary intercept and yearly hike by Dept and the estimate of the effect of publication as a standalone (pooled). Right now I am not getting the code to work at all.

I know the base salary and the yearly hike by dept and the effect of a publication (since I generated it).

Dept    base    Slope
Fin     150000  3000 
Mark    100000  6000 
Ops     120000  4000
IT      140000  5000
Strat   150000  2000

Every publication increases the salary by 10,000.

ANSWER: Thanks to @Ben 's answer here I think the correct model is

    multilevel_model<-lmer(Salary~(1|Dept)+ Pubs +(0+Years|Dept), data = Data)

This gives me the following fixed effects by running

    summary(multilevel_model)

Fixed effects:
   Estimate Std. Error t value
(Intercept) 131667.4    10461.0   12.59
Pubs         10235.0      550.8   18.58
Correlation of Fixed Effects:
    Pubs -0.081

The Department level coefficients are as follows:

coef(multilevel_model)

$Dept
          Years (Intercept)     Pubs
Fin   3072.5133    148757.6 10235.02
IT    5156.6774    136710.7 10235.02
Mark  5435.8301    102858.3 10235.02
Ops   3433.1433    118287.1 10235.02
Strat  963.9366    151723.1 10235.02

These are pretty good estiamtes of the original values. Now I need to learn to assess "how good" they are. :)


Solution

  • (1)

    multilevel_model<-lmer(`Total Salary`~ 1|Dept + 
     `Publications`+`Years of Exp`|Dept , data = sample_data)
    

    I can't immediately diagnose why this gives a syntax error, but parentheses are generally recommended around random-effect terms because the | operator has high precedence in formulas. Thus the response/right-hand side (RHS) formula

    ~ (1|Dept) + (`Publications`+`Years of Exp`|Dept)
    

    might work, except that it would be problematic because both terms contain the same intercept term: if you wanted to do this you'd probably need

    ~ (1|Dept) + (0+`Publications`+`Years of Exp`|Dept)
    

    (2)

    ~ Dept + `Publications`+`Years of Exp`|Dept 
    

    It doesn't really make any sense to put the same variable (Dept) on both the left- and right-hand sides of the bar.


    You should probably use

    ~ pubs + years_exp + (1 + years_exp|Dept)
    

    Since in principle the effect of publication could vary across departments, the maximal model would be

    ~ pubs + years_exp + (1 + pubs + years_exp|Dept)
    

    The GLMM FAQ has more on model specification