Tags: r, logistic-regression, mlogit, multicollinearity

Remove linear dependent variables while using the bife package


Some pre-programmed models in R automatically remove linearly dependent variables from their regression output (e.g. lm()). With the bife package, this does not seem to be possible. As stated in the package description on CRAN, page 5:

If bife does not converge this is usually a sign of linear dependence between one or more regressors and the fixed effects. In this case, you should carefully inspect your model specification.

Now, suppose the problem at hand involves running many regressions and one cannot adequately inspect each regression output -- one has to adopt some rule of thumb regarding the regressors. What are some alternatives for removing linearly dependent regressors more or less automatically, so as to arrive at an adequate model specification?
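(For comparison, the lm() behaviour mentioned above can be seen on toy data; the variable names below are illustrative only:)

```r
# lm() does not error on an exact duplicate regressor:
# it reports NA for the aliased coefficient instead of failing
set.seed(1)
x <- rnorm(10)
y <- x                 # y is an exact linear copy of x
out <- rnorm(10)
coef(lm(out ~ x + y))  # the coefficient on y comes back as NA
```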

I set a code as an example below:

#sample code
library(bife)

x = 10*rnorm(40)
z = 100*rnorm(40)

df1 = data.frame(a=rep(c(0,1), times=20), x=x, y=x, z=z, ID=1:40, date=1, Region=rep(c(1,2,3,4), 10))
df2 = data.frame(a=c(rep(c(1,0), times=15), rep(c(0,1), times=5)), x=1.4*x+4, y=1.4*x+4, z=1.2*z+5, ID=1:40, date=2, Region=rep(c(1,2,3,4), 10))
df3 = rbind(df1, df2)

for (i in 1:4) {

  dat = df3[df3$Region == i, ]  # avoid overwriting the vector x

  model = bife::bife(a ~ x + y + z | ID, data = dat)

  results = data.frame(Region = i)
  results$Model = list(model)

  if (i == 1) {
    df4 = results
    next
  }

  df4 = rbind(df4, results)
}

Error: Linear dependent terms detected!

Solution

  • Since you're only looking for linear dependencies, you can simply leverage methods that detect them, such as lm.
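    For instance, here is a minimal base-R sketch of that idea (the helper name drop_aliased is hypothetical, not from any package). It relies on lm() returning NA coefficients for aliased terms; note it only catches dependence among the regressors themselves, not dependence with the fixed effects:

```r
# find regressors that lm() drops as aliased, then exclude them
drop_aliased <- function(rhs_vars, data, outcome = "a") {
  fml <- reformulate(rhs_vars, response = outcome)
  cf  <- coef(lm(fml, data = data))
  bad <- names(cf)[is.na(cf)]   # aliased regressors get NA coefficients
  setdiff(rhs_vars, bad)
}

d <- data.frame(a = rnorm(20), x = rnorm(20))
d$y <- d$x                      # y duplicates x exactly
d$z <- rnorm(20)
drop_aliased(c("x", "y", "z"), d)
# keeps "x" and "z", drops "y"
```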

    Here's an example of a solution using the fixest package:

    library(bife)
    library(fixest)
    
    x = 10*rnorm(40)
    z = 100*rnorm(40)
    
    df1 = data.frame(a=rep(c(0,1),times=20), x=x, y=x, z=z, ID=c(1:40), date=1, Region=rep(c(1,2, 3, 4),10))
    
    df2 = data.frame(a=c(rep(c(1,0),times=15),rep(c(0,1),times=5)), x=1.4*x+4, y=1.4*x+4, z=1.2*z+5, ID=c(1:40), date=2, Region=rep(c(1,2,3,4),10))
    
    df3 = rbind(df1, df2)
    
    vars = c("x", "y", "z")
    
    res_all = list()
    for(i in 1:4) {
        x = df3[df3$Region == i, ]
    
        coll_vars = feols(a ~ x + y + z | ID, x, notes = FALSE)$collin.var
        new_fml = xpd(a ~ ..vars | ID, ..vars = setdiff(vars, coll_vars))
        res_all[[i]] = bife::bife(new_fml, data = x)
    }
    
    # Display all results
    for(i in 1:4) {
        cat("\n#\n# Region: ", i, "\n#\n\n")
        print(summary(res_all[[i]]))
    }
    

    The two functions needed here, feols and xpd, both come from fixest. Some explanations:

      • feols estimates the linear model, removes collinear regressors, and stores their names in the element collin.var of the returned object (notes = FALSE silences the message about their removal).

      • xpd expands the formula: the ..vars placeholder is replaced by the variables that survive setdiff(vars, coll_vars).

    So you get an algorithm with automatic variable removal before performing the bife estimations.

    Finally, just a side comment: in general it's better to store results in lists, since this avoids unnecessary copies.
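    A minimal sketch of that pattern on toy objects: grow a list inside the loop and bind once at the end, instead of calling rbind on a growing data frame at each iteration:

```r
# collect per-iteration results in a pre-allocated list
res <- vector("list", 4)
for (i in 1:4) {
  res[[i]] <- data.frame(Region = i, n = i * 10)
}
df_all <- do.call(rbind, res)  # single bind at the end
```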

    Update

    I forgot to mention: if you don't need the bias correction (bife::bias_corr), then you can directly use fixest::feglm, which automatically removes collinear variables:

    res_bife = bife::bife(a ~ x + z | ID, data = df3)
    res_feglm = fixest::feglm(a ~ x + y + z | ID, df3, family = binomial)
    
    rbind(coef(res_bife), coef(res_feglm))
    #>                x          z
    #> [1,] -0.02221848 0.03045968
    #> [2,] -0.02221871 0.03045990