rmlogit

Similar function to R's logsum function (mlogit package) for Survival package


I was trying to obtain the expected utility for each individual using R's survival package (clogit function) and I was not able to find a simple solution such as mlogit's logsum.

Below I set an example of how one would do it using the mlogit package. It is pretty straight forward: it just requires regressing the variables with the mlogit function, save the output and use it as an argument in the logsum function -- if needed, there is a short explanation in this vignette. And what I want is to know the similar method for clogit. I've read the package's manual but I have failed to grasp what would be the most adequate function to perform the analsysis.

Note1: My preference for a function like mlogit's is related to the fact that I might need to perform tons of regressions later on and being able to perform the correct estimation in different scenarios would be helpful.

Note2: I do not intend that the dataset created below be representative of how data should behave. I've set the example solely for the purpose of perfoming the function after the logit regressions.

**

library(survival)
library(mlogit)

#creating a dataset

df_test=data.frame(id=rep(1:20,each=4),
                   choice=rep(c("train","car","plane","boat")),
                   distance=c(rnorm(80)*10),
                   )

f=function(x,y,z) {
    
  v=round(rnorm(x,y,z))
    
    while(sum(v)>1 | sum(v)==0) {
      
      v=round(rnorm(x,y,z))
      
    }
  
return(v)
    
}

result1=c()

for (i in 1:20) {
  
  result=f(4,0.5,0.1)
  
  result1=c(result,result1)
  
}

df_test$distance=ifelse(df_test$distance<0,df_test$distance*-1,df_test$distance)
df_test$price = 0
df_test$price[df_test$choice=="plane"] = rnorm(20, mean = 300, sd=30)
df_test$price[df_test$choice=="car"] = rnorm(20, mean = 50, sd=10)
df_test$price[df_test$choice=="boat"] = rnorm(20, mean = 100, sd=15)
df_test$price[df_test$choice=="train"] = rnorm(20, mean = 120, sd=25)

df_test$choice2=result1
           
mlog=mlogit(choice2 ~ distance + price , data = df_test)

#the function logsum generates expected utility for each individual

logsum(mlog)

#so what would be adequate alternative with survival's clogit? I set an exemple below of
#of what i would like to regress and then perform something like logsum()

clog=clogit(choice2 ~ distance + price + as.factor(choice), strata(id), data = df_test)

**


Solution

  • The vignette you offer says the logsum is calculated as:

    enter image description here

    To my reading that is similar to the calculation used to construct the "linear predictor". the lp is t(coef(clog)) %*% Xhat. If I'm correct on that interpretation, then that is stored in the clog-object:

     clog["linear.predictor"]
    

    So you could just take:

    log(sum( exp(clog[["linear.predictors"]]) ))
    [1] 4.286592
    

    If you need to segregate this by df_test$choice then (since the length of the 'linear.predictors' element of clog is the same as the length of the 'choice' column of df_test) it's just:`

     tapply(clog$linear.predictors, df_test$choice, function(x){ 
                   log(sum( exp(x) ))})
     #--------------------------------------
        boat      car    plane    train 
    2.824502 3.506756 2.825004 1.734258 
    

    If you need to aggregate this by id, you would do it like this:

    tapply(clog$linear.predictors, df_test$id, function(x){ 
      log(sum( exp(x) ))})
    
           1        2        3        4        5        6        7        8        9 
    1.405896 1.506152 1.454507 1.539060 1.467082 1.428482 1.393582 1.521877 1.480670 
          10       11       12       13       14       15       16       17       18 
    1.466490 1.416338 1.500581 1.528075 1.488253 1.398405 1.445014 1.483623 1.404041 
          19       20 
    1.460672 1.452709