I was trying to obtain the expected utility for each individual using R's survival package (the clogit function), and I was not able to find a simple solution such as mlogit's logsum.
Below I set up an example of how one would do it using the mlogit package. It is pretty straightforward: regress the variables with the mlogit function, save the output, and use it as an argument to the logsum function -- if needed, there is a short explanation in this vignette. What I want to know is the analogous method for clogit. I've read the package's manual, but I have failed to grasp which function would be the most adequate to perform the analysis.
Note 1: My preference for a function like mlogit's logsum comes from the fact that I might need to run a large number of regressions later on, so being able to perform the correct estimation in different scenarios would be helpful.
Note 2: I do not intend the dataset created below to be representative of how real data should behave. I've set up the example solely for the purpose of running the function after the logit regressions.
library(survival)
library(mlogit)
# creating a toy dataset: 20 individuals, 4 alternatives each
df_test = data.frame(id = rep(1:20, each = 4),
                     choice = rep(c("train", "car", "plane", "boat"), times = 20),
                     distance = rnorm(80) * 10)
# draw x values from N(y, z) and round them, redrawing until exactly one equals 1
# (so each individual ends up choosing exactly one alternative)
f = function(x, y, z) {
  v = round(rnorm(x, y, z))
  while (sum(v) > 1 | sum(v) == 0) {
    v = round(rnorm(x, y, z))
  }
  return(v)
}
# build the 0/1 choice indicator for all 20 individuals
result1 = c()
for (i in 1:20) {
  result = f(4, 0.5, 0.1)
  result1 = c(result, result1)
}
df_test$distance = abs(df_test$distance)  # keep distances positive
df_test$price = 0
df_test$price[df_test$choice=="plane"] = rnorm(20, mean = 300, sd=30)
df_test$price[df_test$choice=="car"] = rnorm(20, mean = 50, sd=10)
df_test$price[df_test$choice=="boat"] = rnorm(20, mean = 100, sd=15)
df_test$price[df_test$choice=="train"] = rnorm(20, mean = 120, sd=25)
df_test$choice2 = result1  # 1 marks the chosen alternative
# mlogit expects index-formatted data ('long' shape with alternative and chooser indices)
df_ml = mlogit.data(df_test, choice = "choice2", shape = "long",
                    alt.var = "choice", chid.var = "id")
mlog = mlogit(choice2 ~ distance + price, data = df_ml)
#the function logsum generates expected utility for each individual
logsum(mlog)
# so what would be an adequate alternative with survival's clogit? Below is an example
# of what I would like to regress and then perform something like logsum() on
clog = clogit(choice2 ~ distance + price + as.factor(choice) + strata(id), data = df_test)
The vignette you cite says the logsum is calculated as the log of the summed exponentiated utilities, i.e. log( sum_j exp(V_ij) ) for each individual i over the alternatives j.
To my reading, that is similar to the calculation used to construct the "linear predictor": the lp is t(coef(clog)) %*% Xhat. If I'm correct in that interpretation, then it is stored in the clog object as clog$linear.predictors.
So you could just take:
log(sum( exp(clog[["linear.predictors"]]) ))
[1] 4.286592
If you need to segregate this by df_test$choice, then (since the length of the 'linear.predictors' element of clog is the same as the length of the 'choice' column of df_test) it's just:
tapply(clog$linear.predictors, df_test$choice, function(x){
    log(sum( exp(x) ))})
#--------------------------------------
boat car plane train
2.824502 3.506756 2.825004 1.734258
If you need to aggregate this by id, you would do it like this:
tapply(clog$linear.predictors, df_test$id, function(x){
    log(sum( exp(x) ))})
1 2 3 4 5 6 7 8 9
1.405896 1.506152 1.454507 1.539060 1.467082 1.428482 1.393582 1.521877 1.480670
10 11 12 13 14 15 16 17 18
1.466490 1.416338 1.500581 1.528075 1.488253 1.398405 1.445014 1.483623 1.404041
19 20
1.460672 1.452709
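Since you mention running many of these regressions, a tiny wrapper (a hypothetical helper, not something shipped by survival or mlogit) keeps the pattern reusable across fits:

# hypothetical convenience wrapper: per-group logsum from a clogit/coxph fit,
# mimicking what mlogit::logsum() returns per chooser
clogit_logsum = function(fit, group) {
    tapply(fit$linear.predictors, group, function(x) log(sum(exp(x))))
}

# expected utility per individual, equivalent to the tapply call above
clogit_logsum(clog, df_test$id)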