
multinomial logit


I'm stuck trying to run a multinomial logit regression in R. A preview of the data is shown below for reference. I'm new to R and need to do this for applied econometrics using R. How should I reshape the data and run the multinomial regression?

> head(data)
  marketindex x1_prod1 x2_prod1 x3_prod1 x1_prod2 x2_prod2 x3_prod2 x1_prod3 x2_prod3 x3_prod3 x1_prod0 x2_prod0 x3_prod0 choice
1           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      3
2           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      2
3           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      3
4           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      2
5           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      2
6           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      2

Solution

  • A multinomial logit model can be fitted in R with several packages, including nnet (through its multinom function) and mlogit. The tutorial on the UCLA website recommended by mhmtsrmn prefers multinom to mlogit

    because it does not require the data to be reshaped (as the mlogit package does)
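    For comparison, here is a minimal sketch of the multinom route. It uses the built-in iris data purely as a stand-in (it is not the asker's data), with one row per observation and predictors that vary by observation rather than by alternative:

```r
library(nnet)  # recommended package shipped with standard R installations

# iris is only a stand-in data set: one row per observation, outcome in Species.
fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris, trace = FALSE)

# One row of coefficients per non-reference outcome level.
coef(fit)
```

    No reshaping step is involved here, which is the convenience the tutorial refers to.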

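    However, the data you provided are already in a wide shape that is compatible with the format the mlogit package requires, so if you want to use mlogit, no further reshaping is needed. You do, however, need to recode the choice column so that it holds the alternative names (prod2, prod3, and so on) instead of the bare numbers 2 and 3.

    This is necessary because the other column names identify the alternatives by the suffixes prod1, prod2, prod3, and prod0, and the values in the choice column have to match those names.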
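    One possible way to do this recoding is sketched below. The toy data frame reproduces only the choice column from the question, and paste0 is just one option; the new column name choice1 is the one used in the rest of this answer:

```r
# Toy stand-in for the asker's data: only the choice column is reproduced here.
data <- data.frame(choice = c(3, 2, 3, 2, 2, 2))

# Prefix each numeric code with "prod" so it matches the column suffixes
# (prod1, prod2, prod3, prod0).
data$choice1 <- paste0("prod", data$choice)

data$choice1
```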

    I tried running the mlogit function on your data sample, but it failed, most probably because the sample does not have enough variation in its values. So I replaced the values with random ones and assigned the data frame to the name choice_dat, like this:

    choice_dat
     marketindex x1_prod1 x2_prod1 x3_prod1 x1_prod2 x2_prod2 x3_prod2 x1_prod3
    1           1        5        7        6        5        2        8        7
    2           1        8        3        5        6        3        9        8
    3           1        7       10        3        7        6        9        9
    4           1        8        8        2        5        8        9        7
    5           1        9        9       10        8        4        6        8
    6           1        7        4        8        7       10       10        8
      x2_prod3 x3_prod3 x1_prod0 x2_prod0 x3_prod0 choice1
    1       10       13        0        0        0   prod3
    2        3       10        0        0        0   prod2
    3        4       10        0        0        0   prod3
    4        1       11        0        0        0   prod2
    5        8       10        0        0        0   prod2
    6        5       12        0        0        0   prod2
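    To reproduce this, the printed data frame can be typed in by hand, for example as below. The values are copied from the printout; storing choice1 as character rather than factor is an assumption, since the printout does not show the column type:

```r
# Rebuild choice_dat from the values printed above (six rows, 14 columns).
choice_dat <- data.frame(
  marketindex = rep(1, 6),
  x1_prod1 = c(5, 8, 7, 8, 9, 7),
  x2_prod1 = c(7, 3, 10, 8, 9, 4),
  x3_prod1 = c(6, 5, 3, 2, 10, 8),
  x1_prod2 = c(5, 6, 7, 5, 8, 7),
  x2_prod2 = c(2, 3, 6, 8, 4, 10),
  x3_prod2 = c(8, 9, 9, 9, 6, 10),
  x1_prod3 = c(7, 8, 9, 7, 8, 8),
  x2_prod3 = c(10, 3, 4, 1, 8, 5),
  x3_prod3 = c(13, 10, 10, 11, 10, 12),
  x1_prod0 = rep(0, 6),
  x2_prod0 = rep(0, 6),
  x3_prod0 = rep(0, 6),
  choice1  = c("prod3", "prod2", "prod3", "prod2", "prod2", "prod2"),
  stringsAsFactors = FALSE
)

# Columns 2 to 13 should all follow the <variable>_<alternative> naming pattern.
names(choice_dat)[2:13]
```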
    

    Then I ran mlogit on the data:

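    library(mlogit)  # also attaches dfidx
    # Columns 2 to 13 vary by alternative and are named <variable>_<alternative>
    prod_dat <- dfidx(choice_dat, shape = "wide", choice = "choice1", varying = 2:13, sep = "_")
    # "| 0" drops the alternative-specific intercepts
    mod1 <- mlogit(choice1 ~ x1 + x2 + x3 | 0, data = prod_dat)
    summary(mod1)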
    
    Call:
    mlogit(formula = choice1 ~ x1 + x2 + x3 | 0, data = prod_dat, 
        method = "nr")
    
    Frequencies of alternatives:choice
      prod0   prod1   prod2   prod3 
    0.00000 0.00000 0.66667 0.33333 
    
    nr method
    5 iterations, 0h:0m:0s 
    g'(-H)^-1g = 9.53E-08 
    gradient close to zero 
    
    Coefficients :
       Estimate Std. Error z-value Pr(>|z|)
    x1 -0.11412    0.38947 -0.2930   0.7695
    x2  0.16461    0.17790  0.9253   0.3548
    x3  0.26768    0.22651  1.1818   0.2373
    
    Log-Likelihood: -5.8257
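    To see what these coefficients mean, the sketch below computes the implied choice probabilities for the first row of choice_dat by hand, using the standard conditional logit formula P(j) = exp(x_j'b) / sum_k exp(x_k'b). It assumes the specification above (generic coefficients, no intercepts) and uses only base R; the object names beta, X, v and p are illustrative:

```r
# Coefficients copied from the summary output above.
beta <- c(x1 = -0.11412, x2 = 0.16461, x3 = 0.26768)

# Attributes of each alternative for the first row of choice_dat.
X <- rbind(
  prod0 = c(x1 = 0, x2 = 0,  x3 = 0),
  prod1 = c(x1 = 5, x2 = 7,  x3 = 6),
  prod2 = c(x1 = 5, x2 = 2,  x3 = 8),
  prod3 = c(x1 = 7, x2 = 10, x3 = 13)
)

# Linear index of each alternative, then the logit (softmax) probabilities.
v <- drop(X %*% beta)
p <- exp(v) / sum(exp(v))
round(p, 3)
```

    Because these estimates come from six rows of random numbers, the probabilities only illustrate the mechanics and say nothing about the real data.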