r ggplot2 scatter-plot prediction mlogit

Graphical representation of a series of probabilities from a logistic model with R


I want to make a graph of a series of predictions on a logit model in R. The model is as follows:

modelo_logit3 <- glm(formula = Sold ~ price+age+poor_prop+airport, data = datos, family = binomial)
summary(modelo_logit3)

Call:
glm(formula = Sold ~ price + age + poor_prop + airport, family = binomial, 
    data = datos)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8327  -1.0676  -0.3743   1.0907   1.9014  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  4.275016   0.743781   5.748 9.05e-09 ***
price       -0.148547   0.021930  -6.774 1.26e-11 ***
age          0.009497   0.004592   2.068   0.0386 *  
poor_prop   -0.184504   0.029633  -6.226 4.78e-10 ***
airportYES   0.871132   0.200409   4.347 1.38e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 697.28  on 505  degrees of freedom
Residual deviance: 610.46  on 501  degrees of freedom
AIC: 620.46

Number of Fisher Scoring iterations: 4

I would like to show, in a scatter plot, three series of predicted probabilities for the variable Sold, one for each of three values of price: 20, 30 and 40. The variables age and airport are held constant, and poor_prop is the variable that varies. In the plot, the Y axis represents the probabilities and the X axis the poor_prop variable. This is what I have done:

# Let's make the predictions and save them in variables to use them later:
a = predict(modelo_logit3, newdata = data.frame(price=20, age=50, 
                                            poor_prop=c(5,25,35,50,65), 
                                            airport= 'YES'), type ="response")

b = predict(modelo_logit3, newdata = data.frame(price=30, age=50, 
                                            poor_prop=c(5,25,35,50,65), 
                                            airport= 'YES'), type ="response")

c = predict(modelo_logit3, newdata = data.frame(price=40, age=50, 
                                            poor_prop=c(5,25,35,50,65), 
                                            airport= 'YES'), type ="response")



# Now, we create a dataframe with the prediction results for different combinations of
# "price" and "poor_prop":

predicciones <- data.frame(
        price = rep(c(20, 30, 40), each = 5),
        fitted_values = c(a, b, c),
        poor_prop = rep(c(5, 25, 35, 50, 65), times = 3)
)

# Let's see the dataframe:
predicciones

# Finally, let's make the plot:
library(ggplot2)
ggplot(data = predicciones, aes(x = poor_prop, y = fitted_values,
                                col = price)) + geom_point() + geom_line() + 
  scale_color_gradient(low="blue", high="red")

This is the data frame I created:

price fitted_values poor_prop
20  8.490973e-01    5       
20  1.231930e-01    25      
20  2.171980e-02    35      
20  1.392686e-03    50      
20  8.759648e-05    65      
30  5.602225e-01    5       
30  3.082831e-02    25      
30  5.001293e-03    35      
30  3.156376e-04    50      
30  1.983277e-05    65
40  2.238433e-01    5       
40  7.149899e-03    25      
40  1.136666e-03    35      
40  7.147629e-05    50      
40  4.490112e-06    65  
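
As a sanity check, the first row of this table can be reproduced by hand: with type = "response", predict() applies the inverse logit to the linear predictor. The sketch below plugs the rounded coefficients from the summary output into that formula for price = 20, age = 50, poor_prop = 5 and airport = YES. The names beta, x, eta and p are only illustrative, and the result should agree with the table up to rounding (roughly 0.849).

```r
# Coefficients copied (rounded) from the summary() output above.
beta <- c(intercept = 4.275016, price = -0.148547, age = 0.009497,
          poor_prop = -0.184504, airportYES = 0.871132)

# Covariate values for the first row:
# intercept term, price = 20, age = 50, poor_prop = 5, airport = YES (dummy = 1).
x <- c(1, 20, 50, 5, 1)

eta <- sum(beta * x)   # linear predictor on the log-odds scale
p   <- plogis(eta)     # inverse logit: 1 / (1 + exp(-eta))
p
```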

And the plot I obtained is the following: (image not included)

However, each line should connect only the points that share the same price, so that there are three separate probability series. I don't understand why all the points are being joined into a single line. If anyone has an idea and can give me a hand, I would really appreciate it.

Regards!


Solution

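  • You can convert price to a factor. Because price is numeric, ggplot2 maps it to a continuous colour scale and keeps all points in a single group, so geom_line() draws one line through every point. A factor is discrete, so each price level gets its own group and its own line: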

    ggplot(data = predicciones, 
           aes(x = poor_prop, y = fitted_values, col = factor(price))) + 
      geom_point() + 
      geom_line() + 
      scale_color_manual(values = c("blue", "purple", "red"),
                         name = "price")
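
If the continuous blue-to-red gradient from the question should be kept, an alternative sketch is to leave price numeric and also map it to the group aesthetic, which tells geom_line() which points belong together. The data frame here is simply the prediction table from the question typed in by hand, and the object name p is only illustrative.

```r
library(ggplot2)

# Prediction table copied from the question.
predicciones <- data.frame(
  price = rep(c(20, 30, 40), each = 5),
  fitted_values = c(8.490973e-01, 1.231930e-01, 2.171980e-02, 1.392686e-03, 8.759648e-05,
                    5.602225e-01, 3.082831e-02, 5.001293e-03, 3.156376e-04, 1.983277e-05,
                    2.238433e-01, 7.149899e-03, 1.136666e-03, 7.147629e-05, 4.490112e-06),
  poor_prop = rep(c(5, 25, 35, 50, 65), times = 3)
)

# `group = price` splits the line into one path per price value,
# while `colour = price` stays continuous so the gradient scale still applies.
p <- ggplot(predicciones,
            aes(x = poor_prop, y = fitted_values,
                colour = price, group = price)) +
  geom_point() +
  geom_line() +
  scale_colour_gradient(low = "blue", high = "red")
print(p)
```

The factor version gives a discrete legend with one entry per price; the group version keeps a colour bar, which may read better when many price values are plotted.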
    

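
The three predict() calls and the hand-built data frame from the question can also be collapsed into a single prediction grid. The following is a minimal sketch under stated assumptions: the original datos is not available here, so datos_sim is simulated data with the same column names (not the real dataset), modelo_sim is a model refitted on it, and grid is a hypothetical name. Only the expand.grid() and predict() pattern is meant to carry over.

```r
library(ggplot2)

# Hypothetical stand-in for `datos`: simulated values, NOT the original data.
set.seed(1)
n <- 500
datos_sim <- data.frame(
  price     = runif(n, 5, 50),
  age       = runif(n, 0, 100),
  poor_prop = runif(n, 1, 40),
  airport   = factor(sample(c("NO", "YES"), n, replace = TRUE))
)
eta_sim <- 4.3 - 0.15 * datos_sim$price + 0.01 * datos_sim$age -
  0.18 * datos_sim$poor_prop + 0.87 * (datos_sim$airport == "YES")
datos_sim$Sold <- rbinom(n, 1, plogis(eta_sim))

modelo_sim <- glm(Sold ~ price + age + poor_prop + airport,
                  data = datos_sim, family = binomial)

# One row per combination of price and poor_prop; age and airport held fixed.
grid <- expand.grid(price     = c(20, 30, 40),
                    poor_prop = c(5, 25, 35, 50, 65),
                    age       = 50,
                    airport   = "YES")

# Predicted probabilities for every row of the grid in a single call.
grid$fitted_values <- predict(modelo_sim, newdata = grid, type = "response")

p <- ggplot(grid, aes(x = poor_prop, y = fitted_values,
                      colour = factor(price))) +
  geom_point() +
  geom_line() +
  labs(colour = "price", y = "P(Sold = 1)")
print(p)
```

With the real model, replacing modelo_sim by modelo_logit3 in the predict() call should give the same 15 probabilities as the question's table, only in a different row order.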