I want to plot a series of predictions from a logit model in R. The model is as follows:
modelo_logit3 <- glm(formula = Sold ~ price+age+poor_prop+airport, data = datos, family = binomial)
summary(modelo_logit3)
Call:
glm(formula = Sold ~ price + age + poor_prop + airport, family = binomial,
    data = datos)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8327  -1.0676  -0.3743   1.0907   1.9014

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.275016   0.743781   5.748 9.05e-09 ***
price       -0.148547   0.021930  -6.774 1.26e-11 ***
age          0.009497   0.004592   2.068   0.0386 *
poor_prop   -0.184504   0.029633  -6.226 4.78e-10 ***
airportYES   0.871132   0.200409   4.347 1.38e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 697.28  on 505  degrees of freedom
Residual deviance: 610.46  on 501  degrees of freedom
AIC: 620.46

Number of Fisher Scoring iterations: 4
I would like to show, in a scatter plot, three series of predicted probabilities for the variable Sold, one for each of three values of price: 20, 30 and 40. The variables age and airport are held constant and poor_prop is the variable that varies. In the plot, the Y axis represents the probabilities and the X axis the poor_prop variable. This is what I have done:
# Let's make the predictions and save them in variables to use them later:
a = predict(modelo_logit3,
            newdata = data.frame(price = 20, age = 50,
                                 poor_prop = c(5, 25, 35, 50, 65),
                                 airport = 'YES'),
            type = "response")
b = predict(modelo_logit3,
            newdata = data.frame(price = 30, age = 50,
                                 poor_prop = c(5, 25, 35, 50, 65),
                                 airport = 'YES'),
            type = "response")
c = predict(modelo_logit3,
            newdata = data.frame(price = 40, age = 50,
                                 poor_prop = c(5, 25, 35, 50, 65),
                                 airport = 'YES'),
            type = "response")
# Now, we create a dataframe with the prediction results for different combinations of
# "price" and "poor_prop":
predicciones <- data.frame(
  price = c(rep(20, times = 5), rep(30, times = 5), rep(40, times = 5)),
  fitted_values = c(a, b, c),
  poor_prop = c(5, 25, 35, 50, 65)
)
# Let's see the dataframe:
predicciones
# attach of the dataframe:
attach(predicciones)
# Finally, let's make the plot (ggplot2 must be loaded):
library(ggplot2)
ggplot(data = predicciones, aes(x = poor_prop, y = fitted_values,
                                col = price)) +
  geom_point() +
  geom_line() +
  scale_color_gradient(low = "blue", high = "red")
This is the dataframe that I created:
price fitted_values poor_prop
   20  8.490973e-01         5
   20  1.231930e-01        25
   20  2.171980e-02        35
   20  1.392686e-03        50
   20  8.759648e-05        65
   30  5.602225e-01         5
   30  3.082831e-02        25
   30  5.001293e-03        35
   30  3.156376e-04        50
   30  1.983277e-05        65
   40  2.238433e-01         5
   40  7.149899e-03        25
   40  1.136666e-03        35
   40  7.147629e-05        50
   40  4.490112e-06        65
In the plot I obtained, all the points are joined together in a single line. What I want instead is one line per price, so that the three series of probabilities are drawn separately, and I don't understand why the points are all being connected to each other. If anyone has an idea and can give me a hand, I would really appreciate it.
Regards!
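You can convert price to a factor. Because price is numeric, ggplot2 maps it to a continuous colour scale and keeps all the points in one group, so geom_line() draws a single path through them. A factor is discrete, so each price level gets its own group and its own line: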
ggplot(data = predicciones,
       aes(x = poor_prop, y = fitted_values, col = factor(price))) +
  geom_point() +
  geom_line() +
  scale_color_manual(values = c("blue", "purple", "red"),
                     name = "price")
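
If you would rather keep the continuous blue-to-red gradient from your original code, another option that should work is to leave price numeric and add an explicit group aesthetic. The sketch below is only an illustration: since I don't have datos, it rebuilds predicciones by hand from the coefficients printed in your summary() output (rounded, so the values agree with yours only to about four decimals), with age = 50 and airport = "YES" as in your predictions. The names coefs, eta and p are mine, not from your code.

```r
library(ggplot2)

# Coefficients copied from the summary() output in the question (rounded).
coefs <- c(intercept = 4.275016, price = -0.148547, age = 0.009497,
           poor_prop = -0.184504, airportYES = 0.871132)

# One row per (poor_prop, price) combination; poor_prop varies fastest,
# so the row order matches the data frame shown in the question.
predicciones <- expand.grid(poor_prop = c(5, 25, 35, 50, 65),
                            price = c(20, 30, 40))

# Linear predictor with age = 50 and airport = "YES" held fixed,
# then the inverse logit via plogis() to get probabilities.
eta <- coefs[["intercept"]] +
  coefs[["price"]] * predicciones$price +
  coefs[["age"]] * 50 +
  coefs[["poor_prop"]] * predicciones$poor_prop +
  coefs[["airportYES"]]
predicciones$fitted_values <- plogis(eta)

# price stays numeric (continuous colour scale); group = price makes
# geom_line() draw one line per price instead of one path through all points.
p <- ggplot(predicciones,
            aes(x = poor_prop, y = fitted_values,
                colour = price, group = price)) +
  geom_point() +
  geom_line() +
  scale_colour_gradient(low = "blue", high = "red")
p
```

With the fitted model at hand you would not need the manual calculation: adding age = 50 and airport = "YES" columns to the same expand.grid() result and passing it as newdata to predict(modelo_logit3, ..., type = "response") should give all fifteen probabilities in one call.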