I am working on a binomial logistic regression analysis with one categorical dependent variable, one continuous independent variable, and one indicator variable. I have run the regression and created plots. I think I have done everything correctly, but I am a little put off by the look of my plots. This is how my plots look (note that the red dotted line is the indicator variable; it is not included in the code further down):
And this is how I was taught in school that they would look:
Here is a reproducible sample:
sample <- data.frame(AWO = sample(0:1, 1000, TRUE),
                     corn = rnorm(1000, 0, 1))
Here is the code I applied:
library(ggplot2)
ggplot(sample, aes(x = corn, y = AWO)) +
  geom_point(alpha = .25) +
  geom_smooth(method = "glm",
              method.args = list(family = "binomial"),
              se = FALSE)
For those interested, I have also fitted and plotted the regression manually, and I get the same result:
mwc <- glm(AWO ~ corn, data = sample, family=binomial)
x0 = seq(min(sample$corn), max(sample$corn), length = 1000)
plot(sample$corn, sample$AWO)
pwc = predict(mwc, newdata = data.frame(corn = x0), type = "response")
lines(x0, pwc)
So my question is, have I plotted the regression wrong, or is it simply a case of academia v. practice?
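The probability of the outcome just doesn't change much over the range of your data, so you are only seeing a small section of the idealised S-shaped curve. In your sample this is by construction: AWO is drawn independently of corn, so the fitted slope is close to zero. Let's take your example: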
library(ggplot2)
set.seed(1)
sample <- data.frame(AWO = sample(0:1, 1000, TRUE),
                     corn = rnorm(1000, 0, 1))
myplot <- ggplot(sample, aes(x = corn, y = AWO)) +
  geom_point(alpha = .25) +
  geom_smooth(method = "glm",
              method.args = list(family = "binomial"),
              se = FALSE, fullrange = TRUE)
myplot
#> `geom_smooth()` using formula = 'y ~ x'
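To put a number on how flat that line is, here is a minimal sketch that fits the same model directly and looks at the fitted probabilities at the two ends of the observed data. It assumes the same seed and simulated data as above; the name `p_range` is just an illustrative choice. With an outcome that is unrelated to the predictor, both values should stay fairly close to 0.5.

```r
# Same simulated data as above: AWO is generated independently of corn
set.seed(1)
sample <- data.frame(AWO = sample(0:1, 1000, TRUE),
                     corn = rnorm(1000, 0, 1))

# Fit the same model that geom_smooth(method = "glm") fits
mwc <- glm(AWO ~ corn, data = sample, family = binomial)
coef(mwc)  # inspect the intercept and the slope for corn

# Fitted probabilities at the minimum and maximum of the observed corn values
p_range <- predict(mwc,
                   newdata = data.frame(corn = range(sample$corn)),
                   type = "response")
p_range
```

If the two probabilities are nearly equal, the visible part of the curve is necessarily close to a horizontal line.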
But now let's zoom out on the x axis (this works because `fullrange = TRUE` lets the fitted curve extend beyond the range of the data):
myplot + xlim(c(-100, 100))
#> `geom_smooth()` using formula = 'y ~ x'
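For contrast, here is a sketch of what happens when the predictor really does drive the outcome. The data frame `strong` and the slope of 3 on the log-odds scale are arbitrary assumptions chosen for illustration, not anything taken from your real data. With an effect that large, the textbook S shape should be visible within the range of the data itself, with no need to zoom out.

```r
library(ggplot2)
set.seed(1)

# Hypothetical data: true intercept 0 and true slope 3 on the log-odds scale
strong <- data.frame(corn = rnorm(1000, 0, 1))
strong$AWO <- rbinom(1000, size = 1, prob = plogis(3 * strong$corn))

# Same plotting code as before, applied to the data with a strong effect
strong_plot <- ggplot(strong, aes(x = corn, y = AWO)) +
  geom_point(alpha = .25) +
  geom_smooth(method = "glm",
              method.args = list(family = "binomial"),
              se = FALSE)
strong_plot
```

So the shape you were taught is not wrong; how much of it you see depends on how strongly the predictor shifts the log-odds over the range you observe.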
Created on 2023-05-15 with reprex v2.0.2