I am trying to export the underlying data from a LOESS plot (blue line)
I found this post on the subject and was able to get it to export like the post says: Can I export the result from a loess regression out of R?
However, as the last comment from the poster in that post says, I am not getting the results for my LOESS line. Does anyone have any insights on how to get it to export properly?
Thanks!
Code for my export is here:
#loess object
CL111_loess <- loess(dur_cleaned~TS_LightOn, data = CL111)
#get SE
CL111_predict <- predict(CL111_loess, se=T)
CL111_ouput <- data.frame("fitted" = CL111_predict$fit, "SE"=CL111_predict$se.fit)
write.csv(CL111_ouput, "CL111_output.csv")
Data for the original plot is here:
Code for my original plot is here:
{r}
#individual plot
ggplot(data = CL111) + geom_smooth(mapping = aes(x = TS_LightOn, y = dur_cleaned), method = "lm", se = FALSE, colour = "Green") +
labs(x = "TS Light On (Seconsd)", y = "TS Response Time (Seconds)", title = "Layout 1, Condition AO, INS High") +
theme(plot.title = element_text(hjust = 0.5)) +
stat_smooth(mapping = aes(x = TS_LightOn, y = dur_cleaned), method = "loess", se = TRUE) + xlim(0, 400) + ylim (0, 1.0)
#find coefficients for best fit line
lm(CL111_LM$dur_cleaned ~ CL111_LM$TS_LightOn)
You can get this information via ggplot_build()
.
If your plot is saved as gg1
, run ggplot_build(gg1)
; then you have to examine the data
object (which is a list of data for different layers) and try to figure out which layer you need (in this case, I looked for which data layer included a colour
column that matched the smooth line ...
bb <- ggplot_build(gg1)
## extract the right component, just the x/y coordinates
out <- bb$data[[2]][,c("x","y")]
## check
plot(y~x, data = out)
You can do whatever you want with this output now (write.csv()
, save()
, saveRDS()
...)
I agree that there is something weird/that I don't understand about the way that ggplot2
is setting up the loess fit. You do have to do predict()
with the right newdata
(e.g. a data frame with a single column TS_LightOn
that ranges from 0 to 400) - otherwise you get predictions of the points in your data set, which may not be properly spaced/in the right order - but that doesn't resolve the difference for me.