I am trying to model the relationship between my response 'crop coverage [%]' ~ weed coverage [%] + Soil Moisture [%] using R. Since I am dealing with proportions, I chose to do a beta regression. I was told that in order to better fit and visualize the model, using the mean of weed_coverage would be a good idea. However, when I do that, I get following error:
betareg (crop_coverage ~ soil_moisture + weed_coverage_mean, data = df) -> model_a
Error in optim(par = start, fn = loglikfun, gr = gradfun, method = method, : non-finite value supplied by optim
Why do I get this error? And is that really the best way to fit and to visualize this model, since weed_coverage and soil_moisture are both continuous variables? Thanks a lot in advance.
My data:
df <- structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("2021-03-17",
"2021-04-07", "2021-04-13", "2021-04-27", "2021-05-11", "2021-05-27"
), class = "factor"), weed_coverage = c(0, 0, 0, 1.7, 1, 5, 0,
0, 0.1, 0.2, 1, 2.8, 2.5, 1, 1, 5, 0, 0, 0.9, 0.7, 0, 1.1, 0.5,
0.5, 0, 0, 0.5, 4, 0, 0.3, 0.8, 4, 1, 2, 2, 6, 0.2, 5, 0, 0,
3, 1, 0, 2, 0, 0, 0, 3, 3, 0), soil_moisture = c(36.28, 37.6,
38.55, 34.38, 34.02, 34.88, 34.92, 38.12, 35.38, 36.92, 27.15,
24.95, 21.38, 22.95, 27.65, 25.7, 27.02, 32.1, 27.18, 26, 14.97,
15.25, 17.02, 16.12, 15.32, 14.3, 14.5, 12.45, 13.07, 15.4, 14.9,
12, 16.85, 17.15, 18.52, 10.68, 13.82, 9.5, 15.32, 10.97, 14.8,
17.05, 26.75, 14.8, 25.75, 19.18, 18.12, 14.22, 18.95, 24.38),
crop_coverage = c(0.38, 0.6, 0.75, 0.5, 0.4, 0.48, 0.74,
0.75, 0.27, 0.45, 0.65, 0.3, 0.4, 0.38, 0.45, 0.58, 0.48,
0.75, 0.58, 0.4, 0.9999, 0.7, 0.75, 0.7, 0.85, 0.78, 0.7,
0.91, 0.2, 0.6, 0.95, 0.85, 0.6, 0.7, 0.75, 0.9, 0.8, 0.85,
0.75, 0.96, 0.85, 0.85, 0.75, 0.73, 0.68, 0.7, 0.97, 0.7,
0.75, 0.74), weed_coverage_mean = c(1.256, 1.256, 1.256,
1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256,
1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256,
1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256,
1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256,
1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256,
1.256, 1.256)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-50L))
You should fit the model with the original data, then use the mean value of the non-focal predictors when predicting values to use in the plot. For example:
library(betareg)
m <- betareg (crop_coverage ~ soil_moisture + weed_coverage, data = df)
pframe <- with(df, data.frame(soil_moisture = seq(min(soil_moisture),
max(soil_moisture),
length.out = 50),
weed_coverage = mean(weed_coverage)))
pframe$crop_coverage <- predict(m, newdata = pframe, type = "response")
plot(crop_coverage ~ soil_moisture, data = df)
with(pframe, lines(soil_moisture, crop_coverage))
You can do fancier stuff if you want like use expand.grid()
to generate a prediction frame that computes the predicted values for several different levels of weed coverage (if you're going to do this you may want to move to ggplot2
for plotting).