rbetareg

issue with fitting beta regression


I am trying to model the relationship between my response 'crop coverage [%]' ~ weed coverage [%] + Soil Moisture [%] using R. Since I am dealing with proportions, I chose to do a beta regression. I was told that in order to better fit and visualize the model, using the mean of weed_coverage would be a good idea. However, when I do that, I get following error:

betareg (crop_coverage ~ soil_moisture + weed_coverage_mean, data = df) -> model_a

Error in optim(par = start, fn = loglikfun, gr = gradfun, method = method, : non-finite value supplied by optim

Why do I get this error? And is that really the best way to fit and to visualize this model, since weed_coverage and soil_moisture are both continuous variables? Thanks a lot in advance.

My data:

df <- structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("2021-03-17", 
"2021-04-07", "2021-04-13", "2021-04-27", "2021-05-11", "2021-05-27"
), class = "factor"), weed_coverage = c(0, 0, 0, 1.7, 1, 5, 0, 
0, 0.1, 0.2, 1, 2.8, 2.5, 1, 1, 5, 0, 0, 0.9, 0.7, 0, 1.1, 0.5, 
0.5, 0, 0, 0.5, 4, 0, 0.3, 0.8, 4, 1, 2, 2, 6, 0.2, 5, 0, 0, 
3, 1, 0, 2, 0, 0, 0, 3, 3, 0), soil_moisture = c(36.28, 37.6, 
38.55, 34.38, 34.02, 34.88, 34.92, 38.12, 35.38, 36.92, 27.15, 
24.95, 21.38, 22.95, 27.65, 25.7, 27.02, 32.1, 27.18, 26, 14.97, 
15.25, 17.02, 16.12, 15.32, 14.3, 14.5, 12.45, 13.07, 15.4, 14.9, 
12, 16.85, 17.15, 18.52, 10.68, 13.82, 9.5, 15.32, 10.97, 14.8, 
17.05, 26.75, 14.8, 25.75, 19.18, 18.12, 14.22, 18.95, 24.38), 
    crop_coverage = c(0.38, 0.6, 0.75, 0.5, 0.4, 0.48, 0.74, 
    0.75, 0.27, 0.45, 0.65, 0.3, 0.4, 0.38, 0.45, 0.58, 0.48, 
    0.75, 0.58, 0.4, 0.9999, 0.7, 0.75, 0.7, 0.85, 0.78, 0.7, 
    0.91, 0.2, 0.6, 0.95, 0.85, 0.6, 0.7, 0.75, 0.9, 0.8, 0.85, 
    0.75, 0.96, 0.85, 0.85, 0.75, 0.73, 0.68, 0.7, 0.97, 0.7, 
    0.75, 0.74), weed_coverage_mean = c(1.256, 1.256, 1.256, 
    1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 
    1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 
    1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 
    1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 
    1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 
    1.256, 1.256)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-50L))

Solution

  • You should fit the model with the original data, then use the mean value of the non-focal predictors when predicting values to use in the plot. For example:

    fit

    library(betareg)
    m <- betareg (crop_coverage ~ soil_moisture + weed_coverage, data = df)
    

    construct prediction frame and predict

    pframe <- with(df, data.frame(soil_moisture = seq(min(soil_moisture),
                                                      max(soil_moisture),
                                                      length.out = 50),
                                  weed_coverage = mean(weed_coverage)))
    pframe$crop_coverage <- predict(m, newdata = pframe, type = "response")
    

    plot

    plot(crop_coverage ~ soil_moisture, data = df)
    with(pframe, lines(soil_moisture, crop_coverage))
    

    observed data with predicted line

    You can do fancier stuff if you want like use expand.grid() to generate a prediction frame that computes the predicted values for several different levels of weed coverage (if you're going to do this you may want to move to ggplot2 for plotting).