I have the function GetExtrema
below (thanks to @ZheyuanLi) which I use to identify the x (age in years) and y (BMI in kg/m2) at the minima and maxima of a curve.
Because there might be local (rather than global) minima/maxima, how can I edit the GetExtrema
function to require that the first derivative of the curve is positive for 0.25 years (i.e., 3 months) before and negative for 0.25 years after a maximum for it to be considered the true maximum (and same for the minimum).
# Load packages
library(tidyverse)
library(ggeffects)
library(mgcv)
# data
dat <- read.csv((
"https://raw.githubusercontent.com/aelhak/NCRM2023/main/bmi_long.csv")) %>%
select(age, bmi)
# model
model <- gam(
bmi ~ s(sqrt(age), bs = 'ps'),
method = "REML", data = dat
)
# predictions
pred <- predict_response(model, terms = "age [all]") %>%
as.data.frame() %>% select(x, predicted) %>%
rename(y = predicted)
with(pred, plot(x, y, type = "l"))
# Identify x and y at curve min and max
GetExtrema <- function (x, y) {
turn <- diff.default(sign(diff.default(y)))
maxInd <- which(turn == -2) + 1L
minInd <- which(turn == 2) + 1L
list(min.x = x[minInd], min.y = y[minInd],
max.x = x[maxInd], max.y = y[maxInd])
}
GetExtrema(pred$x, pred$y)
First of all, you are not searching for global extrema. Clearly, the global minimum is at the minimum age. You are searching for local extrema with extra conditions.
The gratia package provides a function to get derivatives using the finite-difference approach. I would implement the search for extrema using this function. However, the nature of numerics is that you need to provide tolerances and these tolerances will always depend on the data (here the model predictions).
library(gratia)
GetExtrema2 <- function(model, x, tolx, tolmax, tolmin, eps = 1e-07) {
x <- seq(min(x), max(x), by = tolx)
#dy/dx
d1 <- derivatives(model,
data = data.frame(age = x),
order = 1, type = "central", eps = eps)
#d^2y/dx^2
d2 <- derivatives(model,
data = data.frame(age = x),
order = 2, type = "central", eps = eps)
d2chk <- round(d2$.derivative, -log10(tolmax)) < 0
xmax <- (d1$`sqrt(age)`^2)[abs(d1$.derivative) < tolmax & d2chk]
d2chk <- round(d2$.derivative, -log10(tolmin)) > 0
xmin <- (d1$`sqrt(age)`^2)[abs(d1$.derivative) < tolmin & d2chk]
ymax <- unname(predict(model, newdata = data.frame(age = xmax)))
ymin <- unname(predict(model, newdata = data.frame(age = xmin)))
plot(x, predict(model, newdata = data.frame(age = x)), type = "l")
points(xmin, ymin, col = "blue", pch = 16)
points(xmax, ymax, col = "dark red", pch = 16)
list(xmin = xmin, ymin = ymin, xmax = xmax, ymax = ymax)
}
print(ex <- GetExtrema2(model, x = dat$age, tolx = 1e-2, tolmax = 1e-2, tolmin = 1e-3))
#$xmin
#[1] 4.626306 4.636306
#
#$ymin
#[1] 15.69887 15.69887
#
#$xmax
#[1] 0.8263063
#
#$ymax
#[1] 17.21079
As you can see, with the provided tolerances, the function finds one maximum point and two minimum points (with almost identical y values). You could probably adjust the tolerances to find a unique minimum or just round the results and use unique
. Your condition can be implemented simply by looking at the distance between remaining extrema. However, if you get "too local" extrema, you might simply want to reduce the dimension of the basis for the smooth (see the k
parameter of s
).