I am plotting a nonlinear generalized additive model (gam) using R's mgcv
package:
library(mgcv)
V <- rep(1, nrow(dt))
fit <- gam(cbind(V, group_number) ~ s(time_elapsed, exposure_group, bs='fs', k=1, m=1) + covs,
data = dt,
family=cox.ph,
weights=dt$outcome,
control=gam.control(trace=TRUE, maxit=500)
)
plot.gam(fit)
I have a patient-level dataset dt with a column of multiple exposures encoded as an ordered factor (1, 2, 3 and 4) in exposure_group, where 1 is the reference. Moreover, dt contains a binary outcome column, time_elapsed is a column with days until the outcome, and group_number is a column defining the strata. Additionally, there are some other columns with covariates, summarized as covs.
When I simply plot the fit using plot.gam(), I am wondering how to interpret the y-axis. The x-axis clearly depicts time_elapsed, but nowhere in the documentation is it stated exactly what I am looking at. Does the y-axis represent absolute hazards? Or relative hazard ratios (probably log-transformed)? The default y-axis label simply states s(time_elapsed, exposure_group) and some digits within that s().
PS: this is not a duplicate of Hazard Ratio Plot from mgcv::gam cox.ph model, since I have multiple exposures and thus multiple lines in my GAM plot. The answer given there, however, is maybe also applicable here?
The plot is a partial effect plot, showing the contribution of the set of smooths to the linear predictor of the model, assuming the effects of all other model terms are set to 0. Because of the way the smooths are built, and subject to identifiability constraints, they are typically centred around 0. The fs smooths are a little bit special, as they contain a constant term for each level of the grouping variable (a random intercept) and a random slope for each level of the grouping variable, as well as the wiggly bits, but they still span the overall mean of the response and hence span 0.
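To see the numbers behind such a partial effect plot, one option is to ask predict() for per-term contributions on the linear predictor scale. The following is only a minimal sketch on simulated stand-in data: the data frame dat, its columns x, g, time and event, and the crude censoring scheme are all assumptions made for illustration, not the model or data from the question.

```r
library(mgcv)

set.seed(1)
n <- 400
# Simulated stand-in data (hypothetical; not the dataset from the question)
dat <- data.frame(
  x = runif(n),
  g = factor(sample(1:4, n, replace = TRUE))
)
dat$time  <- rexp(n, rate = exp(sin(2 * pi * dat$x)))  # log-hazard varies with x
dat$event <- rbinom(n, 1, 0.8)                          # crude indicator: 1 = event, 0 = censored

# Cox PH GAM: the response is the event time, the weights hold the event indicator
m <- gam(time ~ s(x, g, bs = "fs", k = 5),
         family = cox.ph(), data = dat, weights = event)

# Evaluate the smooth's contribution to the linear predictor (log-hazard scale)
# on a grid of x values for every level of g
nd <- expand.grid(x = seq(0, 1, length.out = 50), g = levels(dat$g))
pe <- predict(m, newdata = nd, type = "terms", se.fit = TRUE)

# pe$fit has one column per model term; here the single fs smooth
head(cbind(nd, fit = pe$fit[, 1], se = pe$se.fit[, 1]))
```

The fit column should correspond to the curves that plot.gam() draws for the fs term, one curve per level of g, all on the log-hazard scale.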
The linear predictor in these models is the log-hazard, so the y-axis is the contribution of the named covariate to the log-hazard, or, put differently, how the log-hazard would change as the named covariate is changed from $x_0$ to $x_1$ (i.e. two points on the x-axis), assuming the effects of all other terms are set to 0. This is typically called the log-hazard ratio.
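Written out, as a sketch that assumes the usual proportional-hazards form (the symbols $h_0$, $f_g$ and $\eta_{\text{other}}$ are notation introduced here for illustration, not taken from the mgcv documentation), the log-hazard for an observation in group $g$ with covariate value $x$ is

$$
\log h(t \mid x, g) = \log h_0(t) + f_g(x) + \eta_{\text{other}},
$$

where $f_g$ is the plotted smooth for group $g$ and $\eta_{\text{other}}$ collects all remaining terms. Moving from $x_0$ to $x_1$ within the same group, the baseline hazard and the other terms cancel:

$$
\log h(t \mid x_1, g) - \log h(t \mid x_0, g) = f_g(x_1) - f_g(x_0),
\qquad
\mathrm{HR} = \exp\{ f_g(x_1) - f_g(x_0) \}.
$$

By the same argument, the vertical distance between the curves of two groups $g$ and $g'$ at a common $x$ is $f_g(x) - f_{g'}(x)$, the log-hazard ratio between those groups at that $x$; exponentiating gives the hazard ratio.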