I would like to understand how the linear.predictors component is computed in the output of

pb = glm(formula, family = binomial(link = "probit"), data)

From my understanding, it should be the matrix product of the matrix of observations (N x k) and the vector of estimated coefficients (k x 1), with N = sample size and k = number of variables.
I tried to compute them by hand in two fashions:
rowSums(mapply(`*`, pb$model, pb$coefficients))
and
as.matrix(pb$model) %*% as.matrix(pb$coefficients)
Both give the same vector of values, which equals pb$linear.predictors for some observations but not all.
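For concreteness, here is a minimal reproducible sketch of what I am doing (the data frame d and the variables x1, x2, y are simulated stand-ins, not my actual data):

set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- rbinom(100, 1, pnorm(0.5 + d$x1 - 0.5 * d$x2))
pb <- glm(y ~ x1 + x2, family = binomial(link = "probit"), data = d)

## attempt 1: multiply each model-frame column by its coefficient, then sum
h1 <- rowSums(mapply(`*`, pb$model, pb$coefficients))
## attempt 2: matrix product of the model frame and the coefficient vector
h2 <- drop(as.matrix(pb$model) %*% as.matrix(pb$coefficients))

## the two attempts agree with each other ...
all.equal(h1, unname(h2))
## ... but only match pb$linear.predictors on some rows
table(abs(h2 - pb$linear.predictors) < 1e-12)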
Could you help me understand how it is computed and how I could reproduce it by hand?
In general it's better to use accessor methods, if they're available, than internal components of the model (accessed via $, or @ for S4 objects); internal components can be poorly documented, surprising, and/or subject to change (although the latter is unlikely for a foundational component of R like a glm object).
model.matrix(pb) %*% coef(pb)
should work to compute the linear predictor for a wide range of model types.
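For instance, with the simulated pb from the question (a sketch; predict(pb, type = "link") is another accessor that returns the same quantity for a glm):

eta <- drop(model.matrix(pb) %*% coef(pb))

## both comparisons should return TRUE
all.equal(unname(eta), unname(pb$linear.predictors))
all.equal(unname(eta), unname(predict(pb, type = "link")))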
To follow up on @nrennie's answer: the first column of the model frame is probably the response variable, while the first element of the coefficient vector is the intercept. For observations whose response equals 1, multiplying the response by the intercept coefficient happens to reproduce the intercept term correctly, which is why your hand computation matched some observations but not others. (The model matrix, as opposed to the model frame, has a first column composed entirely of ones ...)
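To see this concretely with the simulated example above (again a sketch): the by-hand product differs from the true linear predictor by (y - 1) * intercept, so it agrees exactly on the rows where the response is 1:

head(pb$model[, 1])          # first column of the model frame: the response
head(model.matrix(pb)[, 1])  # first column of the model matrix: all ones

by_hand <- drop(as.matrix(pb$model) %*% as.matrix(pb$coefficients))
matched <- abs(by_hand - pb$linear.predictors) < 1e-12
all(matched == (pb$model[[1]] == 1))   # TRUE: matches exactly where y == 1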