r machine-learning glmnet

How does glmnet standardize variables when weights are present?


glmnet allows the user to input a vector of observation weights through the weights argument. glmnet also standardizes (by default) the predictor variables to have zero mean and unit variance. My question is: when weights is provided, does glmnet standardize the predictors using the weighted mean (and standard deviation) of each column or the unweighted mean (and standard deviation)?


Solution

  • There's a description of glmnet's standardization at Link

    That post shows the Fortran code snippet from glmnet's source that computes the standardization (see the "Proof" paragraph, second bullet).

    I'm not familiar with Fortran, but to me it looks very much like it is in fact using the weighted mean and standard deviation.

    Edit: From the glmnet vignette:

    "weights is for the observation weights. Default is 1 for each observation. (Note: glmnet rescales the weights to sum to N, the sample size.)"

    If w in the Fortran code holds the rescaled weights, this is consistent with standardization by the weighted mean and weighted standard deviation.
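
To make "weighted standardization" concrete, here is a minimal base-R sketch of what it would mean. It does not call glmnet and is not glmnet's actual code: `weighted_standardize` is a hypothetical helper, and dividing by the sum of the weights (rather than n - 1) in the standard deviation is an assumption based on my reading of the Fortran snippet.

```r
# Hypothetical helper, not part of glmnet: standardize the columns of x
# using observation weights w.
weighted_standardize <- function(x, w) {
  w  <- w / sum(w)                       # normalize weights to sum to 1
  xm <- drop(crossprod(x, w))            # weighted column means
  xc <- sweep(x, 2, xm)                  # center each column
  # Assumption: weighted sd divides by the total weight, not n - 1.
  xs <- sqrt(drop(crossprod(xc^2, w)))
  list(x = sweep(xc, 2, xs, "/"), mean = xm, sd = xs)
}

set.seed(1)
x <- matrix(rnorm(40), nrow = 10, ncol = 4)  # toy predictors
w <- runif(10)                               # toy observation weights
z <- weighted_standardize(x, w)

wn <- w / sum(w)
colSums(wn * z$x)    # weighted means of the standardized columns
colSums(wn * z$x^2)  # weighted variances of the standardized columns
```

By construction the standardized columns have weighted mean 0 and weighted variance 1. To check against glmnet itself, one could fit with `standardize = TRUE` on `x` and with `standardize = FALSE` on `z$x`, passing the same `weights`, and compare the coefficient paths after rescaling by `z$sd`. I have not run that comparison here.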