
Weighted GLM: R vs. Python


In R, we use the code below for a weighted GLM:

glm(formula, data = data, weights = w)

R documentation for weights: an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector.

In Python, using statsmodels.formula.api:

smf.glm(formula, data=data, freq_weights=w)

statsmodels documentation for freq_weights: 1d array of frequency weights. The default is None. If None is selected or a blank value, then the algorithm will replace with an array of 1’s with length equal to the endog.

Is "weights" in R the same as "freq_weights" in Python? I am getting different beta estimates in Python and R; they are close but slightly different.


Solution

  • As far as I remember, the weights in R's glm are var_weights, not freq_weights.

    statsmodels GLM supports both. In some cases the two kinds of weights produce the same results, but not for all family/link combinations, and the standard errors can differ in general.

    This notebook illustrates some of the differences: https://www.statsmodels.org/stable/examples/notebooks/generated/glm_weights.html

    var_weights are often used when each value of the outcome variable is an average of several observations, so its variance depends on the number of observations used in that average.

    freq_weights are mainly a shortcut for data with several identical observations. For example, if we only have categorical explanatory variables, then freq_weights can be used for the counts of the unique observations.