statistics machine-learning gaussian glm discrete-space

Using Gaussian family distribution to predict discrete quantities in GLM


Is it legitimate to use a generalized linear model (GLM) with a Gaussian family to predict discrete quantities, for example by rounding the output of the Gaussian GLM to the nearest integer?


Solution

  • You can do this, but it may not be the best thing to do. It really depends on the nature of the data that you are trying to model. Poisson regression may well be better suited to your needs.

    http://en.wikipedia.org/wiki/Poisson_regression

    However, nothing stops you from fitting a linear model to integer-valued data. You may run into problems when making inferences from the model, since the usual standard errors and tests assume normally distributed errors with constant variance. If you are simply trying to build a model to predict future observations, it may well work nicely even if it is not theoretically valid.

    Given the nature of the model, you may end up predicting nonsensical values. For example, your response variable may only make sense over a limited range (say, positive integers), but the model can predict arbitrarily large values, both positive and negative. Model-checking steps such as residual checks (normality and correlation) may not give the kind of results you would normally see when modelling continuous, normally distributed responses.

    Overall, I would say that, depending on your data, your approach COULD generate a useful predictive model, but in general you should proceed with caution.

    See this question and some of its answers, which discuss similar themes: https://stats.stackexchange.com/questions/3024/why-is-poisson-regression-used-for-count-data

    To reach a wider audience you might consider posting this question at http://stats.stackexchange.com
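
To make the point about out-of-range predictions concrete, here is a minimal sketch in plain Python. It assumes a single predictor and uses made-up toy counts, not data from the question. The helper names `fit_ols` and `fit_poisson` are hypothetical and written out by hand for illustration. In practice you would more likely use a GLM routine from a statistics package.

```python
import math

# Made-up toy counts that grow roughly exponentially with x (illustration only).
x = list(range(10))
y = [0, 0, 1, 1, 2, 3, 5, 8, 13, 21]


def fit_ols(x, y):
    """Least squares fit of y = a + b*x (a Gaussian GLM with identity link)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fit_poisson(x, y, n_iter=25):
    """Poisson regression with log link, fitted by Newton's method."""
    # Start from the intercept-only fit: log of the mean count, zero slope.
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (negative Hessian), 2 x 2.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system in closed form.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


a_ols, b_ols = fit_ols(x, y)
a_poi, b_poi = fit_poisson(x, y)

# Round each model's predicted mean to the nearest integer.
ols_rounded = [round(a_ols + b_ols * xi) for xi in x]
poisson_rounded = [round(math.exp(a_poi + b_poi * xi)) for xi in x]

print("observed          :", y)
print("Gaussian, rounded :", ols_rounded)
print("Poisson, rounded  :", poisson_rounded)
```

On this toy data the rounded Gaussian predictions for the smallest values of x come out negative, which is impossible for a count. The Poisson model predicts through an exponential, so its predicted means stay positive. This does not show that Poisson regression always predicts better, only that it respects the range of the response.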