machine-learning, classification

Difference between Generative, Discriminative and Parametric, Nonparametric Algorithms/Models


Here on SO I found the following explanation of generative and discriminative algorithms:

"A generative algorithm models how the data was generated in order to categorize a signal. It asks the question: based on my generation assumptions, which category is most likely to generate this signal?

A discriminative algorithm does not care about how the data was generated, it simply categorizes a given signal."

And here are the definitions of parametric and nonparametric algorithms:

"Parametric: data are drawn from a probability distribution of specific form up to unknown parameters. Nonparametric: data are drawn from a certain unspecified probability distribution. "

So essentially, can we say that generative and parametric algorithms assume an underlying model, whereas discriminative and nonparametric algorithms don't assume any model?

thanks.


Solution

  • Say you have inputs X (probably a vector) and output Y (probably univariate). Your goal is to predict Y given X.

    A generative method uses a model of the joint probability p(X,Y) to determine p(Y|X). Given a generative model with known parameters, it is therefore possible to sample jointly from p(X,Y) and produce new samples of both the input X and the output Y (note that they are distributed according to the assumed, not the true, distribution if you do this). Contrast this with discriminative approaches, which only have a model of the form p(Y|X): provided with an input X they can sample Y, but they cannot sample new X. (A small code sketch contrasting the two appears after this answer.)

    Both assume a model. However, a discriminative approach assumes only a model of how Y depends on X, not a model of X itself. A generative approach models both. Thus, given a fixed number of parameters, you might argue (and many have) that it is easier to spend them modelling the thing you care about, p(Y|X), rather than the distribution of X, since you will always be provided with the X for which you wish to know Y.

    Useful references: the (very short) paper "Discriminative models, not discriminative training" by Tom Minka, and the seminal paper "On Discriminative vs. Generative Classifiers" by Andrew Ng and Michael Jordan.

    The distinction between parametric and non-parametric models is probably going to be harder to grasp until you have more stats experience. A parametric model has a fixed, finite number of parameters regardless of how many data points are observed. Most probability distributions are parametric: consider a variable z, the height of people, assumed to be normally distributed. As you observe more people, your estimates of the parameters \mu and \sigma, the mean and standard deviation of z, become more accurate, but you still only have two parameters.

    In contrast, the number of parameters in a non-parametric model can grow with the amount of data. Consider an induced distribution over people's heights which places a normal distribution over each observed sample, with mean given by the measurement and a fixed standard deviation. The marginal distribution over new heights is then a mixture of normal distributions, and the number of mixture components increases with each new data point. This is a non-parametric model of people's height; this specific example is called a kernel density estimator (see the second sketch after this answer). Popular (but more complicated) non-parametric models include Gaussian Processes for regression and Dirichlet Processes.

    A pretty good tutorial on non-parametrics can be found here, which constructs the Chinese Restaurant Process as the limit of a finite mixture model.
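
To make the generative/discriminative contrast concrete, here is a minimal sketch (not from the answer above) that fits a tiny Gaussian class-conditional generative model and a logistic-regression-style discriminative model to the same 1-D toy data. All names, numbers, and the hand-rolled gradient-ascent fit are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: 1-D inputs X, binary labels Y.
X = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
Y = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

# --- Generative model: fit p(Y) and p(X|Y), i.e. the joint p(X, Y). ---
prior = np.array([np.mean(Y == k) for k in (0, 1)])           # p(Y)
mu    = np.array([X[Y == k].mean() for k in (0, 1)])          # class means
sigma = np.array([X[Y == k].std()  for k in (0, 1)])          # class std devs

def gaussian_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def generative_posterior(x):
    """p(Y=1 | X=x), obtained from the joint model via Bayes' rule."""
    joint = prior * gaussian_pdf(x, mu, sigma)                # p(X=x, Y=k)
    return joint[1] / joint.sum()

def sample_joint(n):
    """Sample new (x, y) pairs -- only possible because we modelled p(X, Y)."""
    y = rng.choice([0, 1], size=n, p=prior)
    x = rng.normal(mu[y], sigma[y])
    return x, y

# --- Discriminative model: model p(Y|X) directly (logistic regression). ---
w, b = 0.0, 0.0
for _ in range(2000):                                         # plain gradient ascent
    p = 1.0 / (1.0 + np.exp(-(w * X + b)))
    w += 0.01 * np.mean((Y - p) * X)
    b += 0.01 * np.mean(Y - p)

def discriminative_posterior(x):
    """p(Y=1 | X=x); there is no model of p(X), so we cannot sample new X."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

print(generative_posterior(1.5), discriminative_posterior(1.5))
print(sample_joint(3))   # new (X, Y) pairs, drawn from the assumed distribution
```

Note that only the generative model can produce new (X, Y) pairs; the discriminative one can only score p(Y|X) for an X you hand to it.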
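And here is a minimal sketch of the parametric-versus-non-parametric point, contrasting a two-parameter normal fit with a kernel density estimator whose number of components equals the number of observations. The height data and bandwidth are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Parametric model of heights: two parameters, no matter how much data. ---
heights = rng.normal(170.0, 8.0, size=500)         # observed heights (cm)
mu_hat, sigma_hat = heights.mean(), heights.std()  # estimates sharpen with more data,
                                                   # but there are still only 2 parameters

# --- Non-parametric model: a kernel density estimator. ---
# One normal "bump" per observed height with a fixed bandwidth; the number of
# mixture components grows with every new data point.
bandwidth = 2.0

def kde(x, data, h=bandwidth):
    """Mixture of N(data_i, h^2) components, one per observation."""
    z = (x - data[:, None]) / h                    # shape: (n_data, n_query)
    bumps = np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi))
    return bumps.mean(axis=0)                      # average over components

query = np.array([160.0, 170.0, 185.0])
print(kde(query, heights))                         # density under the KDE
print(mu_hat, sigma_hat)                           # the two parametric parameters
```

The bandwidth here is held fixed for simplicity; what makes the model non-parametric is that the effective number of parameters (the mixture component locations) scales with the size of the data set.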