I understand all the computational steps of training a neural network with gradient descent using forwardprop and backprop, but I'm trying to wrap my head around why they work so much better than logistic regression.
For now all I can think of is:
A) the neural network can learn it's own parameters
B) there are many more weights than simple logistic regression thus allowing for more complex hypotheses
Can someone explain why a neural network works so well in general? I am a relative beginner.
Neural Networks can have a large number of free parameters (the weights and biases between interconnected units) and this gives them the flexibility to fit highly complex data (when trained correctly) that other models are too simple to fit. This model complexity brings with it the problems of training such a complex network and ensuring the resultant model generalises to the examples it’s trained on (typically neural networks require large volumes of training data, that other models don't).
Classically logistic regression has been limited to binary classification using a linear classifier (although multi-class classification can easily be achieved with one-vs-all, one-vs-one approaches etc. and there are kernalised variants of logistic regression that allow for non-linear classification tasks). In general therefore, logistic regression is typically applied to more simple, linearly-separable classification tasks, where small amounts of training data are available.
Models such as logistic regression and linear regression can be thought of as simple multi-layer perceptrons (check out this site for one explanation of how).
To conclude, it’s the model complexity that allows neural nets to solve more complex classification tasks, and to have a broader application (particularly when applied to raw data such as image pixel intensities etc.), but their complexity means that large volumes of training data are required and training them can be a difficult task.