Unlike the cost functions of linear and logistic regression, ANN cost functions are not convex, and so are susceptible to local optima. Can anyone provide an intuition as to why this is the case for ANNs, and why the hypothesis cannot be modified to produce a convex function?
I found a sufficient explanation here:
https://stats.stackexchange.com/questions/106334/cost-function-of-neural-network-is-non-convex
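Basically, the hidden units within a layer are interchangeable: if you swap two hidden units, together with their incoming and outgoing weights, the network computes exactly the same function. So any minimum has several equivalent copies at different points in weight space. If the cost function were convex, every point on the line segment between two such minima would also have to be a minimum, but averaging the permuted weights generally gives a different network with a higher cost. Thus the function cannot be convex (or concave either).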
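To make the permutation argument concrete, here is a minimal sketch in NumPy. It is not taken from the linked answer; the tiny one-hidden-layer tanh network, the squared-error cost, and the names `mlp_loss`, `W1`, `w2` are all assumptions chosen for illustration. It builds two weight settings that differ only by swapping the hidden units, then evaluates the cost at their midpoint.

```python
import numpy as np

def mlp_loss(W1, w2, X, y):
    """Mean squared error of a one-hidden-layer tanh network (illustrative)."""
    hidden = np.tanh(X @ W1)   # shape: (n_samples, n_hidden)
    pred = hidden @ w2         # shape: (n_samples,)
    return np.mean((pred - y) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

# Hypothetical "teacher" weights; targets are generated from them,
# so the cost at (W1, w2) is zero, i.e. a global minimum.
W1 = np.array([[1.0, -2.0],
               [0.5,  1.5]])
w2 = np.array([1.0, -1.0])
y = np.tanh(X @ W1) @ w2

# Swap the two hidden units: permute the columns of W1 and the entries of w2.
perm = [1, 0]
W1_p, w2_p = W1[:, perm], w2[perm]

# Midpoint of the two equivalent solutions in weight space.
W1_m, w2_m = 0.5 * (W1 + W1_p), 0.5 * (w2 + w2_p)

loss_a = mlp_loss(W1, w2, X, y)        # original minimum
loss_b = mlp_loss(W1_p, w2_p, X, y)    # permuted copy, same function
loss_m = mlp_loss(W1_m, w2_m, X, y)    # midpoint between the two

print("original:", loss_a)
print("permuted:", loss_b)
print("midpoint:", loss_m)
```

For a convex cost, the midpoint value could be at most the average of the two endpoint values. Here both endpoints sit at a minimum with the same cost, while the midpoint averages the output weights to zero and so has a strictly larger cost, which violates that inequality.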