Say we have input in the form of a collection of images with shape (200, 56x56, 3), where 200 is the number of distinct images, 56x56 are the pixels (length x breadth), and 3 refers to the RGB values.
So, do x1, x2, x3, x4, etc. refer to (number of instances, pixels (length), pixels (breadth), RGB value)?
Or are there 1,881,600 inputs (equal to 200x56x56x3)?
The number of inputs in your case is 1*56*56*3 = 9408. Imagine that you want to predict a value for one new image of dimension 56*56: you will have to feed the network all the RGB values (3) of every pixel.
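To make the counting concrete, here is a minimal NumPy sketch. It assumes the images are stored channels-last as a `(200, 56, 56, 3)` array and are flattened with NumPy's default row-major order. Random values stand in for real pixel data. Under those assumptions the 200 images are separate samples (rows), and each row holds the 9408 inputs x1, x2, x3, ... for one image.

```python
import numpy as np

# Hypothetical batch matching the question: 200 RGB images of 56x56 pixels,
# stored as (num_images, height, width, channels). Random values are placeholders.
images = np.random.rand(200, 56, 56, 3)

# A fully connected network sees each image as one flat vector:
# every pixel contributes its 3 RGB values.
num_images = images.shape[0]
inputs = images.reshape(num_images, -1)

print(inputs.shape)  # (200, 9408): 200 samples, 9408 inputs per sample
print(56 * 56 * 3)   # 9408 input units in the first layer
print(inputs.size)   # 1881600 values in total, spread over 200 samples

# With row-major flattening (an assumption; other layouts order inputs differently),
# x1, x2, x3 are the R, G, B values of the top-left pixel,
# and x4 is the R value of the next pixel in the same row.
first_image = inputs[0]
assert np.array_equal(first_image[:3], images[0, 0, 0, :])
assert first_image[3] == images[0, 0, 1, 0]
```

So the 1,881,600 figure counts every value in the dataset, but the network's input layer only needs 9408 units, and it is shown one image (one row) at a time.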
In practice, fully connected feed-forward neural networks like the one described in your picture are not used for image classification. Instead, we use CNNs (Convolutional Neural Networks).