matlab, training-data, mini-batch

Difference between the cost function of a training sample and the cost function of a mini-batch


Let's say that I have a neural network named 'NN' with 500 weights and biases (500 parameters in total).

For one training sample: it's fed through 'NN', which spits out an output (Out1); the output is compared to the training label, and the backpropagation algorithm produces a small change (positive or negative) for every parameter of 'NN'. The cost function is represented by a vector of dimensions 1x500, containing all the small modifications obtained by the backpropagation algorithm.

Let's say mini_batch_size = 10.

For one mini-batch: each of the 10 training samples provides a cost function of dimensions 1x500.

To visualize and explain this better, let's say that we create a 10x500 matrix (called M), where every row is the cost function of one training sample.
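For illustration only, M could be filled like this, assuming a helper per_sample_gradient that runs backpropagation for a single sample (the helper and the data arrays X and Y are hypothetical):

M = zeros(10, 500);                                 % one row per training sample
for i = 1:10
    % hypothetical helper: backprop for sample i of the mini-batch
    M(i, :) = per_sample_gradient(NN, X(i, :), Y(i));
end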

Question: for the mini-batch case, is the final cost function of the mini-batch the average of the elements in each column?

P.S. In case the question is not clear enough, I left some code showing exactly what I mean.

Cost_mini_batch = zeros(1, 500);              % preallocate the result
for j = 1:500
    Cost_mini_batch(j) = sum(M(:, j)) / 10;   % column average over the 10 samples
end

The dimensions of Cost_mini_batch are 1x500.
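Incidentally, the whole loop collapses to a single built-in call, which computes the same 1x500 column-wise average:

Cost_mini_batch = mean(M, 1);   % average along dimension 1 (down each column)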


Solution

  • "Cost" refers to the loss, i.e. the error between Out1 and the training label.

    The cost function is represented by a vector of dimensions 1x500, containing all the small modifications obtained by the backpropagation algorithm.

    This is called the "gradient", not the cost function.

    Question: for the mini-batch case, is the final cost function of the mini-batch the average of the elements in each column?

    Yes: both the gradient and the cost of a mini-batch are the averages of the per-example gradients and per-example costs, respectively, over the mini-batch.
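
    A minimal sketch of that averaging, assuming a 10x500 matrix grads of per-example gradients (playing the role of your M) and a 10x1 vector losses of per-example costs (both names are placeholders):

    grads = randn(10, 500);           % per-example gradients, one row per sample
    losses = rand(10, 1);             % per-example losses, one scalar per sample
    grad_mini_batch = mean(grads, 1); % 1x500: average gradient for the mini-batch
    cost_mini_batch = mean(losses);   % scalar: average loss for the mini-batch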