After I added xavier initialization to every convolution layer, the loss started becoming negative. Could someone suggest a reason?
I added the following lines to all convolutional layers:
weight_filler {
  type: "xavier"
}
bias_filler {
  type: "constant"
  value: 0.1
}
I0305 14:31:53.356343 11179 solver.cpp:219] Iteration 0 (-4.02766e+28 iter/s, 0.528933s/100 iters), loss = 2.05371
I0305 14:31:53.356374 11179 solver.cpp:238] Train net output #0: accuracy = 0.11937
I0305 14:31:53.356384 11179 solver.cpp:238] Train net output #1: loss = 2.05371 (* 1 = 2.05371 loss)
I0305 14:31:53.356395 11179 sgd_solver.cpp:105] Iteration 0, lr = 0.0001
I0305 14:32:28.728870 11179 solver.cpp:219] Iteration 100 (2.82699 iter/s, 35.3733s/100 iters), loss = 0.0270034
I0305 14:32:28.729014 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:32:28.729028 11179 solver.cpp:238] Train net output #1: loss = 0 (* 1 = 0 loss)
I0305 14:32:28.729034 11179 sgd_solver.cpp:105] Iteration 100, lr = 0.0001
I0305 14:33:03.729997 11179 solver.cpp:219] Iteration 200 (2.85701 iter/s, 35.0017s/100 iters), loss = -8.27284e-09
I0305 14:33:03.730154 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:33:03.730167 11179 solver.cpp:238] Train net output #1: loss = 0 (* 1 = 0 loss)
I0305 14:33:03.730172 11179 sgd_solver.cpp:105] Iteration 200, lr = 0.0001
I0305 14:33:38.885211 11179 solver.cpp:219] Iteration 300 (2.84449 iter/s, 35.1557s/100 iters), loss = -8.27284e-09
I0305 14:33:38.885368 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:33:38.885383 11179 solver.cpp:238] Train net output #1: loss = 0 (* 1 = 0 loss)
I0305 14:33:38.885387 11179 sgd_solver.cpp:105] Iteration 300, lr = 0.0001
I0305 14:34:14.174548 11179 solver.cpp:219] Iteration 400 (2.83368 iter/s, 35.2898s/100 iters), loss = -8.27284e-09
I0305 14:34:14.174702 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:34:14.174720 11179 solver.cpp:238] Train net output #1: loss = 0 (* 1 = 0 loss)
I0305 14:34:14.174724 11179 sgd_solver.cpp:105] Iteration 400, lr = 0.0001
I0305 14:34:49.578112 11179 solver.cpp:219] Iteration 500 (2.82453 iter/s, 35.4041s/100 iters), loss = -8.27284e-09
I0305 14:34:49.578254 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:34:49.578269 11179 solver.cpp:238] Train net output #1: loss = 0 (* 1 = 0 loss)
I0305 14:34:49.578272 11179 sgd_solver.cpp:105] Iteration 500, lr = 0.0001
I0305 14:35:25.042238 11179 solver.cpp:219] Iteration 600 (2.81971 iter/s, 35.4646s/100 iters), loss = -8.27284e-09
I0305 14:35:25.042421 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:35:25.042438 11179 solver.cpp:238] Train net output #1: loss = 0 (* 1 = 0 loss)
I0305 14:35:25.042443 11179 sgd_solver.cpp:105] Iteration 600, lr = 0.0001
I0305 14:36:00.540053 11179 solver.cpp:219] Iteration 700 (2.81704 iter/s, 35.4983s/100 iters), loss = -8.27284e-09
I0305 14:36:00.540194 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:36:00.540207 11179 solver.cpp:238] Train net output #1: loss =
My other question is that in some networks a Gaussian filler is used instead, like:
weight_filler {
  type: "gaussian"
  std: 0.005
}
bias_filler {
  type: "constant"
  value: 0.1
}
Why are we adding these parameters to the convolutional layers? Is it because we are training the network from scratch? And how is a specific value assigned to std and/or to the bias_filler value?
I really appreciate your help.
Your loss is -8.27284e-09, which is, practically speaking, zero and not negative (Caffe uses single-precision floating-point numbers, not double precision).
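To see why a value that small is indistinguishable from zero in float32, here is a minimal sketch (the loss value is the one from your log; NumPy is used only for the precision constant):

import numpy as np

# float32 resolves roughly 7 significant decimal digits:
print(np.finfo(np.float32).eps)  # ~1.1920929e-07

# -8.27284e-09 is far below that resolution near 1.0, so adding it to a
# loss of ~1 changes nothing -- it is round-off noise, not a genuinely
# negative loss:
print(np.float32(1.0) + np.float32(-8.27284e-09) == np.float32(1.0))  # True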
What loss layer are you using? "SoftmaxWithLoss"?
The bias_filler and weight_filler parameters are added when we want Caffe to randomly initialize the weights of the layer, usually when we start training from scratch. If you start training from an existing model (i.e., fine-tuning), these arguments have no effect: the loaded weights overwrite whatever the fillers produced.
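For example (a sketch using pycaffe; the file names are hypothetical):

import caffe

# Fillers run exactly once, when the net is constructed: each learnable
# Blob is populated from its weight_filler / bias_filler.
net = caffe.Net('train_val.prototxt', caffe.TRAIN)

# When fine-tuning, the pretrained weights then overwrite whatever the
# fillers produced, so the filler settings no longer matter:
net.copy_from('pretrained.caffemodel')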
The std value is computed based on the fan-in and fan-out (i.e., the number of in-channels and out-channels) in order to keep the statistics of the Blob values roughly zero-mean with unit variance.
You can find an analysis of these parameters in Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification" (arXiv, 2015).
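For intuition, here is roughly how an "xavier"-style std relates to fan-in and fan-out (a sketch of the idea, not Caffe's exact code; Caffe's XavierFiller defaults to fan-in, with variance_norm options for fan-out or the average):

import numpy as np

def xavier_std(fan_in, fan_out, mode="fan_in"):
    # Target a weight variance of 1/n so activations stay roughly
    # zero-mean with stable variance as the network gets deeper.
    n = {"fan_in": fan_in,
         "fan_out": fan_out,
         "average": (fan_in + fan_out) / 2.0}[mode]
    return np.sqrt(1.0 / n)

# Hypothetical 3x3 convolution with 64 input and 128 output channels:
fan_in = 64 * 3 * 3     # in_channels  * kernel_h * kernel_w
fan_out = 128 * 3 * 3   # out_channels * kernel_h * kernel_w
print(xavier_std(fan_in, fan_out))  # ~0.0417

A hand-picked gaussian std like 0.005 plays the same role, except the value is chosen manually instead of being derived from the layer's shape.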