python, tensorflow, object-detection, object-detection-api, efficientnet

Why is training loss oscillating up and down?


I am using the TF2 research object detection API with the pre-trained EfficientDet D3 model from the TF2 model zoo. During training on my own dataset, I notice that the total loss is jumping up and down, for example from 0.5 up to 2.0 a few steps later, and then back down to 0.75:

[Tensorboard screenshot]

So all in all this training does not seem to be very stable. I thought the problem might be the learning rate, but as you can see in the charts above, I set the LR to decay during the training, it goes down to a really small value of 1e-15, so I don't see how this can be the problem (at least in the 2nd half of the training).

[Tensorboard screenshot, smoothed]

Also, when I smooth the curves in Tensorboard, as in the 2nd image above, one can see the total loss going down, so the direction is correct, even though it is still at quite a high value. I would be interested to know why I can't achieve better results with my training set, but I guess that is another question. First, I would really like to know why the total loss goes up and down so much throughout the training. Any ideas?

PS: The pipeline.config file for my training can be found here.


Solution

  • Your config states that your batch size is 2. This is tiny and will cause a very volatile loss.

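    For reference, in the TF2 Object Detection API the batch size is the batch_size field inside the train_config block of pipeline.config, roughly like this (values here are illustrative):

    train_config {
        batch_size: 2   # raise this as far as your hardware allows
        ...
    }
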
    Try increasing your batch size substantially; a value of 256 or 512 would be a good starting point. If you are constrained by memory, you can simulate a larger batch via gradient accumulation.


    Gradient accumulation is the process of synthesising a larger batch by combining the backwards passes from smaller mini-batches. You would run multiple backwards passes before updating the model's parameters.

    Typically, a training loop would look like this (I'm using PyTorch-like syntax for illustrative purposes):

    # Standard loop: one optimizer update per mini-batch
    for model_inputs, truths in iter_batches():
        predictions = model(model_inputs)      # forward pass
        loss = get_loss(predictions, truths)   # loss for this mini-batch
        loss.backward()                        # backward pass: compute gradients
        optimizer.step()                       # apply the parameter update
        optimizer.zero_grad()                  # clear gradients for the next batch
    

    With gradient accumulation, you put several mini-batches through and only then update the model. This simulates a larger batch size without requiring the memory to push the whole large batch through at once:

    accumulations = 10

    for i, (model_inputs, truths) in enumerate(iter_batches()):
        predictions = model(model_inputs)
        # Scale the loss so the summed gradients match the average over
        # the effective (accumulations x batch size) batch.
        loss = get_loss(predictions, truths) / accumulations
        loss.backward()                       # gradients accumulate in .grad
        if (i + 1) % accumulations == 0:      # after every `accumulations`-th mini-batch...
            optimizer.step()                  # ...apply one combined update
            optimizer.zero_grad()             # ...and reset the accumulated gradients
    
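    Two details to note: dividing the loss by accumulations keeps each combined update comparable in size to a single update on the larger batch, and the (i + 1) % accumulations check fires once per accumulations mini-batches. If the number of mini-batches is not an exact multiple of accumulations, the last few gradients never produce an update, which is usually acceptable.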

    Reading