How to do gradient clipping in PyTorch?


What is the correct way to perform gradient clipping in PyTorch?

I have an exploding gradients problem.


Solution

  • clip_grad_norm (deprecated in favor of clip_grad_norm_, following the convention that a trailing _ marks an in-place operation) clips the norm of the overall gradient. The norm is taken over the gradients of all parameters passed to the function, not per parameter, as the documentation states:

    The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.

    From your example it looks like you want clip_grad_value_ instead. It has a similar syntax, also modifies the gradients in place, and clamps every gradient element to the range [-clip_value, clip_value]:

    torch.nn.utils.clip_grad_value_(model.parameters(), clip_value)
    

    Another option is to register a backward hook on each parameter. The hook receives the current gradient and may return a tensor that is used in place of it, i.e. it replaces the gradient. It is called every time a gradient for that parameter is computed, so once the hook is registered there is no need to clip manually. The hook below clamps element-wise, like clip_grad_value_:

    for p in model.parameters():
        p.register_hook(lambda grad: torch.clamp(grad, -clip_value, clip_value))
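
For the two clipping functions, the call belongs after loss.backward() and before optimizer.step(). Below is a minimal sketch of that placement, not a definitive training loop. The toy nn.Linear model, the random data, the thresholds max_norm and clip_value, and the helper compute_gradients are all assumptions made up for this illustration:

```python
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_norm_, clip_grad_value_

torch.manual_seed(0)

# Hypothetical toy setup, used only for illustration.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(8, 4)
targets = 100.0 * torch.randn(8, 1)  # large targets to provoke large gradients

max_norm = 1.0    # assumed threshold for norm clipping
clip_value = 0.5  # assumed threshold for value clipping


def compute_gradients():
    """Hypothetical helper: run forward and backward, leaving grads on the parameters."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()


# Variant 1: rescale all gradients together so their joint L2 norm is at most max_norm.
compute_gradients()
norm_before = clip_grad_norm_(model.parameters(), max_norm)  # returns the pre-clipping norm
norm_after = torch.cat([p.grad.flatten() for p in model.parameters()]).norm()
optimizer.step()  # the update uses the clipped gradients

# Variant 2: clamp each gradient element independently to [-clip_value, clip_value].
compute_gradients()
clip_grad_value_(model.parameters(), clip_value)
max_abs_after = max(p.grad.abs().max().item() for p in model.parameters())
optimizer.step()

print(f"norm before: {float(norm_before):.3f}, after: {float(norm_after):.3f}")
print(f"largest |grad| after value clipping: {max_abs_after:.3f}")
```

Norm clipping scales every gradient by the same factor, so the update direction is preserved. Value clipping clamps each element separately and can change the direction.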