python pytorch deterministic reproducible-research

What is the difference between 'torch.backends.cudnn.deterministic=True' and 'torch.set_deterministic(True)'?


My network includes torch.nn.MaxPool3d, which, according to the PyTorch docs (version 1.7 - https://pytorch.org/docs/stable/generated/torch.set_deterministic.html#torch.set_deterministic), throws a RuntimeError when the cudnn deterministic flag is on.

However, when I inserted the code torch.backends.cudnn.deterministic=True at the beginning of my code, there was no RuntimeError. Why doesn't that code throw a RuntimeError?

I wonder whether that code guarantees the deterministic computation of my training process.


Solution

  • torch.backends.cudnn.deterministic=True only applies to CUDA convolution operations, and nothing else. It never raises an error for other operations, which is why you saw no RuntimeError. Therefore, no, it will not guarantee that your training process is deterministic, since you're also using torch.nn.MaxPool3d, whose backward function is nondeterministic for CUDA.

    torch.set_deterministic(), on the other hand, affects all the normally-nondeterministic operations listed here (note that set_deterministic has been renamed to use_deterministic_algorithms in 1.8): https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html?highlight=use_deterministic#torch.use_deterministic_algorithms

    As the documentation states, some of the listed operations don't have a deterministic implementation. So if torch.use_deterministic_algorithms(True) is set, they will throw an error.

    If you need to use nondeterministic operations like torch.nn.MaxPool3d, then, at the moment, there is no way for your training process to be deterministic, unless you write a custom deterministic implementation yourself. Alternatively, you could open a GitHub issue requesting a deterministic implementation: https://github.com/pytorch/pytorch/issues

    In addition, you might want to check out this page: https://pytorch.org/docs/stable/notes/randomness.html
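
To make the difference between the two flags concrete, here is a minimal sketch, assuming PyTorch 1.8 or later (on 1.7 the call would be torch.set_deterministic). The helper name maxpool3d_backward_status is made up for this illustration. It sets both flags, then runs a forward and backward pass through torch.nn.MaxPool3d and reports whether a RuntimeError was raised. On a CPU-only machine the pass should complete normally; on a CUDA device, with a PyTorch version that has no deterministic MaxPool3d backward, the RuntimeError is expected to come from the global switch, not from the cuDNN flag.

```python
import torch

# Flag 1: only influences cuDNN convolution algorithm selection.
# Setting it never raises an error for other operations.
torch.backends.cudnn.deterministic = True

# Flag 2: global switch (named torch.set_deterministic in 1.7).
# Operations without a deterministic implementation raise RuntimeError.
torch.use_deterministic_algorithms(True)


def maxpool3d_backward_status():
    """Run MaxPool3d forward + backward and report what happened."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(1, 1, 4, 4, 4, device=device, requires_grad=True)
    try:
        torch.nn.MaxPool3d(kernel_size=2)(x).sum().backward()
        return f"ok on {device}"
    except RuntimeError as err:
        return f"RuntimeError on {device}: {err}"


status = maxpool3d_backward_status()
print(status)

# Restore the default so later code is not affected by the global switch.
torch.use_deterministic_algorithms(False)
```

If you comment out the torch.use_deterministic_algorithms(True) line and keep only the cuDNN flag, no error should be raised on either device, which matches the behaviour described in the question.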