tensorflow, pytorch, mxnet

Static graphs are fast, dynamic graphs are slow. Is there any specific benchmark demonstrating this?


I've seen some benchmarks comparing TensorFlow and PyTorch. TensorFlow may be faster, but often not by much, and it is sometimes even slower.

Is there any benchmark that specifically compares static graphs and dynamic graphs and demonstrates that static graphs are much faster than dynamic graphs?


Solution

  • To be more precise, the speed benefit comes from "deferred execution with graph rewriting."

    It's typically associated with explicit graph frameworks (Theano/TF), but with enough engineering you could add it to execution models like numpy/PyTorch, which don't have an explicit graph. See Bohrium for an example of hacking numpy to do rewriting.
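
    Below is a minimal, purely illustrative sketch of what "deferred execution with graph rewriting" means, written against plain numpy. The `Lazy` class and its methods are hypothetical and are not Bohrium's (or any framework's) actual API: operations are recorded instead of executed, and the recorded chain is then applied chunk by chunk so intermediates stay in cache, as a crude stand-in for elementwise-op fusion.

    ```python
    import numpy as np

    class Lazy:
        """Toy deferred-execution wrapper (hypothetical, for illustration only)."""

        def __init__(self, array, ops=None):
            self.array = array
            self.ops = ops or []                      # recorded, not yet executed

        def tanh(self):
            return Lazy(self.array, self.ops + [np.tanh])

        def scale(self, c):
            return Lazy(self.array, self.ops + [lambda x, c=c: x * c])

        def evaluate(self, chunk=4096):
            # "Rewriting" step: instead of materializing a full intermediate
            # array after every op (what eager numpy does), run the whole
            # recorded chain chunk by chunk so intermediates fit in cache.
            out = np.empty_like(self.array)
            for i in range(0, self.array.size, chunk):
                piece = self.array[i:i + chunk]
                for op in self.ops:
                    piece = op(piece)
                out[i:i + chunk] = piece
            return out

    x = Lazy(np.random.rand(1_000_000))
    y = x.tanh().scale(2.0).tanh()    # nothing has been computed yet
    print(y.evaluate()[:3])           # all the work happens here, in fused chunks
    ```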

    Note that the presence of this feature makes the framework less friendly for prototyping, so if you added it to PyTorch, you'd get the same problems that people complain about in TensorFlow (point 1 is sketched in code right after this list):

    1. Deferred execution means exceptions can be triggered much later, not at the line where you wrote the problematic code
    2. Rewriting means errors can now be thrown from nodes you didn't create, which gives uninformative error messages
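
    As a sketch of point 1, here is what deferred execution looks like with the TF1-style graph API (via `tensorflow.compat.v1`; exact error types and messages vary by version). Building the graph succeeds, and the shape mismatch only surfaces when `session.run` executes it, far from the line that created the offending op.

    ```python
    import numpy as np
    import tensorflow.compat.v1 as tf

    tf.disable_eager_execution()

    x = tf.placeholder(tf.float32, shape=[None, None])
    y = tf.matmul(x, x)        # no error here: shapes are unknown at build time
    loss = tf.reduce_sum(y)

    with tf.Session() as sess:
        # The incompatible (2, 3) x (2, 3) matmul is only discovered when the
        # graph actually runs, so the exception is raised on this line instead.
        sess.run(loss, feed_dict={x: np.ones((2, 3), dtype=np.float32)})
    ```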

    As far as performance goes, here's a toy benchmark in TensorFlow showing a 5x speed-up when you turn on graph rewriting.

    I crafted the example to be bottlenecked by memory bandwidth, so it's a no-brainer that graph rewriting (cwise fusion) gives a significant speed boost there. For a production LSTM model, Google reported a 1.8x speed-up when turning on graph optimizations (through XLA).
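
    Along the same lines, here is a small self-contained benchmark sketch (not the original toy benchmark, and the numbers will not necessarily match the 5x or 1.8x figures above; they depend on hardware and TF version). It runs a long chain of cheap elementwise ops on a large tensor, eagerly versus compiled with `tf.function(..., jit_compile=True)` (TF 2.x API) so XLA can fuse the chain.

    ```python
    import time
    import tensorflow as tf

    def chain(x):
        # A long chain of cheap elementwise (cwise) ops: each one reads and
        # writes the whole tensor, so eager execution is memory-bandwidth bound.
        for _ in range(20):
            x = tf.tanh(x) * 1.01 + 0.01
        return x

    compiled = tf.function(chain, jit_compile=True)   # let XLA fuse the chain

    x = tf.random.normal([4000, 4000])

    def bench(fn, iters=10):
        fn(x)                                  # warm-up / trace / compile
        start = time.perf_counter()
        for _ in range(iters):
            y = fn(x)
        _ = y.numpy()                          # wait for any pending async work
        return (time.perf_counter() - start) / iters

    print("eager    :", bench(chain))
    print("compiled :", bench(compiled))
    ```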