python tensorflow machine-learning keras gpu

Why is Keras LSTM on CPU three times faster than GPU?


I use this notebook from Kaggle to run an LSTM neural network.

I started training the neural network and saw that it was too slow: almost three times slower on the GPU than on the CPU.

After this I looked for an answer in this question on Stack Overflow and switched to CuDNNLSTM (which runs only on the GPU) instead of LSTM.
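For reference, the change looks roughly like this in Keras 2.x; the layer size, input shape, and loss below are illustrative placeholders, not the exact values from the Kaggle notebook:

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM, CuDNNLSTM

def build_model(use_cudnn, timesteps=100, features=64):
    """Build a small binary classifier, optionally with the cuDNN-backed LSTM."""
    model = Sequential()
    if use_cudnn:
        # cuDNN-backed LSTM: runs only on an NVIDIA GPU
        model.add(CuDNNLSTM(128, input_shape=(timesteps, features)))
    else:
        # generic LSTM implementation: runs on CPU or GPU
        model.add(LSTM(128, input_shape=(timesteps, features)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```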

With CuDNNLSTM, GPU training took only 1 minute per epoch, but the model's accuracy decreased by 3%.

Questions:

1) Why is the GPU slower than the CPU with the classic LSTM layer? I do not understand why this happens.

2) Why does training become much faster, but the model's accuracy decrease, when I use CuDNNLSTM instead of LSTM?

P.S.:

My CPU: Intel Core i7-7700 Processor (8M Cache, up to 4.20 GHz)

My GPU: nVidia GeForce GTX 1050 Ti (4 GB)


Solution

  • Guessing it's just a different, better implementation and, if the implementation is different, you shouldn't expect identical results.

    In general, efficiently implementing an algorithm on a GPU is hard, and getting maximum performance requires architecture-specific implementations. Therefore, it wouldn't be surprising if an implementation specific to Nvidia's GPUs had enhanced performance versus a general implementation for GPUs. It also wouldn't be surprising that Nvidia would sink significantly more resources into accelerating their code for their GPUs than would a team working on a general CNN implementation.

    The other possibility is that the data type used on the backend has changed from double- to single- or even half-precision float. The smaller data types mean you can crunch more numbers faster at the cost of accuracy. For NN applications this is often acceptable because no individual number needs to be especially accurate for the net to produce acceptable results.
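
    To make the precision point concrete, here is a hypothetical illustration using the Keras backend's floatx setting; it only demonstrates the smaller-float trade-off in general, and is not a claim about what CuDNNLSTM does internally:

    ```python
    from keras import backend as K

    print(K.floatx())        # Keras default float type, normally 'float32'
    K.set_floatx('float16')  # half precision: faster and lighter, less accurate
    print(K.floatx())        # -> 'float16'
    ```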