androidiosperformancetensorflow-litetensorrt

Fastest way to run recurrent neural network (inference) on mobile device


What I have: A trained recurrent neural network in Tensorflow.

What I want: A mobile application that can run this network as fast as possible (inference mode only, no training).

I believe there are multiple ways how I can accomplish my goal, but I would like you feedback/corrections and additions because I have never done this before.

  1. Tensorflow Lite. Pro: Straight forward, available on Android and iOS. Contra: Probably not the fastest method, right?
  2. TensorRT. Pro: Very fast + I can write custom C code to make it faster. Contra: Used for Nvidia devices so no easy way to run on Android and iOS, right?
  3. Custom Code + Libraries like openBLAS. Pro: Probably very fast and possibility to link to it on Android on iOS (if I am not mistaken). Contra: Is there much use for recurrent neural networks? Does it really work well on Android + iOS?
  4. Re-implement Everything. I could also rewrite the whole computation in C/C++ which shouldn't be too hard with recurrent neural networks. Pro: Probably the fastest method because I can optimize everything. Contra: Will take a long time and if the network changes I have to update my code as well (although I am willing to do it this way if it really is the fastest). Also, how fast can I make calls to libraries (C/C++) on Android? Am I limited by the Java interfaces?

Some details about the mobile application. The application will take a sound recording of the user, do some processing (like Speech2Text) and output the text. I do not want to find a solution that is "fast enough", but the fastest option because this will happen over very large sound files. So almost every speed improvement counts. Do you have any advice, how I should approach this problem?

Last question: If I try to hire somebody to help me out, should I look for an Android/iOS-, Embedded- or Tensorflow- type of person?


Solution

  • 1. TensorflowLite

    Pro: it uses GPU optimizations on Android; fairly easy to incorporate into Swift/Objective-C app, and very easy into Java/Android (just adding one line in gradle.build); You can transform TF model to CoreML

    Cons: if you use C++ library - you will have some issues adding TFLite as a library to your Android/Java-JNI (there is no native way to build such library without JNI); No GPU support on iOS (community works on MPS integration tho)

    Also here is reference to TFLite speech-to-text demo app, it could be useful.

    2. TensorRT

    It uses TensorRT uses cuDNN which uses CUDA library. There is CUDA for Android, not sure if it supports the whole functionality.

    3. Custom code + Libraries

    I would recommend you to use Android NNet library and CoreML; in case you need to go deeper - you can use Eigen library for linear algebra. However, writing your own custom code is not beneficial in the long term, you would need to support/test/improve it - which is a huge deal, more important than performance.

    Re-implement Everything

    This option is very similar to the previous one, implementing your own RNN(LSTM) should be fine, as soon as you know what you are doing, just use one of the linear algebra libraries (e.g. Eigen).

    The overall recommendation would be to:**