TensorFlow came out with the XLA compiler, which compiles TensorFlow computations from the C++ backend down to LLVM IR. My understanding was that XLA was a step toward supporting generic accelerated devices, so long as there was LLVM-to-device support.
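For context, here is a minimal sketch of what turning XLA on looks like from the Python side, using the TF 1.x session options of that era:

```python
import tensorflow as tf

# Ask the runtime to cluster eligible ops and compile them with XLA
# instead of executing them op-by-op (TF 1.x API).
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = (
    tf.OptimizerOptions.ON_1)

x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.tanh(tf.matmul(x, tf.ones([4, 2])) + 1.0)

with tf.Session(config=config) as sess:
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]}))
```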
TensorFlow Lite was released more recently, replacing TensorFlow Mobile, and appears to be where the effort to target embedded and mobile devices is focused, with an apparent emphasis on the embedded DSPs and GPUs that are common optional processors in these environments. TensorFlow Lite hands off operations to the Android NNAPI (Neural Networks API) and supports a subset of the TensorFlow ops.
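To illustrate the workflow I mean, converting a model for TF Lite looks roughly like this (the SavedModel path is a hypothetical placeholder; on Android, the interpreter can then delegate supported ops to NNAPI):

```python
import tensorflow as tf

# Convert an exported SavedModel into a TensorFlow Lite flatbuffer.
# "/tmp/my_saved_model" is a hypothetical path, for illustration only.
converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")
tflite_model = converter.convert()

# Write the flatbuffer; this file is what ships on the device.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```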
So this raises the question: which direction is Google going in to support non-CUDA-based devices? And are there use cases for XLA beyond what I described?
I work on XLA. The XLA compiler has three backends: CPU, GPU, and TPU. The CPU and GPU backends are based on LLVM and are open source; the TPU backend is closed source.
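As a rough sketch of what targeting one of the open-source backends looks like from TF 1.x (assuming a TensorFlow build with XLA enabled; the `XLA_GPU` device string works analogously on CUDA machines):

```python
import tensorflow as tf

# Pin a small computation explicitly onto the XLA CPU backend.
# Requires a TensorFlow build with XLA compiled in.
with tf.device("/device:XLA_CPU:0"):
    a = tf.constant([1.0, 2.0, 3.0])
    b = tf.constant([4.0, 5.0, 6.0])
    c = a * b + a

with tf.Session() as sess:
    print(sess.run(c))  # [5.0, 12.0, 21.0]
```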
I don't know what the plans are for XLA for mobile devices, so I can't comment on that.
A benefit you get from running your TF model through XLA, rather than executing it directly, is that XLA fuses a lot of ops for you. See this post, for instance.
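As a hedged sketch of that fusion benefit, again using TF 1.x-style session options: without XLA, the multiply, add, and tanh below each launch a separate kernel and materialize an intermediate tensor; with JIT enabled, XLA compiles them into a single fused kernel.

```python
import numpy as np
import tensorflow as tf

config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = (
    tf.OptimizerOptions.ON_1)

x = tf.placeholder(tf.float32, shape=[1024])
# Three elementwise ops that XLA can fuse into one compiled kernel,
# eliminating the intermediate buffers between them.
y = tf.tanh(x * 2.0 + 1.0)

with tf.Session(config=config) as sess:
    print(sess.run(y, feed_dict={x: np.zeros(1024, dtype=np.float32)}))
```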