I want to build a training pipeline with TFX and eventually reuse my data transformations when making inference requests to TensorFlow Serving, which TFX supposedly supports. The TFX examples I found all build a batch training pipeline and eventually push the model to TensorFlow Serving, but they don't address the inference part, which must be a streaming pipeline for latency reasons. I could probably write my own tool to make the requests, but it seems a waste not to reuse my Transform component for inference.
I have run the examples locally that the TFX setup script installs in the Airflow dags folder. The Airflow UI makes it clear that those are batch pipelines.
TFX lets you define your transform logic inside the training pipeline and saves that logic as part of the resulting model graph. The exported SavedModel therefore includes both the transformations and the trained model, so TensorFlow Serving can accept requests in the pre-transform data format and perform both the transformations and the model inference without any additional work. By design, then, TFX itself is not involved at inference time.
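To make the pattern concrete: in TFX you write a `preprocessing_fn` for the Transform component (e.g. using `tensorflow_transform` ops like `tft.scale_to_z_score`), and the Trainer exports a serving signature that composes that transform graph with the model, so the client only ever sends raw features. Below is a dependency-free Python sketch of that composition; the function names mirror the TFX concepts but the bodies (z-score constants, linear "model") are purely illustrative stand-ins, not TFX API calls.

```python
def preprocessing_fn(raw):
    # Stand-in for tf.Transform's preprocessing_fn. In real TFX the mean and
    # stddev would be computed over the training data by the Transform
    # component; here they are hard-coded for illustration.
    mean, stddev = 10.0, 2.0
    return {"x_scaled": (raw["x"] - mean) / stddev}

def model_fn(features):
    # Stand-in for the trained model: a toy linear predictor.
    return 3.0 * features["x_scaled"] + 1.0

def serving_fn(raw):
    # Stand-in for the exported serving signature: transform + inference
    # fused into one call, so the client sends pre-transform data.
    return model_fn(preprocessing_fn(raw))

print(serving_fn({"x": 14.0}))  # raw feature in, prediction out: 7.0
```

This is why no TFX component runs at inference time: once the transform graph is baked into the SavedModel's serving signature, TensorFlow Serving alone handles both steps.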