Nowadays, in the microservices world, I'm seeing a lot of designs at my workplace that use Kafka messaging where you could achieve similar results with REST API calls between microservices. Technically, you could stop using REST API calls altogether and use Kafka messaging instead. I really want to know the best practice, the pros and cons, and when to use API calls between microservices versus Kafka messaging.
Let's use a real-life example:
I have an inventory service and a vendor service. Every day the vendor service calls the vendor API to get new items, and these need to be moved into the inventory service. The number of items can be up to 10,000 objects.
For this use case, is it better to:
After getting new data from the vendor API, call the inventory service's REST API to store the new items.
After getting new data from the vendor API, send the items as messages to a Kafka topic, to be consumed by the inventory service.
Which way would you choose, and what are the considerations?
Kafka - Publish & Subscribe (just process the pipeline, will notify once the job is done)
REST - Request & Await response (on-demand)
Kafka - Publish once - Subscribe n times (by n components).
REST - Request once, get the response once. Deal over.
Kafka - Data is stored in the topic. Seek back & forth (offsets) whenever you want, for as long as the topic retains the data.
REST - Once the response is over, it is over. Manually employ a database to store the processed data.
Kafka - Split the processing, have intermediate data stored in intermediate topics (for speed and fault-tolerance)
REST - Take the data, process it all at once OR if you wish to break it down, don't forget to take care of your OWN intermediate data stores.
Kafka - The one who sends the message typically is not interested in a response (other than an acknowledgement that the message was sent)
REST - I am making the request means I typically expect a response (not just a response that you have received the request, but something that is meaningful to me, some computed result for example!)
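To make the contrast concrete, here is a minimal sketch of the two interaction styles, assuming Spring's KafkaTemplate and RestTemplate; the vendor-items topic and the inventory endpoint are made up for illustration:

```java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.client.RestTemplate;

public class InteractionStyles {

    private final KafkaTemplate<String, String> kafkaTemplate;
    private final RestTemplate restTemplate = new RestTemplate();

    public InteractionStyles(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Kafka: publish and move on; n consumers can read the record later,
    // and re-read it, for as long as the topic retains it.
    public void publishItem(String itemJson) {
        kafkaTemplate.send("vendor-items", itemJson);
    }

    // REST: request and await a meaningful, computed response; once the
    // call returns, the exchange is over and nothing is retained for others.
    public String storeItem(String itemJson) {
        return restTemplate.postForObject(
                "http://inventory-service/items", itemJson, String.class);
    }
}
```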
Is your data streaming?
If the data keeps on coming and you have a pipeline to execute, Kafka is best.
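For instance, one pipeline step can be expressed with Kafka Streams; this is only a sketch, with made-up topic names and a trivial transformation, to show the forward flow through an intermediate topic:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class VendorItemPipeline {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Forward flow of data: read raw vendor items, normalize them, and
        // write the result to an intermediate topic for downstream components.
        builder.stream("vendor-items-raw", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(json -> json.trim())
               .to("vendor-items-normalized", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "vendor-item-pipeline");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        new KafkaStreams(builder.build(), props).start();
    }
}
```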
Do you need a request-response model?
If the user requests something and waits for a response, then REST is best. Though you can still use Kafka for this, e.g. with spring-kafka's ReplyingKafkaTemplate.
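A rough sketch of request-reply over Kafka, assuming a ReplyingKafkaTemplate bean has already been configured with a reply listener container (the request topic name is hypothetical):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.kafka.requestreply.ReplyingKafkaTemplate;
import org.springframework.kafka.requestreply.RequestReplyFuture;

public class InventoryQueryClient {

    private final ReplyingKafkaTemplate<String, String, String> replyingTemplate;

    public InventoryQueryClient(ReplyingKafkaTemplate<String, String, String> replyingTemplate) {
        this.replyingTemplate = replyingTemplate;
    }

    // Send a request to a topic and block until a correlated reply arrives
    // on the configured reply topic - REST-like semantics over Kafka.
    public String query(String payload) throws Exception {
        ProducerRecord<String, String> request =
                new ProducerRecord<>("inventory-requests", payload);
        RequestReplyFuture<String, String, String> future =
                replyingTemplate.sendAndReceive(request);
        ConsumerRecord<String, String> reply = future.get();
        return reply.value();
    }
}
```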
Kafka (or any other streaming platform) is typically used for pipelines, i.e. where there is a forward flow of data. LinkedIn built Kafka to centrally collect logs and metrics into their data warehouse.
Data comes to Kafka and from there it goes through component1, component2 and so on, and finally (typically) lands in a database / data lake for long-term storage. Or you can keep data in Kafka forever, like the New York Times does.
To get information on demand, we need a data store (a database) that we can query. In such a case we provide a REST interface which the user can invoke to get the data they want. Confluent, for example, also has a REST Proxy for Kafka that lets you request topic data over HTTP; however, that is a scan, not an indexed lookup like a database would offer.
Regarding your example,
Everyday vendor service calls the vendor API to get new items and these need to be moved into inventory service
Questions & Answers
Is your vendor API using REST?
Then you need to pull the data and push it to Kafka. From there your inventory service (or any other service thereafter) will subscribe to that topic and execute its processing logic.
The advantage here is that you can add any other service which requires vendor data as a consumer to the vendor topic.
Moreover, the vendor data is always there for you even after your inventory service processed it.
If you used REST for this, every component that requires vendor data would have to call the vendor API itself; with Kafka, adding another consumer of the vendor topic is trivial.
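A sketch of that flow, assuming Spring Boot with spring-kafka; the vendor URL, topic name, and consumer group are placeholders:

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.client.RestTemplate;

// Vendor side: pull the daily items over REST and publish each one to a
// topic that any interested service can subscribe to.
class VendorSync {

    private final RestTemplate restTemplate = new RestTemplate();
    private final KafkaTemplate<String, String> kafkaTemplate;

    VendorSync(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    void syncDaily() {
        // Hypothetical vendor endpoint returning the day's items as JSON strings
        String[] items = restTemplate.getForObject(
                "https://vendor.example.com/api/items", String[].class);
        if (items == null) {
            return;
        }
        for (String item : items) {
            kafkaTemplate.send("vendor-items", item);
        }
    }
}

// Inventory side (a separate service): consume the same topic and run its
// own processing logic.
class InventoryItemListener {

    @KafkaListener(topics = "vendor-items", groupId = "inventory-service")
    public void onItem(String itemJson) {
        // store / transform the item in the inventory service's own data store
    }
}
```

Any other service that later needs the vendor data just adds its own @KafkaListener on the same topic with a different group id.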
Do you want the inventory to be queried?
Store it in a database after processing through Kafka and provide a REST API on top of that. This is needed because Kafka is essentially a log; to make the data queryable, you need a database.
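A minimal sketch of that idea, with an in-memory list standing in for the real database (topic, group, and endpoint names are made up):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Kafka keeps the log; a queryable store makes the data available on demand.
@RestController
class InventoryQueryApi {

    private final List<String> items = new CopyOnWriteArrayList<>();

    // Materialize the topic into the store as records arrive.
    @KafkaListener(topics = "vendor-items", groupId = "inventory-query")
    void materialize(String itemJson) {
        items.add(itemJson);
    }

    // REST on top of the materialized data for on-demand lookups.
    @GetMapping("/items")
    List<String> getItems() {
        return items;
    }
}
```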
From Comments
Compare kafka vs reactive rest api
The above statements don't change. This just means the reply to the request needs to be pulled. You use HTTP for this, and you can stream that feed of intermediate or final responses into Kafka. Async / reactive REST APIs are still fire-and-forget; if you don't check the status of the request, you never know it has completed. Plus, there is a good chance the API you're interacting with is using Kafka itself - so why not follow suit?