
Request-response with Kafka and key-value store


TL;DR: We have a request/response pattern. Currently, requests go through an ActiveMQ queue and responses are delivered through a memcached key-value store that the front end polls. We want to move to Kafka for a variety of reasons, and we are wondering whether we can re-architect the response path so it no longer uses memcached.

I am trying to understand what would be the best practice system design for the following problem.

We have a frontend that generates requests that require heavy processing. The app needs a response in order to advance. Occasionally we need to undo/step back (which gets you to the previous state(s)). There is a cluster of backends that can perform the heavy processing step.

In our current setup, the frontend pushes requests into a queue (currently ActiveMQ). The backends process items from the queue as capacity allows and store the results in a key-value store (memcached), keyed by the UUID of the queue message (which is itself a unique session id plus a non-unique step id). The frontend polls the store for that UUID. The advantage is that a frontend can, for example, lose its connection, and as long as the session id is preserved we can still query the key-value store for the result we need. When we need to move back or undo actions, we can walk back through the results in the key-value store, since each step has its own UUID and all UUIDs are known.
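The current request/poll/undo flow can be modelled in a few lines. This is a toy sketch under stated assumptions, not the production system: `queue.Queue` stands in for the ActiveMQ queue, a plain `dict` stands in for memcached, and `message_key`, `submit`, `process_one`, `poll` and `previous_result` are hypothetical names. The point it illustrates is that because every key is derived from the session id and step id, the front end can both poll for the current step and step back without any extra bookkeeping.

```python
from __future__ import annotations

import queue
import uuid

# In-memory stand-ins (assumptions for illustration only): queue.Queue plays the
# ActiveMQ request queue and a plain dict plays the memcached key-value store.
request_queue: queue.Queue = queue.Queue()
results: dict[str, str] = {}


def message_key(session_id: str, step: int) -> str:
    """Hypothetical key scheme: unique session id + non-unique step id."""
    return f"{session_id}:{step}"


def submit(session_id: str, step: int, payload: str) -> str:
    """Front end: enqueue a request and return the key it should poll for."""
    key = message_key(session_id, step)
    request_queue.put((key, payload))
    return key


def process_one() -> None:
    """Backend: take one request, do the 'heavy' work, store the result by key."""
    key, payload = request_queue.get()  # blocks until a request is available
    results[key] = payload.upper()  # placeholder for the real heavy processing
    request_queue.task_done()


def poll(key: str) -> str | None:
    """Front end: non-blocking check; None means the result is not ready yet."""
    return results.get(key)


def previous_result(session_id: str, step: int) -> str | None:
    """Undo: every step's key is derivable, so step back by recomputing the key."""
    if step == 0:
        return None
    return results.get(message_key(session_id, step - 1))


if __name__ == "__main__":
    session = str(uuid.uuid4())
    submit(session, 0, "first")
    k1 = submit(session, 1, "second")
    print(poll(k1))  # None: no backend has picked it up yet
    process_one()
    process_one()
    print(poll(k1))  # SECOND
    print(previous_result(session, 1))  # FIRST
```

A reconnecting front end only needs the session id and its current step number to resume polling, which is the property the question wants to preserve.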

However, in the future we would like to route the response path at least partially through the queue, so that analytics tools can consume both requests and responses. The "minimal-change" approach would be to have the response producers push into a queue and make memcached one of its consumers, but maybe there is a better way. We are also looking at switching from ActiveMQ to Kafka, as this would give us replayability (but we don't have any experience with Kafka).
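The "minimal-change" idea relies on one property of a log such as a Kafka topic: each consumer group tracks its own offset, so a memcached writer and an analytics tool can read the same responses independently, and a reader can rewind to replay. The sketch below models only that property in plain Python; `Log`, `Reader`, `sync_cache`, `sync_analytics` and the `"session:step"` key format are hypothetical, and a single list stands in for one topic partition. It is not Kafka's client API.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Log:
    """Toy append-only log standing in for one Kafka topic partition (assumption)."""
    records: list[tuple[str, str]] = field(default_factory=list)

    def append(self, key: str, value: str) -> int:
        self.records.append((key, value))
        return len(self.records) - 1  # offset of the record just written


@dataclass
class Reader:
    """One independent reader, analogous to a consumer group with its own offset."""
    log: Log
    offset: int = 0

    def poll(self) -> list[tuple[str, str]]:
        batch = self.log.records[self.offset:]
        self.offset = len(self.log.records)  # "commit" the new position
        return batch

    def seek(self, offset: int) -> None:
        """Replay: move the position back; reading never deletes records."""
        self.offset = offset


responses = Log()
cache_reader = Reader(responses)      # feeds the memcached-style view
analytics_reader = Reader(responses)  # feeds analytics, unaware of the cache

cache: dict[str, str] = {}
step_counts: Counter[str] = Counter()


def sync_cache() -> None:
    """Copy new responses into the key-value view the front end polls."""
    for key, value in cache_reader.poll():
        cache[key] = value


def sync_analytics() -> None:
    """Count responses per session from the same records, at its own pace."""
    for key, _ in analytics_reader.poll():
        session_id = key.split(":")[0]  # hypothetical "session:step" key format
        step_counts[session_id] += 1


if __name__ == "__main__":
    responses.append("s1:0", "result-a")
    responses.append("s1:1", "result-b")
    sync_cache()
    sync_analytics()
    print(cache["s1:1"])  # result-b
    print(step_counts["s1"])  # 2
    analytics_reader.seek(0)  # replay everything, e.g. for a new analytics job
    print(len(analytics_reader.poll()))  # 2
```

Note that memcached is still there in this design: the log only replaces the transport, and some consumer still has to turn it into something the front end can look up by key.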

Looking at Kafka, it seems that to get a particular message you would need to scan the entire partition. Is there an easier way to retrieve a particular message? Do we create a topic for every sequence of interactions? If we want to replay but don't know the offsets, what is our recourse (other than looking through a very large number of messages)? Our load is fairly small (~1 million messages/day), so I guess anything works, but what would the best practices be (the infamous "what if we scale?")?


Solution

  • As I understand your use case, you don't have an effective way of delivering the responses to the app via push, which is why you make the responses available for the app to pull by id (key). You can switch out the various components, e.g. ActiveMQ for Kafka or memcached for any other KV store, but ultimately, if your constraints are such that the app needs to pull the results from the server, you will always have to consume the responses from the async transport and make them available on the server. For example, if you switch to Kafka, you could implement your consumer as a [global] KTable in Kafka Streams and serve the responses that way, but that's still just a KV store with extra steps. There is no good way of looking up a particular message directly from a Kafka topic by its key; that's not really how it's intended to be used.

    Without knowing many more details, it seems sensible to keep the asynchronous transport component (ActiveMQ, Kafka, whatever) separate from the serving component, so that you can scale or swap them out independently. For example, if you grow to a size that no longer fits in the memory of a single memcached instance, you have a straightforward migration path to any number of distributed KV stores, like Redis, Couchbase, DynamoDB, etc.
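Both points of the answer, that lookups by key belong in a serving store rather than in the log, and that the serving store should be swappable, can be sketched together. This is a toy model under stated assumptions: a list of `(key, value)` tuples stands in for a topic, `ResponseStore` and `InMemoryStore` are hypothetical names, and `materialize` only loosely mirrors the "latest value per key" view that a Kafka Streams KTable provides; none of it is Kafka's or memcached's API.

```python
from __future__ import annotations

from typing import Protocol


class ResponseStore(Protocol):
    """Serving-side interface; memcached, Redis, DynamoDB, ... could sit behind it."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> str | None: ...


class InMemoryStore:
    """Stand-in implementation; a real one would wrap a KV client (assumption)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str | None:
        return self._data.get(key)


def scan_for_key(log: list[tuple[str, str]], key: str) -> str | None:
    """Lookup against the raw log: every record is read; the latest match wins."""
    found = None
    for k, v in log:
        if k == key:
            found = v
    return found


def materialize(log: list[tuple[str, str]], store: ResponseStore, start: int = 0) -> int:
    """Consume log[start:] into the store and return the next offset to resume from.

    Later records overwrite earlier ones for the same key, giving the same
    "latest value per key" view a table built from a changelog provides.
    """
    for key, value in log[start:]:
        store.put(key, value)
    return len(log)


if __name__ == "__main__":
    log = [("s1:0", "a"), ("s1:1", "b"), ("s1:1", "b-retry")]
    store = InMemoryStore()
    next_offset = materialize(log, store)
    print(store.get("s1:1"))  # b-retry: served by key without touching the log
    log.append(("s1:2", "c"))
    next_offset = materialize(log, store, next_offset)  # only the new record
    print(store.get("s1:2"), next_offset)  # c 4
```

Because the consumer only depends on the `ResponseStore` interface, moving from one memcached instance to a distributed store would mean writing another small adapter, while the transport side stays unchanged.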