goapache-kafka

Correct Kafka producer choice for REST endpoint under heavy load


everyone! Thank you for trying to help, I will be brief.

To learn Kafka, I am trying the following:

I have read about Kafka, and watched Confluent's video on YouTube.
I am still not able to make a confident decision between using sync or async Kafka producer.
This is where I need your help. Before I continue, allow me to provide some code:

func SomeGinHandler(c *gin.Context) {
    // assume we extracted data from request's JSON
    // data is stuffed into someValue byte array

    /* what comes is a very sensitive spot, 
       because Publish() may place data on the queue, 
       but still somehow fail; now we have "dirty" data in the queue 
       and I see no way to remove it from queue at this moment
    */

    err = kafkaProducer.Publish(c.Request.Context(), []byte(someKey), someValue))
    if err != nil {
        c.AbortWithStatusJSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
        return
    }

    // the point is that above Publish() call should not block but also be reliable
    // I may not lose message from this HTTP handler
    // reading through Kafka so far, I fear this is not possible 
    // I guess I must make a tradeoff, but what is the correct choice??
    // stick with synchronous producer, with ack = waitALL ??
    // or maybe async producer can somehow work here ?? how ??

    c.JSON(http.StatusAccepted, gin.H{"message" : "request accepted"})
}

As I have described in the code comments, I want Kafka publishing code to immediately return, so REST handler doesn't get blocked.

I will process data in a separate microservice, using Kafka consumer group.

I may not afford message loss, and after reading about async producer, I found out it just "fires and forgets" the message, so message loss is possible.

Apparently, I must make a tradeoff, so what approach do you suggest in the above scenario, taking into consideration that REST endpoint will be under heavy load?

Regarding libraries, I am leaning towards franz-go, but sarama seems ok too. I would like to avoid CGO dependency, but if needed confluent's library will do too. If you have some other library in mind I will take it into consideration as well. The nature of the problem I face is of design issue, so I do not expect library will magically solve it.


Solution

  • I would suggest using the sync producer with "wait for all" acks (which is really the min in-sync replicas configured for the broker) to meet your requirements to guarantee message delivery to Kafka. This will enable you to notify the client with the correct status code and any errors in producing to Kafka.

    Also, make your messages consumed idempotently in case duplicate messages are produced.