
Difference between exactly-once and at-least-once guarantees


I'm studying distributed systems and referring to this old question: stackoverflow link

I really can't understand the difference between exactly-once, at-least-once and at-most-once guarantees. I have come across these concepts in Kafka, Flink, Storm and Cassandra. For instance, some people say that Flink is better because it has exactly-once guarantees, while Storm has only at-least-once.

I understand that exactly-once mode is better for latency but at the same time worse for fault tolerance, right? How can I recover a stream if I don't have duplicates? And if this is a real problem, why is the exactly-once guarantee considered better than the others?

Can someone give me better definitions?


Solution

  • The definitions below are quoted from the Akka documentation:

    at-most-once delivery

    means that for each message handed to the mechanism, that message is delivered zero or one times; in more casual terms it means that messages may be lost.

    at-least-once delivery

    means that for each message handed to the mechanism potentially multiple attempts are made at delivering it, such that at least one succeeds; again, in more casual terms this means that messages may be duplicated but not lost.

    exactly-once delivery

    means that for each message handed to the mechanism exactly one delivery is made to the recipient; the message can neither be lost nor duplicated.

    The first one is the cheapest—highest performance, least implementation overhead—because it can be done in a fire-and-forget fashion without keeping state at the sending end or in the transport mechanism. The second one requires retries to counter transport losses, which means keeping state at the sending end and having an acknowledgement mechanism at the receiving end. The third is most expensive—and has consequently worst performance—because in addition to the second it requires state to be kept at the receiving end in order to filter out duplicate deliveries
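To make the three mechanisms described in the quote concrete, here is a minimal sketch in Python. It is not how Akka, Kafka, Flink or Storm implement anything; `LossyChannel`, `Receiver`, `send_at_most_once`, `send_at_least_once` and `run` are hypothetical names invented for this illustration. The channel drops transmissions according to a fixed script so the behaviour is reproducible. Note that the "exactly-once" case here is really at-least-once delivery plus duplicate filtering at the receiver, which is what the quoted text describes.

```python
from dataclasses import dataclass, field


@dataclass
class LossyChannel:
    """Hypothetical transport. drop_script[i] is True when the i-th
    transmission (a message or an ack) is lost; after the script is
    exhausted nothing more is dropped."""
    drop_script: list
    _sent: int = 0

    def transmit(self) -> bool:
        dropped = self._sent < len(self.drop_script) and self.drop_script[self._sent]
        self._sent += 1
        return not dropped  # True means the transmission got through


@dataclass
class Receiver:
    dedupe: bool = False                            # filter duplicates by message id?
    delivered: list = field(default_factory=list)   # what the application actually saw
    seen_ids: set = field(default_factory=set)      # receiver-side state for dedup

    def on_message(self, msg_id, payload):
        if self.dedupe:
            if msg_id in self.seen_ids:
                return  # duplicate: ignore it (the sender still gets its ack)
            self.seen_ids.add(msg_id)
        self.delivered.append(payload)


def send_at_most_once(channel, receiver, msg_id, payload):
    # Fire and forget: a single attempt, no ack, no state kept by the sender.
    if channel.transmit():
        receiver.on_message(msg_id, payload)


def send_at_least_once(channel, receiver, msg_id, payload, max_attempts=10):
    # Retry until an ack comes back. If the message arrives but the ack is
    # lost, the retry delivers the same message a second time.
    for _ in range(max_attempts):
        if not channel.transmit():   # the message itself was lost
            continue
        receiver.on_message(msg_id, payload)
        if channel.transmit():       # the ack made it back to the sender
            return True
    return False


def run(send, drop_script, dedupe=False):
    """Send one message with the given strategy and return what was delivered."""
    channel = LossyChannel(list(drop_script))
    receiver = Receiver(dedupe=dedupe)
    send(channel, receiver, msg_id=1, payload="a")
    return receiver.delivered


print(run(send_at_most_once, [True]))                        # message lost
print(run(send_at_least_once, [False, True]))                # ack lost, so a duplicate
print(run(send_at_least_once, [False, True], dedupe=True))   # duplicate filtered out
```

In the second call the message arrives but its ack is dropped, so the sender retries and the receiver sees the payload twice. In the third call the same retry happens, but the receiver remembers the message id and discards the second copy. This also addresses the concern in the question: under these assumptions, duplicates still travel over the wire for recovery, they are just filtered before the application sees them.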