zeromq

What reliability guarantees (if any) does ZMQ make for PUB/SUB over epgm?


I've got an app sending messages on an epgm PUB socket to one or more epgm SUB sockets. Things mostly work, but if a subscribing application is left up long enough, it will generally end up missing a message or a few messages. (My messages have sequence numbers, so I can tell if any are missing or out of order.) Based on my reading of the ZMQ docs, I would have thought that the "reliable multicast" nature of epgm would prevent this from happening: that once a SUB socket gets one message, it's guaranteed to keep getting them until shutdown or until major network troubles (i.e., the connection is maxed out).

Anyway, that's the context, but the question is simply the title: What reliability guarantees (if any) does ZMQ make for PUB/SUB over epgm?


Solution

  • The PGM implementation within ZeroMQ keeps its recovery data in an in-memory window, so recovery is only possible for a short time. If recovery fails because the window has been exhausted (for example, when the publisher sends new data faster than a recovery can complete), the underlying PGM socket resets and carries on at best effort.

    This means that at high data rates, or with significant packet loss, the transport will be constantly resetting and you will drop messages that cannot be recovered. Hence reliable delivery is not guaranteed.

    The PGM configuration in ZeroMQ is targeted at real-time broadcast, where slow receivers cannot stall the sender. The protocol itself supports both paradigms (this one, and one in which slow receivers can hold the sender back), but the latter has not been implemented due to lack of demand.
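
To make the "in-memory window" concrete, here is a rough sketch in plain Python (no ZeroMQ needed to run it). It assumes the window is sized by the libzmq socket options ZMQ_RATE (kilobits per second) and ZMQ_RECOVERY_IVL (milliseconds), which is my understanding of the multicast options, so check the docs for your version. The names recovery_window_bytes and GapTracker are made up for illustration. The second part is just the sequence-number check the question describes, so the application can at least count what was lost after a reset.

```python
from typing import Optional


def recovery_window_bytes(rate_kbps: int, recovery_ivl_ms: int) -> int:
    """Approximate amount of sent data kept in memory for retransmission.

    Assumption: window = data rate * recovery interval, with the rate in
    kilobits per second and the interval in milliseconds.
    """
    # kbit/s * ms = bits; divide by 8 to get bytes.
    return rate_kbps * recovery_ivl_ms // 8


class GapTracker:
    """Counts messages that never arrived, based on sequence numbers."""

    def __init__(self) -> None:
        self.expected: Optional[int] = None  # next sequence number we expect
        self.lost = 0                        # running total of missed messages

    def observe(self, seq: int) -> int:
        """Record one received sequence number; return how many were skipped."""
        if self.expected is not None and seq < self.expected:
            # Late or duplicate message: do not move the expectation back.
            return 0
        missed = 0 if self.expected is None else seq - self.expected
        self.lost += missed
        self.expected = seq + 1
        return missed


if __name__ == "__main__":
    # 100 kbit/s with a 10 s recovery interval.
    print(recovery_window_bytes(100, 10_000))
    # 1 Gbit/s with a 60 s recovery interval: several gigabytes of RAM.
    print(recovery_window_bytes(1_000_000, 60_000))

    tracker = GapTracker()
    for seq in [1, 2, 3, 6, 7, 5, 10]:
        tracker.observe(seq)
    print(tracker.lost)
```

The takeaway is that a longer recovery interval or a higher rate costs memory on the sender, and once a gap is older than the window it cannot be repaired by the transport. If you need every message, the application has to notice the gap and recover it some other way (for example, by asking for a resend over a separate socket).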