distributed-computinggossip

How to guarantee that all nodes get infected in gossip-based protocols?


In gossip-based protocols, how do we guarantee that all nodes get infected by the message?

If we selected a random number of nodes and send a message to these nodes, and these nodes did the same, there is a probability that some node will not receive the message.

Although I couldn't calculate it, it seems small. However, if the system is running for a long time, at some point one nodes will be unlucky and will be leftover.


Solution

  • It's a bit hard to answer, due to two reasons:

    1. There isn't really a gossip-based protocol. At most, there are families of gossip-based algorithms.

    2. The algorithms actually guarantee infection only under specific assumptions. E.g., if, as you put it, as "the system is running for a long time" any given link fails permanently under some exponential process (a very likely scenario), then with probability 1 some node will be completely isolated, and no protocol can overcome that.

    However, IIUC, you're asking about a protocol with the following assumptions:

    1. For any group V' ⊂ V of nodes, there is an active link u ∈ V' → v ∈ V ∖ V'.

    enter image description here

    1. Each node chooses uniformly d of its neighbors at each step, irrespective of their state, choices made by other nodes, total update state, etc.

    enter image description here

    Under these conditions, the problem you raised will have probability 0.

    You can think about the infection as a Markov Chain where the system is at state i if i nodes are infected. Suppose some change originated at some s ∈ V, and so the system starts at state i.

    Therefore, the probability that this will not happen for j steps is (1 - d/n)j. Using the Union Bound, the probability that this will happen for any state i is at most n (1 - 1/n)j. Take j = n2, and this becomes n e- n; take j = n3, and this becomes n e- n2. Etc.

    (Of course, gossip algorithm infection happens much sooner; this is an upper bound for the worst-possible conditions.)

    So, if the system runs long enough, the probability that some node does not become infected, decreases to 0 (very quickly). For Anti-Entropy Gossip Protocols, this is enough. For some other protocols, as you suspected, there is a chance that some node will be missed for some update.