In gossip-based protocols, how do we guarantee that all nodes get infected by the message?
If we selected a random number of nodes and send a message to these nodes, and these nodes did the same, there is a probability that some node will not receive the message.
Although I couldn't calculate it, it seems small. However, if the system is running for a long time, at some point one nodes will be unlucky and will be leftover.
It's a bit hard to answer, due to two reasons:
There isn't really a gossip-based protocol. At most, there are families of gossip-based algorithms.
The algorithms actually guarantee infection only under specific assumptions. E.g., if, as you put it, as "the system is running for a long time" any given link fails permanently under some exponential process (a very likely scenario), then with probability 1 some node will be completely isolated, and no protocol can overcome that.
However, IIUC, you're asking about a protocol with the following assumptions:
Under these conditions, the problem you raised will have probability 0.
You can think about the infection as a Markov Chain where the system is at state i if i nodes are infected. Suppose some change originated at some s ∈ V, and so the system starts at state i.
By property 1., there is a link from the i infected nodes to one of the n - i others.
By property 2., the probability of selecting this link is at least 1 / n. This is because the node whose link happens to cross the cut, has at most n neighbors, but at least one neighbor across the cut. Even if its selection is entirely stateless and uninformed, that is the chance that it will choose this neighbor.
Therefore, the probability that this will not happen for j steps is (1 - d/n)j. Using the Union Bound, the probability that this will happen for any state i is at most n (1 - 1/n)j. Take j = n2, and this becomes n e- n; take j = n3, and this becomes n e- n2. Etc.
(Of course, gossip algorithm infection happens much sooner; this is an upper bound for the worst-possible conditions.)
So, if the system runs long enough, the probability that some node does not become infected, decreases to 0 (very quickly). For Anti-Entropy Gossip Protocols, this is enough. For some other protocols, as you suspected, there is a chance that some node will be missed for some update.