mongodb, pymongo

How can I get MongoClient to connect to multiple hosts and handle failover?


I have a pretty simple PyMongo configuration that connects to two hosts:

import os
from pymongo import MongoClient

host = os.getenv('MONGODB_HOST', '127.0.0.1')
port = int(os.getenv('MONGODB_PORT', 27017))
hosts = []
for h in host.split(','):
    hosts.append('{}:{}'.format(h, port))
cls.client = MongoClient(host=hosts, replicaset='replicatOne')

MONGODB_HOST is a comma-separated list of two IPs/hostnames, such as "primary_mongo,secondary_mongo".

They are configured as a replica set.

The issue I notice is that, in this current configuration, if secondary_mongo goes down, my whole code stops working.

I assumed that giving a list of hosts to MongoClient would tell it to "use the one that works, starting with the first", but that appears not to be the case.

What is wrong, and how can I ensure that MongoClient connects to primary_mongo first and, if that fails, falls back to secondary_mongo?


Solution

  • In order to have a fully running MongoDB ReplicaSet you must have a PRIMARY member. A PRIMARY can only be elected if a majority of all members is available. 1 out of 2 is not a majority, so your ReplicaSet can no longer elect a PRIMARY if one of your two nodes fails.

    When you connect to your MongoDB, you can either connect to the ReplicaSet, e.g.

    mongo "mongodb://localhost:27037,localhost:27137,localhost:27237/?replicaSet=repSet"
    

    or you connect directly, e.g.

    mongo "mongodb://localhost:27037"
    

    When you connect to a ReplicaSet and the PRIMARY goes down (or when it steps down to SECONDARY for whatever reason) and another member becomes the PRIMARY, the client usually reconnects to the new PRIMARY automatically. However, connecting to a ReplicaSet requires a PRIMARY member, i.e. the majority of members must be available.

    When you connect directly, you can also connect to a SECONDARY member and use MongoDB in read-only mode. However, you don't have any failover/reconnect behaviour if the connected member goes down.
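
    On the application side, a failover briefly surfaces as a transient error while the election is in progress. A minimal sketch of handling this in PyMongo (the URI, database and collection names are placeholders based on the question; recent PyMongo versions also retry many operations automatically via retryable writes):

    import time
    from pymongo import MongoClient
    from pymongo.errors import AutoReconnect

    client = MongoClient(
        "mongodb://primary_mongo:27017,secondary_mongo:27017/?replicaSet=replicatOne",
        serverSelectionTimeoutMS=5000,
    )

    def insert_with_retry(doc, retries=5):
        # During an election there is briefly no PRIMARY; retry until the
        # driver has found the newly elected PRIMARY.
        for _ in range(retries):
            try:
                return client.mydb.mycoll.insert_one(doc)
            except AutoReconnect:
                time.sleep(1)
        raise RuntimeError("no PRIMARY available after retries")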

    You could create an ARBITER member on one of your nodes. Then, if the other node goes down, the application is still fully available. Bear in mind that with this setup you can only afford to lose the node that does not host the ARBITER, not either of them. Ideally, you configure the ARBITER on an independent third host.
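
    To verify the resulting topology, you can ask the replica set for its status from PyMongo. A small sketch, assuming the placeholder hosts and replica-set name from the question:

    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://primary_mongo:27017,secondary_mongo:27017/?replicaSet=replicatOne"
    )

    # replSetGetStatus lists every configured member and its current state,
    # so you can confirm the ARBITER is part of the set.
    status = client.admin.command("replSetGetStatus")
    for member in status["members"]:
        print(member["name"], member["stateStr"])  # e.g. PRIMARY, SECONDARY, ARBITER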

    From the MongoDB: The Definitive Guide by Shannon Bradshaw, Eoin Brazil, Kristina Chodorow: Part III - Replication, CHAPTER 9 - Setting Up a Replica Set:

    How to Design a Set

    To plan out your set, there are certain replica set concepts that you must be familiar with. The next chapter goes into more detail about these, but the most important is that replica sets are all about majorities: you need a majority of members to elect a primary, a primary can only stay primary so long as it can reach a majority, and a write is safe when it’s been replicated to a majority. This majority is defined to be “more than half of all members in the set,” as shown in Table 9-1.

    Number of members in the set    Majority of the set
    1                               1
    2                               2
    3                               2
    4                               3
    5                               3
    6                               4
    7                               4

    Note that it doesn’t matter how many members are down or unavailable, as majority is based on the set’s configuration.

    For example, suppose that we have a five-member set and three members go down, as shown in Figure 9-1. There are still two members up. These two members cannot reach a majority of the set (at least three members), so they cannot elect a primary. If one of them were primary, it would step down as soon as it noticed that it could not reach a majority. After a few seconds, your set would consist of two secondaries and three unreachable members.

    [Figure 9-1: a five-member set with three members down]

    Many users find this frustrating: why can’t the two remaining members elect a primary? The problem is that it’s possible that the other three members didn’t go down, and that it was the network that went down, as shown in Figure 9-2. In this case, the three members on the left will elect a primary, since they can reach a majority of the set (three members out of five).

    In the case of a network partition, we do not want both sides of the partition to elect a primary: otherwise the set would have two primaries. Then both primaries would be writing to the data and the data sets would diverge. Requiring a majority to elect or stay primary is a neat way of avoiding ending up with more than one primary.

    [Figure 9-2: a network partition splitting the five-member set]

    It is important to configure your set in such a way that you’ll usually be able to have one primary. For example, in the five-member set described above, if members 1, 2, and 3 are in one data center and members 4 and 5 are in another, there should almost always be a majority available in the first data center (it’s more likely to have a network break between data centers than within them).
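
    Relating the quoted rule back to code (not from the book, just a sketch of the arithmetic):

    # "More than half of all members": the majority depends only on the
    # configured set size, not on how many members are currently reachable.
    def majority(set_size):
        return set_size // 2 + 1

    for n in range(1, 8):
        print(n, majority(n))  # reproduces Table 9-1: 1->1, 2->2, 3->2, ..., 7->4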