python messaging centralized

How to handle online users connected to different servers in a centralized messaging application?


I'm building a centralized messaging program in Python, similar to the old MSN Messenger or WhatsApp. Let's say that right now my server can handle approximately 50,000 online users, and it works as follows:

user1 wants to send a message to user2, so user1 sends it to the server. The server keeps a large in-memory map from users to their IP addresses. If user2 is online, the server forwards the message to user2; if not, the message is stored on the server until user2 comes back online and asks for new messages.

Now my problem: let's say the program grows in number of users and I now have to handle 200k users, so I need 4 servers. What would be the easiest way to find out which server user2 is connected to, in order to forward the message to him? Maybe a "router server" that tracks all the users online across all servers, so that the server user1 is connected to forwards the message to the serverX that user2 is connected to? And if this is the best way, what can I do when a user who was offline comes back online and asks a random server for new messages? How can I retrieve all of that user's new messages?

Maybe another way would be for the server that user1 is connected to to broadcast a query to the other servers, asking whether user2 is connected to any of them?

Thanks in advance guys


Solution

  • I would separate the concerns (and respective technologies/protocols):

    1. messaging protocol with users (non-sequenced sending, sequenced replication of incoming message log)
    2. distributed per-recipient ledger persistence
    3. bridging of #1 and #2

    And solve them individually (but of course knowing that they have to plug together, so the performance characteristics and protocols between them should not be off by miles).

    For #1 and #3 you should be able to hack something together quickly with asyncio, ZeroMQ, etc.
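
    As a rough illustration of how #3 could sit between the user-facing protocol and the ledger, here is a minimal asyncio sketch. Everything in it is an assumption made for the example: the names `InMemoryLedger`, `Bridge`, `send`, `replicate` and `deliver` are hypothetical, the ledger is a plain in-process dict standing in for #2, and there are no real sockets. The point is only the shape: senders append to the recipient's log without caring whether the recipient is online, and a recipient replays everything after the last sequence number it has seen.

```python
import asyncio
from collections import defaultdict


class InMemoryLedger:
    """Stand-in for #2: one append-only, sequence-numbered log per recipient."""

    def __init__(self):
        self._logs = defaultdict(list)  # recipient -> [(seq, sender, body), ...]

    def append(self, recipient, sender, body):
        log = self._logs[recipient]
        seq = len(log) + 1  # sequence numbers start at 1 for each recipient
        log.append((seq, sender, body))
        return seq

    def read_after(self, recipient, after_seq):
        # seq == index + 1, so this slice is every entry with seq > after_seq
        return list(self._logs[recipient][after_seq:])


class Bridge:
    """Sketch of #3: connects the user-facing protocol to the ledger."""

    def __init__(self, ledger):
        self.ledger = ledger
        self._wakeups = defaultdict(asyncio.Event)  # recipient -> "new entry" signal

    async def send(self, sender, recipient, body):
        # Non-sequenced send: append to the recipient's log, online or not.
        seq = self.ledger.append(recipient, sender, body)
        self._wakeups[recipient].set()
        return seq

    async def replicate(self, user, after_seq, deliver):
        # Sequenced replication: replay what the user missed, then wait for more.
        while True:
            wakeup = self._wakeups[user]
            wakeup.clear()
            for entry in self.ledger.read_after(user, after_seq):
                await deliver(entry)
                after_seq = entry[0]
            await wakeup.wait()


async def demo():
    bridge = Bridge(InMemoryLedger())
    received = []

    async def deliver(entry):
        received.append(entry)

    # user2 is offline here; the message simply waits in the ledger.
    await bridge.send("user1", "user2", "hello")

    # user2 connects, having seen nothing yet (after_seq=0).
    task = asyncio.create_task(bridge.replicate("user2", 0, deliver))
    await asyncio.sleep(0)  # let the replication task catch up
    await bridge.send("user1", "user2", "still there?")
    await asyncio.sleep(0.01)  # give the task time to deliver the live message

    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

    In this sketch user2 receives both the message sent while it was offline and the one sent after it connected, in sequence order. Note that the `asyncio.Event` wake-up only works inside one process; with several servers the "new entry" signal would have to come from the shared ledger or from something like ZeroMQ pub/sub, which is left out here.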

    For #2 I would try to find some existing middleware capable of scaling as a cluster (Kafka, Ignite, etc.).

    The beauty of this approach is that in the first prototype you can mock #2 with a single centralized DB, so the whole thing will be up and running pretty much instantly, while you will be learning how to get the distributed persistence up, tuned, monitored, etc.
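
    To make the "single centralized DB" mock concrete, a per-recipient ledger could be sketched on top of the standard-library `sqlite3` module as below. The class name `SqliteLedger`, the table layout and the `append` / `read_after` methods are assumptions for this example only; any database that every server can reach would do for a prototype.

```python
import sqlite3


class SqliteLedger:
    """Prototype stand-in for the distributed ledger: one shared SQLite database."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            """CREATE TABLE IF NOT EXISTS ledger (
                   recipient TEXT    NOT NULL,
                   seq       INTEGER NOT NULL,
                   sender    TEXT    NOT NULL,
                   body      TEXT    NOT NULL,
                   PRIMARY KEY (recipient, seq)
               )"""
        )

    def append(self, recipient, sender, body):
        # Commits on success and rolls back on error; the primary key
        # rejects a duplicate seq if two writers ever pick the same one.
        with self._db:
            (seq,) = self._db.execute(
                "SELECT COALESCE(MAX(seq), 0) + 1 FROM ledger WHERE recipient = ?",
                (recipient,),
            ).fetchone()
            self._db.execute(
                "INSERT INTO ledger (recipient, seq, sender, body) VALUES (?, ?, ?, ?)",
                (recipient, seq, sender, body),
            )
        return seq

    def read_after(self, recipient, after_seq):
        # Everything the recipient has not seen yet, in sequence order.
        rows = self._db.execute(
            "SELECT seq, sender, body FROM ledger "
            "WHERE recipient = ? AND seq > ? ORDER BY seq",
            (recipient, after_seq),
        )
        return rows.fetchall()


if __name__ == "__main__":
    ledger = SqliteLedger()
    ledger.append("user2", "user1", "hello")
    ledger.append("user2", "user3", "hi")
    # A client that has already seen seq 1 asks only for what came after it.
    print(ledger.read_after("user2", 1))
```

    Because a reconnecting client only has to say "give me everything after seq N", it does not matter which server it lands on, as long as all servers talk to the same ledger. Swapping this class for a clustered backend later should then leave #1 and #3 largely untouched.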

    A great article from LinkedIn engineering that should get you into the right mindset for cracking the problem at hand is "The Log: What every software engineer should know about real-time data's unifying abstraction".