I am working on MongooseIM server with PostgreSQL as backend. I need all the basic features like:
I can scale MongooseIM with clustering (with Mnesia for storing sessions). But I feel I may face bottle neck from the backend side using PostgreSQL. What is the rough estimate of the number of users that can be handled by PostgreSQL without the must have need for shift on No-sql like Riak. Because I don't want to mess-up the backend from the scalability point of view.
Example - Suppose I need to handle >10 Million users. Can I use PostgreSQL for that. If yes, then do I need special techniques like sharding in Postgresql to continue to work with it or at some point I must have to be shifted to Riak or other Nosql. What do you recommend?
This is a complex matter. First of all, it depends a lot on the set of features you have enabled in MongooseIM, so any load tests done by others might not reflect your particular configuration and environment.
Moreover, to actually be able to monitor all the important parameters in a system reaching tens of thousands of users, you'll probably need client-side performance tracking (delivery acks, roundtrip time) - this is not standardised by XMPP XEPs though, so implementations vary.
This being said, Postgres will probably only allow you to scale into hundreds of thousands of online users. It might require table sharding and user sharding across MongooseIM clusters using XMPP federation to allow for communication between all users. DB nodes in this architecture are associated with MIM clusters, therefore do not share data across cluster boundaries. Setups along these lines are known to work in production.
If you have to handle millions of users, you most likely have to switch from PostgreSQL to a database which can itself be more easily scaled. MongooseIM supports Cassandra, which currently seems to be the best bet. Personally, I'd curious to find out how TiKV (let me know, please, if you try that one) fares in such setups. It pretends to be MySQL on the wire, so should also work with MongooseIM, but operationally behaves similarly to Cassandra: automated replication, cluster scalability, fault tolerance, etc.
Ultimately, though, you'll only know by testing yourself. Instagram shares some interesting stories about Postgres at scale which show the amount of maintenance required. Alas, load tests reaching millions of users are prohibitively expensive due to cloud costs, so such test results are not common.