architecture slack verification email-verification

How did Slack implement sign-in for multiple customers?


Does anyone know how Slack implements end-user login by typing only an email address? I believe Slack has millions of end users across multiple customers and multiple domains. How do they verify a user's login from the email alone? I don't believe Slack hosts all customers in one database. So the question is how to look up a user's sign-in info across thousands of databases containing millions of users. What type of architectural approach did they apply?


Solution

  • There are multiple ways to solve this problem. Let me share three of them. Let's suppose there is a business requirement that e-mail addresses must be unique.

    Single Step - NoSQL

    Let's suppose that there is a single step which gathers the username and password from the user. Also let's suppose that the user management data is stored in a NoSQL database (for example in a key-value or document store).

    We could use the e-mail address to perform a point query by providing both the partition and the row keys.

    Figure: point query. (This picture was taken from one of my presentation slides.)

    The partition key helps us to locate the appropriate database server instance (more about this in the next section) whereas the row key helps us to locate the individual row.
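    A point query of this kind can be sketched as follows. This is only an illustration under stated assumptions: the store is a hypothetical in-memory stand-in for a partitioned key-value database, and the key scheme (partition key = e-mail domain, row key = full address) is an assumption, not something known about Slack.

```python
from typing import Optional

# Hypothetical in-memory stand-in for a partitioned key-value store:
# outer dict = partitions (keyed by partition key),
# inner dict = rows inside that partition (keyed by row key).
store: dict[str, dict[str, dict]] = {}


def split_keys(email: str) -> tuple[str, str]:
    """Assumed key scheme: partition key = domain, row key = full address."""
    normalized = email.strip().lower()
    _, _, domain = normalized.rpartition("@")
    return domain, normalized


def put_user(email: str, record: dict) -> None:
    """Store a user record under its partition key and row key."""
    pk, rk = split_keys(email)
    store.setdefault(pk, {})[rk] = record


def point_query(email: str) -> Optional[dict]:
    """Fetch exactly one row by supplying both keys; no scan is involved."""
    pk, rk = split_keys(email)
    return store.get(pk, {}).get(rk)


put_user("alice@example.com", {"password_hash": "<hash>", "workspace": "acme"})
```

    Because both keys are derived from the e-mail address, the lookup touches a single partition and a single row regardless of how many users exist in total.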

    Single Step - RDBMS

    Let's suppose that there is a single step which gathers the username and password from the user. Also let's suppose that the user management data is stored in a relational database.

    Relational databases are good at vertical scaling (adding more powerful resources) but suboptimal for horizontal scaling (adding more machines). To scale a relational database horizontally you need to shard your data. This means you need to partition the data (hopefully evenly) based on some criterion, such as a hash of the e-mail address.
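    As a minimal sketch of partitioning by e-mail hash, the simplest scheme is a stable hash taken modulo the number of shards. The function name `shard_for` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def shard_for(email: str, shard_count: int) -> int:
    """Map an e-mail address to a shard index in [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # Use a stable digest: Python's built-in hash() is randomized per
    # process for strings, so it cannot be used for routing.
    digest = hashlib.sha256(email.strip().lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


index = shard_for("alice@example.com", 8)
```

    The drawback of plain modulo sharding is that changing `shard_count` remaps most addresses to a different shard, which is the problem consistent hashing is meant to reduce.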

    If you apply the consistent hashing algorithm, the user management data can be spread across multiple relational database instances. The basic idea is to have a ring and to place both your servers and the hashed values on it. You find out which server hosts the data by starting at the hash value on the ring and moving clockwise until you bump into a server.

    Figure: consistent hashing

    (Source)
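    The ring lookup described above can be sketched like this. It is a simplified illustration: each server gets a single point on the ring (real deployments usually add several virtual nodes per server to even out the load), and the class name `HashRing` and the server names are hypothetical.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Stable 64-bit position on the ring for a server name or an e-mail."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # Sorted (position, server) pairs represent the ring.
        self._points = sorted((ring_hash(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._points]

    def server_for(self, email: str) -> str:
        """Walk clockwise from the e-mail's hash to the first server."""
        h = ring_hash(email.strip().lower())
        # First server at or after h; wrap to the start if we fall off the end.
        idx = bisect.bisect_left(self._positions, h) % len(self._positions)
        return self._points[idx][1]


ring = HashRing(["db-1", "db-2", "db-3"])
target = ring.server_for("alice@example.com")
```

    With this layout, removing a server only moves the addresses that were stored on that server; every other address keeps its original location.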

    Two Steps - Service, Cache

    Let's suppose that there is a two-step "wizard" which gathers the username and password from the user. The first step requests the username/e-mail address. The second step asks for the password. (For example, Google's login works like this.)

    Also let's suppose that there is a global database (w/o a cache in front of it) which stores only the e-mail addresses and the associated regions. After the system has gathered the e-mail address from the user, it can query this global database to find out which region the user resides in. Then the server can respond to the client with a redirect that points to the appropriate region's second page. That regional page asks for the password and performs the login against a regional user management database.
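    The two-step flow can be sketched as follows. Everything here is hypothetical and only illustrates the idea: the dictionaries stand in for the global and regional databases, the redirect URL pattern is invented, and the fixed demo salt would be a per-user random salt in a real system.

```python
import hashlib
import hmac
from typing import Optional

SALT = b"demo-salt"  # assumption for the demo; use a random per-user salt


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Global database: e-mail address -> region only.
GLOBAL_DIRECTORY = {"alice@example.com": "eu"}

# Regional user management databases: e-mail address -> password hash.
REGIONAL_USERS = {
    "eu": {"alice@example.com": hash_password("s3cret", SALT)},
    "us": {},
}


def step_one(email: str) -> Optional[str]:
    """Step 1: resolve the region and return the redirect target, if any."""
    region = GLOBAL_DIRECTORY.get(email.strip().lower())
    if region is None:
        return None
    return f"https://{region}.login.example.com/password"  # hypothetical URL


def step_two(region: str, email: str, password: str) -> bool:
    """Step 2: verify the password against the regional database."""
    stored = REGIONAL_USERS.get(region, {}).get(email.strip().lower())
    if stored is None:
        return False
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(stored, hash_password(password, SALT))
```

    The global lookup stays small because it holds only the address and the region, while credentials never leave the regional database.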