performance google-search sharding horizontal-scaling

How does Google process 600K documents in 0.33 seconds?



Regardless of how fast their CPUs are, it seems impossible to process that many documents in 0.33 seconds.

So I believe that it comes down to horizontal scaling. As a guess, how many servers were involved in this query, which processed 600K documents in under a second?


Solution

  • Google doesn't process that many documents that quickly. Google pre-processes the documents well before you do your search. Google maintains a "search index" that is used to produce the list of search results.

    You can think of a search index like the index in a paper book. For each word, it lists which pages on the internet use it. To answer a query, the search engine looks up each word of the query in the search index and builds the list of results from the matching pages.

    For reference: What Is A Search Index And How Does It Work? - AddSearch

    Google also has a lot of computers and does a ton of horizontal scaling. It scales horizontally at each stage of building the search index and displaying search results.

    But there is no amount of horizontal scaling that would allow search engines to process documents in real time based on your search query.
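
To make the book-index analogy concrete, here is a minimal sketch of a search index in Python. It is a toy illustration, not Google's actual implementation: the function names (`build_index`, `search`, `search_shards`), the sample documents, the whitespace tokenization, and the "every query word must match" rule are all assumptions made for this example. Ranking is left out entirely.

```python
from collections import defaultdict


def build_index(docs):
    """Pre-processing step: map each word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():  # naive tokenization (assumption)
            index[word].add(doc_id)
    return index


def search(index, query):
    """Query step: return ids of documents that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    # One lookup per query word; no document text is read at query time.
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)


def search_shards(shards, query):
    """Ask every shard (one index per group of documents) and merge the results.

    The shards are queried one after another here; on separate machines
    the lookups could run at the same time.
    """
    results = set()
    for shard_index in shards:
        results |= search(shard_index, query)
    return results


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "how search engines build an index",
    2: "horizontal scaling with many servers",
    3: "a search index speeds up query time",
}

index = build_index(docs)

# The same documents split across two shards (sharding by document).
shard_a = build_index({1: docs[1], 2: docs[2]})
shard_b = build_index({3: docs[3]})

print(search(index, "search index"))
print(search_shards([shard_a, shard_b], "search index"))
```

Both calls return documents 1 and 3. The expensive work (reading every document) happens once in `build_index`, before any query arrives; answering a query costs only a few dictionary lookups and a set intersection. Splitting the index into shards spreads that already small amount of work across machines, which is where horizontal scaling helps.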