firebase google-cloud-firestore

Do subcollections avoid listener scaling problems that collections have?


The Firestore documentation, in its section on using efficient listeners, warns against the following pattern:

As the write rates for your database increase, Firestore splits the data processing across many servers. Firestore's sharding algorithm tries to co-locate data from the same collection or collection group onto the same changelog server. The system tries to maximize the possible write throughput while keeping the number of servers involved in the processing of a query as low as possible.

However, certain patterns might still lead to suboptimal behavior for snapshot listeners. For example, if your app stores most of its data in one large collection, the listener might need to connect to many servers to receive all data it needs. This remains true even if you apply a query filter. Connecting to many servers increases the risk of slower responses.

To avoid these slower responses, design your schema and app so that the system can serve listeners without going to many different servers. It might work best to break your data into smaller collections with smaller write rates.

This means you should avoid having one large collection that many listeners connect to because it may lead to slower responses.

A different page of the Firestore documentation gives an example of how to build a chat app. In that example, messages are stored in a subcollection called messages, like this: rooms/roomA/messages/message1. The pattern is collection/document/collection/document.
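To make the collection/document/collection/document pattern concrete, here is a minimal sketch that labels each segment of a slash-separated path by its position. It does not use the Firestore SDK; `describe_path` is a hypothetical helper written only for illustration.

```python
def describe_path(path: str) -> list:
    """Label each segment of a slash-separated Firestore path.

    Segments at even positions (0, 2, ...) are collection IDs and
    segments at odd positions (1, 3, ...) are document IDs.
    """
    segments = [s for s in path.split("/") if s]
    return [
        ("collection" if i % 2 == 0 else "document", segment)
        for i, segment in enumerate(segments)
    ]


# Walk the chat-app path from the documentation example.
for kind, segment in describe_path("rooms/roomA/messages/message1"):
    print(f"{kind}: {segment}")
```

A path with an odd number of segments, such as rooms/roomA/messages, therefore ends on a collection, which is what a listener on a room's messages would target.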

Question: Does the fact that messages is a subcollection (and not a top-level collection) avoid the scaling problem described in the section on using efficient listeners?

EDIT 1

For example, userA would create a listener that listens to the path rooms/roomA/messages while another userB would be listening to the path rooms/roomB/messages. If there are a lot of documents in the path rooms/roomA/messages for userA but not many in rooms/roomB/messages for userB, would userB experience slower response times? Or would the two user experiences be independent of each other?

EDIT 2

It looks like a listener on messages as a whole (that is, on every messages subcollection across all rooms) would run into the scaling problem.

However, in the example above, if you are listening to rooms/roomB/messages you will not have an issue because it is indexed differently.
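The two listener targets being compared here can be told apart by path alone. The sketch below, which assumes nothing about how Firestore places data on servers, shows that rooms/roomA/messages and rooms/roomB/messages are distinct collections that share the collection ID messages, and so belong to the same collection group. Both helper names are hypothetical.

```python
def collection_id(collection_path: str) -> str:
    """Return the last segment of a collection path (the collection ID)."""
    segments = [s for s in collection_path.split("/") if s]
    # A collection path always has an odd number of segments.
    if len(segments) % 2 == 0:
        raise ValueError(f"not a collection path: {collection_path!r}")
    return segments[-1]


def same_collection_group(path_a: str, path_b: str) -> bool:
    """True when both collection paths share a collection ID."""
    return collection_id(path_a) == collection_id(path_b)


room_a = "rooms/roomA/messages"
room_b = "rooms/roomB/messages"

print(room_a == room_b)                       # False: two separate subcollections
print(same_collection_group(room_a, room_b))  # True: both are named "messages"
```

This models naming only. Whether two subcollections in the same collection group end up on the same servers is decided by Firestore's sharding, which the quoted documentation describes only in general terms.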


Solution

  • No, there is no difference between querying a large collection and querying a large subcollection. Both behave identically in terms of cost and performance. It's the total size of the documents in the collection or subcollection that matters. That's why you're given the advice "It might work best to break your data into smaller collections with smaller write rates."