Tags: firebase, google-cloud-firestore, optimization, database-design

How do I structure data common to all users in Cloud Firestore?


I'm working on an app similar to Goodreads and using Cloud Firestore for the backend. My current DB structure is something like this:

Users (collection)
    uniqueUser1 (document)
        user1Metadata (fields)
        ...
        books (sub-collection)
            book1 (document)
                globalBook1Metadata (fields such as genre, year published, etc)
                ...
                user1Book1Metadata (fields such as rating, comments, date added, etc)
            book2 (document)
                globalBook2Metadata (fields such as genre, year published, etc)
                ...
                user1Book2Metadata (fields such as rating, comments, date added, etc)
    uniqueUser2 (document)
        ...
        books (sub-collection)
            book2 (document)
                globalBook2Metadata (fields such as genre, year published, etc)
                ...
                user2Book2Metadata (fields such as rating, comments, date added, etc)
            book3 (document)
                ...

I have a users collection which contains user documents. Each user document contains some unique user metadata along with a sub-collection of books. Each book in the sub-collection has some global metadata and data that is unique to the user.

This seems horribly inefficient in terms of storage. I am storing the global book metadata every time a user adds a book. In the above example, book2 is stored twice. So my next thought was to add a books collection at the root level like this:

Books (collection)
    book1 (document)
        globalBook1Metadata (fields such as genre, year published, etc)
    book2 (document)
        globalBook2Metadata (fields such as genre, year published, etc)
    book3 (document)

This is much better in terms of storage. I can now just store the book id in the user document and query this collection to get the global book metadata.

But now there is a new problem. Say I want to filter a user's library based on the genre. Well, because genre is only stored in the Books collection, I have to get all the books from the Books collection that match the ids in the user's book sub-collection. That has the potential to be a very large number of reads which is expensive.

So my question comes down to 2 parts:

  1. Is there a way to send a single query that gets all the books from Books that have matching ids to the user's book sub-collection? If so, that would be a great solution.

  2. If not, what is the best way to structure this data so that I am not doing an excessive amount of R/W or using way too much storage?

I am a frontend dev so this is new territory for me. Thanks in advance!


Solution

  • You are on the right track with the second approach. The first one will still work, but it does not scale.

    If you want to keep the root-level books collection you have:

    -Collection of books

    -- document_id1

    -- document_id2

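    You can get a decent result by keeping an array of book ids in every user document (you can add and remove ids with arrayUnion and arrayRemove). You can then query on the genre the user selected in the filter together with the array of saved book ids, using the "in" operator. Here 'bookIdField' stands for a field on each book document that holds its id; if you match on the document id itself, the SDK's documentId() helper can take its place. It will look something like: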

    const q = query(collection(db, "books"), where('genre', '==', 'comedy'), where('bookIdField', 'in', user.uidArrayFavBooks));
    

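    This way you get only the results you need with a single query.

    The "in" operator accepts only a limited number of values (10 when this answer was written; newer Firestore documentation lists 30), so a longer id array has to be split across several queries. To reduce how often that happens, you can split the single id array in the user document into one array per genre, provided you know every possible genre in advance. Something like: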

    comedy: ['id1', 'id2']
    action: ['id1', 'id3', 'id6']
    ...and so on
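
    A minimal sketch of how such per-genre arrays could be derived from a flat list of saved books. The function name `groupIdsByGenre` and the `{ id, genre }` entry shape are assumptions made for illustration; neither is part of the Firestore SDK.

    ```javascript
    // Hypothetical helper: groups book ids by genre.
    // Assumes each entry looks like { id, genre }.
    function groupIdsByGenre(entries) {
      const byGenre = {};
      for (const { id, genre } of entries) {
        // Create the genre's array the first time that genre appears.
        if (!byGenre[genre]) {
          byGenre[genre] = [];
        }
        byGenre[genre].push(id);
      }
      return byGenre;
    }

    // Sample data chosen to reproduce the arrays shown above.
    const genreMap = groupIdsByGenre([
      { id: "id1", genre: "comedy" },
      { id: "id2", genre: "comedy" },
      { id: "id1", genre: "action" },
      { id: "id3", genre: "action" },
      { id: "id6", genre: "action" },
    ]);
    // genreMap.comedy is ["id1", "id2"]
    // genreMap.action is ["id1", "id3", "id6"]
    ```

    The array for the selected genre (for example `genreMap.comedy`) could then be passed to the "in" filter directly, without a separate genre condition.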
    

    Note: If the genre options are not controlled by you, I would not advise doing this, because it is not good practice to rely on predicting what users will enter.

    As a workaround, you can store objects holding the id and genre in the array, filter them by the selected genre on the client, and pass the resulting ids to the query. For example:

    const userFavBooks = [
      { id: 1, genre: "drama" },
      { id: 2, genre: "action" },
      { id: 3, genre: "comedy" },
      { id: 4, genre: "drama" },
    ];
    const filteredBooksArray = userFavBooks.filter((e) => e.genre === "drama").map((e) => e.id); // [1, 4]
    
    const q = query(collection(db, "books"), where('bookIdField', 'in', filteredBooksArray));
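
    If the filtered id array can grow past the "in" limit, one option is to split it into batches and run one query per batch. The sketch below shows only the splitting step; `chunkIds` is a hypothetical helper, and the default of 10 is an assumption taken from the limit quoted in this answer, so check the current Firestore documentation before relying on it.

    ```javascript
    // Assumed maximum number of values per "in" filter (see the note above).
    const MAX_IN_VALUES = 10;

    // Hypothetical helper: splits an array of ids into batches of at most `size`.
    function chunkIds(ids, size = MAX_IN_VALUES) {
      if (!Number.isInteger(size) || size < 1) {
        throw new RangeError("size must be a positive integer");
      }
      const batches = [];
      for (let i = 0; i < ids.length; i += size) {
        batches.push(ids.slice(i, i + size));
      }
      return batches;
    }

    const exampleBatches = chunkIds([1, 2, 3, 4, 5], 2);
    // exampleBatches is [[1, 2], [3, 4], [5]]
    ```

    Each batch would then go into its own query (the same `where('bookIdField', 'in', batch)` shape as above) and the results would be merged on the client. An empty id array produces no batches, which also avoids sending an "in" filter with an empty array, something Firestore rejects. Every matched document still counts as a read, so batching changes the number of queries, not the number of billed reads.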