Modeling sub-collections in MongoDB Realm Sync

I'm new to MongoDB as well as to MongoDB Realm Sync. I was following the Realm Sync tutorial and Realm data model docs, but I wanted to learn more so I tweaked the Atlas collection structure as follows.

Projects > Tasks // i.e. tasks is a sub-collection in each project.

What I don't know is how to come up with a Realm Sync schema that can support Atlas sub-collections. The best I came up with is a schema where tasks are modelled as an array within the project. But I'm worried that this could hit the 16MB document limit (although that is a lot!) for projects with a lot of tasks.

{
  "bsonType": "object",
  "properties": {
    "_id": {
      "bsonType": "objectId"
    },
    "_partition": {
      "bsonType": "string"
    },
    "name": {
      "bsonType": "string"
    },
    "tasks": {
      "bsonType": "array",
      "items": {
          "bsonType": "object",
          "title": "Task",
          "properties": {
              "name": {
                "bsonType": "string"
              },
              "status": {
                "bsonType": "string"
              }
          }
      }
    }
  },
  "required": [
    "_id",
    "_partition",
    "name"
  ],
  "title": "Project"
}

I'd appreciate guidance on how to model sub-collections the right way.

Edit

Here's my client side Realm models.

import Foundation
import RealmSwift

class Project: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = "" // user.id
    @objc dynamic var name: String = ""
    var tasks = RealmSwift.List<Task>()
    override static func primaryKey() -> String? {
        return "_id"
    }
}

class Task: EmbeddedObject {
    @objc dynamic var name: String = ""
    @objc dynamic var status: String = "Pending"
}

As far as the CRUD operations are concerned, I only create a new project and read existing projects as follows.

// Read projects
realm.objects(Project.self).forEach { (project) in
   // Access fields     
}
        
// Create a new project
try! realm.write {
    realm.add(project)
}

Solution

Your code looks great and you're heading in the right direction, so this answer is more explanation and suggestions on modeling than hard code.

First, Realm objects are lazily loaded, which means they are only loaded when used. Tens of thousands of objects will have very little impact on a device's memory. So suppose you have 10,000 users and you 'load them all in'

let myTenThousandUsers = realm.objects(UserClass.self)

Meh, no big deal. However, doing this

let someFilteredUsers = myTenThousandUsers.filter { $0.blah == "blah" }

will (could) create a problem: the closure-based filter is a Swift function that returns a plain array, so if it matches 10,000 users they are all loaded into memory at once, possibly overwhelming the device. 'Converting' Realm's lazy data using Swift functions should generally be avoided (use case dependent); Realm's own query methods keep the results lazy.
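As a minimal sketch of the difference, assuming the same hypothetical UserClass with a blah property: filtering with a Realm predicate returns another lazy Results, while the closure version returns a Swift array. The in-memory configuration and its identifier are only there so the snippet stands alone.

```swift
import RealmSwift

// Hypothetical model matching the UserClass example above.
class UserClass: Object {
    @objc dynamic var blah: String = ""
}

// An in-memory Realm keeps this sketch self-contained; a synced app would
// open its Realm from the user's sync configuration instead.
let config = Realm.Configuration(inMemoryIdentifier: "lazy-filter-sketch")
let realm = try! Realm(configuration: config)

let myTenThousandUsers = realm.objects(UserClass.self)

// Realm's filter takes a predicate and returns Results<UserClass>, which is
// still lazy: objects are only materialized when they are accessed.
let lazyUsers = myTenThousandUsers.filter("blah == %@", "blah")

// The closure-based Swift filter returns [UserClass], a plain array with
// every matching object instantiated up front.
let eagerUsers = myTenThousandUsers.filter { $0.blah == "blah" }

print(type(of: lazyUsers), type(of: eagerUsers))
```

The lazy Results can be handed to a tableView dataSource directly and indexed row by row.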

The same observation applies to this code, which uses Swift's .forEach

realm.objects(Project.self).forEach { (project) in
   // Access fields     
}

could cause issues depending on what's being done with those project objects - using them as a tableView dataSource could be trouble if there are a lot of them.

The second thing is the question about the 16MB limit per document. For clarity, an Atlas document is this

{
   field1: value1,
   field2: value2,
   field3: value3,
   ...
   fieldN: valueN
}

where value can be any of the BSON data types such as other documents, arrays, and arrays of documents.

In your structure, tasks is declared as var tasks = RealmSwift.List<Task>(), where Task is an embedded object. While conceptually embedded objects are objects, they are stored inside the parent document in Atlas, so they count toward that single document's limit: as the number of them grows, the size of the enclosing document grows. Keep in mind that 16MB of text is an ENORMOUS amount of text, so for small tasks like these it would take hundreds of thousands of tasks in one project to reach it.
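To put a rough number on that, here is a back-of-the-envelope sketch in plain Swift that estimates the BSON size of one embedded task using the standard BSON layout. The helper names, the sample values "Buy milk" and "Pending", and the six-digit array index are all assumptions for illustration; the _id, _partition, and name fields of the project itself are ignored.

```swift
// Rough BSON size of one embedded task inside the project's "tasks" array.
// BSON layout: element = 1 type byte + key as C string + value;
// string value = 4-byte length + UTF-8 bytes + 1 null terminator;
// document = 4-byte length + elements + 1 terminator byte.

func bsonStringElementSize(key: String, value: String) -> Int {
    let keyBytes = key.utf8.count + 1          // key plus null terminator
    let valueBytes = 4 + value.utf8.count + 1  // length prefix, bytes, null
    return 1 + keyBytes + valueBytes           // leading type byte
}

func embeddedTaskSize(name: String, status: String, arrayIndexDigits: Int) -> Int {
    let fields = bsonStringElementSize(key: "name", value: name)
               + bsonStringElementSize(key: "status", value: status)
    let document = 4 + fields + 1
    // Array entries are keyed by their index written as a string ("0", "1", ...).
    let arrayKey = arrayIndexDigits + 1
    return 1 + arrayKey + document
}

let limit = 16 * 1024 * 1024
let perTask = embeddedTaskSize(name: "Buy milk", status: "Pending", arrayIndexDigits: 6)
print("bytes per task:", perTask)
print("approx. tasks per project:", limit / perTask)
```

Under these assumptions a task costs about 50 bytes, which puts the ceiling at roughly 300,000 short tasks per project; longer names or extra fields lower it proportionally.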

The simple solution is to not embed them and have them stand on their own.

class Task: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = "" 
    @objc dynamic var name: String = ""
    @objc dynamic var status: String = "Pending"
    override static func primaryKey() -> String? {
        return "_id"
    }
}

Then each task is its own document with its own 16MB limit, and an 'unlimited number' can be associated with a single project. One advantage of embedded objects is a type of cascade delete: when the parent object is deleted, the child objects are deleted as well. But with a 1-many relationship from Project to Tasks, deleting a bunch of tasks belonging to a parent is easy.
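Here is one way that could look, as a sketch rather than the definitive model: Project keeps a List of standalone Task objects, and the manual 'cascade' is two delete calls in one write. The in-memory Realm and the sample names are assumptions so the snippet stands alone.

```swift
import RealmSwift

class Task: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = ""
    @objc dynamic var name: String = ""
    @objc dynamic var status: String = "Pending"
    override static func primaryKey() -> String? { "_id" }
}

class Project: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = ""
    @objc dynamic var name: String = ""
    // To-many relationship to standalone (non-embedded) Task objects.
    let tasks = List<Task>()
    override static func primaryKey() -> String? { "_id" }
}

// In-memory Realm for illustration only; a synced app would use its sync configuration.
let realm = try! Realm(configuration: Realm.Configuration(inMemoryIdentifier: "tasks-sketch"))

let project = Project()
project.name = "Demo project"
let task = Task()
task.name = "First task"
project.tasks.append(task)

try! realm.write {
    realm.add(project)   // also persists the tasks linked from the project
}

// Manual cascade delete: remove the project's tasks, then the project itself.
try! realm.write {
    realm.delete(project.tasks)
    realm.delete(project)
}
```

Deleting the tasks first matters here, because once the project is deleted its tasks list is no longer reachable.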

Oh - another case for not using embedded objects - especially for this use case - is that they cannot have indexed properties. Indexing can greatly speed up some queries.
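For example, a standalone Task could index its status property. This is a sketch that assumes tasks are frequently queried by status; the choice of property and the in-memory Realm are illustrative.

```swift
import RealmSwift

class Task: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = ""
    @objc dynamic var name: String = ""
    @objc dynamic var status: String = "Pending"

    override static func primaryKey() -> String? { "_id" }

    // Ask Realm to index status, assuming tasks are often looked up by it.
    override static func indexedProperties() -> [String] { ["status"] }
}

// In-memory Realm for illustration only.
let realm = try! Realm(configuration: Realm.Configuration(inMemoryIdentifier: "index-sketch"))

// An equality query on the indexed property; the result is a lazy Results<Task>.
let pendingTasks = realm.objects(Task.self).filter("status == %@", "Pending")
print(pendingTasks.count)
```

Indexes speed up equality lookups like this one at the cost of slightly slower writes and a larger file, so they are worth adding only to properties that are actually queried.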