python-3.x mongodb pymongo odm umongo

umongo, pymongo, Python 3: how do I load data from reference field(s)?


I'm trying to understand how to load my referenced data in umongo/pymongo, and why it's so hard.

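from datetime import datetime

from umongo import Document, fields

# `instance` (the umongo instance) and `db` (the pymongo database) are created elsewhere.
@instance.register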
class MyEntity(Document):
    account = fields.ReferenceField('Account', required=True)
    date = fields.DateTimeField(
        default=lambda: datetime.utcnow(),
        allow_none=False
    )
    positions = fields.ListField(fields.ReferenceField('Position'))
    targets = fields.ListField(fields.ReferenceField('Target'))

    class Meta:
        collection = db.myentity

When I retrieve this with:

    def find_all(self):
        items = self._repo.find_all(
            {
                'user_id': self._user_id
            }
        )
        return items

and then dump it like so:

    from bson.json_util import dumps

    all_items = []
    # `items` is the result of find_all() above; iterate it, not the empty output list
    for item in items:
        all_items.append(item.dump())


    return dumps(all_items)

I get the following JSON object:

[
  {
    "account": "5e990db75f22b6b45d3ce814",
    "positions": [
      "5e9a594373e07613b358bdbb",
      "5e9a594373e07613b358bdbe",
      "5e9a594373e07613b358bdc1"
    ],
    "date": "2020-04-18T01:34:59.919000+00:00",
    "id": "5e9a594373e07613b358bdcb",
    "targets": [
      "5e9a594373e07613b358bdc4",
      "5e9a594373e07613b358bdc7",
      "5e9a594373e07613b358bdca"
    ]
  }
]

And without dump, the document looks like this:

<object Document models.myentity.schema.MyEntity({
'targets':
    <object umongo.data_objects.List([
        <object umongo.frameworks.pymongo.PyMongoReference(
            document=Target,
            pk=ObjectId('5e9a594373e07613b358bdc4')
            )>,
        <object umongo.frameworks.pymongo.PyMongoReference(
            document=Target,
            pk=ObjectId('5e9a594373e07613b358bdc7')
            )>,
        <object umongo.frameworks.pymongo.PyMongoReference(
            document=Target,
            pk=ObjectId('5e9a594373e07613b358bdca'))>]
            )>,
            'id': ObjectId('5e9a594373e07613b358bdcb'),
'positions':
    <object umongo.data_objects.List([
        <object umongo.frameworks.pymongo.PyMongoReference(
            document=Position,
            pk=ObjectId('5e9a594373e07613b358bdbb')
        )>,
        <object umongo.frameworks.pymongo.PyMongoReference(
            document=Position,
            pk=ObjectId('5e9a594373e07613b358bdbe'))>,
        <object umongo.frameworks.pymongo.PyMongoReference(
            document=Position,
            pk=ObjectId('5e9a594373e07613b358bdc1'))>])>,
'date': datetime.datetime(2020, 4, 18, 1, 34, 59, 919000),
'account': <object umongo.frameworks.pymongo.PyMongoReference(document=Account, pk=ObjectId('5e990db75f22b6b45d3ce814'))>
})>
  1. I'm really struggling with how to dereference this. I'd like every reference field I declare in the umongo schema to be dereferenced, recursively, when the document is loaded. Is this not in the umongo API?

     For example, what if there's a reference field in 'target' as well? I understand this can be expensive on the DB, but is there some way to specify this on the schema definition itself, e.g. in the Meta class, so that I always get the full, dereferenced object for a particular field?

  2. I'm finding very little documentation or commentary on this. It's not even mentioned in the umongo docs, and the solutions I've found for other ODMs (like mongoengine) involve painfully writing out recursive, manual functions per field / per query. This suggests to me there's a reason this is not a popular question. Might it be an anti-pattern? If so, why?

I'm not that new to mongo, but new to python / mongo. I feel like I'm missing something fundamental here.


EDIT: Right after posting, I did find this issue:

https://github.com/Scille/umongo/issues/42

which provides a way forward.

Is this still the best approach? I'm still trying to understand why this is treated like an edge case.


EDIT 2: progress

class MyEntity(Document):
    account = fields.ReferenceField('Account', required=True, dump=lambda: 'fetch_account')
    date = fields.DateTimeField(
        default=lambda: datetime.utcnow(),
        allow_none=False
    )
    #trade = fields.DictField()
    positions = fields.ListField(fields.ReferenceField('Position'))
    targets = fields.ListField(fields.ReferenceField('Target'))

    class Meta:
        collection = db.trade

    @property
    def fetch_account(self):
        return self.account.fetch()

So with the newly defined property, I can do:

    items = MyEntityService().find_all()
    allItems = []
    # iterate the query results (`items`), not the empty output list
    for item in items:
        account = item.fetch_account
        log(account.dump())
        allItems.append(item.dump())

When I dump account, all is good. But I don't want to explicitly/manually have to do this. It still means I have to recursively unpack and then repack each referenced doc, and any child references, each time I make a query. It also means the schema SOT is no longer contained just in the umongo class, i.e. if a field changes, I'll have to refactor every query that uses that field.

I'm still looking for a way to decorate/flag this on the schema itself. e.g.

    account = fields.ReferenceField('Account', required=True, dump=lambda: 'fetch_account')

dump=lambda: 'fetch_account' is something I just made up; it doesn't do anything, but that's more or less the pattern I'm going for. I'm not sure whether this is possible (or even smart; pointers in the other direction, to why I'm totally wrong in my approach, are welcome).
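
One way to keep the list of dereferenced fields next to the schema, sketched under assumptions: a mixin with a class attribute naming the reference fields to expand, plus a depth-limited recursive dump. `DerefDumpMixin`, `deref_fields` and `dump_deep` are hypothetical names, not part of umongo; the sketch relies only on the `fetch()` and `dump()` calls shown above. The stub classes exist so the example runs without umongo or a database, and whether the mixin can be combined with a real umongo Document is untested here.

```python
class DerefDumpMixin:
    """Hypothetical mixin: dump() plus expansion of the listed reference fields."""

    deref_fields = ()  # names of reference fields to expand; set per class

    def dump_deep(self, depth=1):
        data = self.dump()  # plain dump: references come out as ids
        if depth <= 0:
            return data
        for name in self.deref_fields:
            value = getattr(self, name, None)
            if value is None:
                continue
            if hasattr(value, "fetch"):      # a single reference
                data[name] = _expand(value, depth)
            else:                            # assumed: an iterable of references
                data[name] = [_expand(ref, depth) for ref in value]
        return data


def _expand(ref, depth):
    """Fetch one reference and dump it, recursing while depth allows."""
    doc = ref.fetch()
    if hasattr(doc, "dump_deep"):
        return doc.dump_deep(depth - 1)
    return doc.dump()


# --- stand-ins so the sketch runs without umongo or a database ---
class StubRef:
    def __init__(self, doc):
        self.doc = doc

    def fetch(self):
        return self.doc


class Account(DerefDumpMixin):
    def __init__(self, id, user_name):
        self.id, self.user_name = id, user_name

    def dump(self):
        return {"id": self.id, "user_name": self.user_name}


class Target(DerefDumpMixin):
    deref_fields = ("account",)

    def __init__(self, id, account):
        self.id, self.account = id, account

    def dump(self):
        return {"id": self.id, "account": self.account.doc.id}


class Entity(DerefDumpMixin):
    deref_fields = ("account", "targets")

    def __init__(self, id, account, targets):
        self.id, self.account, self.targets = id, account, targets

    def dump(self):
        return {
            "id": self.id,
            "account": self.account.doc.id,
            "targets": [t.doc.id for t in self.targets],
        }


acct = StubRef(Account("a1", "hello"))
entity = Entity("e1", acct, [StubRef(Target("t1", acct))])

shallow = entity.dump_deep(depth=1)  # targets expanded, their account left as an id
deep = entity.dump_deep(depth=2)     # the targets' account is expanded too
```

The depth limit is the design choice that matters: it bounds the number of queries and prevents infinite recursion if two documents reference each other.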


EDIT 3: so this is where i've landed:

    @property
    def fetch_account(self):
        return self.account.fetch().dump()

    @property
    def fetch_targets(self):
        targets_list = []
        for target in self.targets:
            doc = target.fetch().dump()
            targets_list.append(doc)
        return targets_list

    @property
    def fetch_positions(self):
        positions_list = []
        for position in self.positions:
            doc = position.fetch().dump()
            positions_list.append(doc)
        return positions_list

and then to access:

    allItems = []
    for item in items:
        account = item.fetch_account
        positions = item.fetch_positions
        targets = item.fetch_targets

        item = item.dump()
        item['account'] = account
        item['positions'] = positions
        item['targets'] = targets
        # del item['targets']
        allItems.append(item)

I could clean it up or abstract it some, but I don't see how I could really reduce the general verbosity at this point. It does seem to give me the result I'm looking for, though:

[
  {
    "date": "2020-04-18T01:34:59.919000+00:00",
    "targets": [
      {
        "con_id": 331641614,
        "value": 106,
        "date": "2020-04-18T01:34:59.834000+00:00",
        "account": "5e990db75f22b6b45d3ce814",
        "id": "5e9a594373e07613b358bdc4"
      },
      {
        "con_id": 303019419,
        "value": 0,
        "date": "2020-04-18T01:34:59.867000+00:00",
        "account": "5e990db75f22b6b45d3ce814",
        "id": "5e9a594373e07613b358bdc7"
      },
      {
        "con_id": 15547841,
        "value": 9,
        "date": "2020-04-18T01:34:59.912000+00:00",
        "account": "5e990db75f22b6b45d3ce814",
        "id": "5e9a594373e07613b358bdca"
      }
    ],
    "account": {
      "user_name": "hello",
      "account_type": "LIVE",
      "id": "5e990db75f22b6b45d3ce814",
      "user_id": "U3621607"
    },
    "positions": [
      {
        "con_id": 331641614,
        "value": 104,
        "date": "2020-04-18T01:34:59.728000+00:00",
        "account": "5e990db75f22b6b45d3ce814",
        "id": "5e9a594373e07613b358bdbb"
      },
      {
        "con_id": 303019419,
        "value": 0,
        "date": "2020-04-18T01:34:59.764000+00:00",
        "account": "5e990db75f22b6b45d3ce814",
        "id": "5e9a594373e07613b358bdbe"
      },
      {
        "con_id": 15547841,
        "value": 8,
        "date": "2020-04-18T01:34:59.797000+00:00",
        "account": "5e990db75f22b6b45d3ce814",
        "id": "5e9a594373e07613b358bdc1"
      }
    ],
    "id": "5e9a594373e07613b358bdcb"
  }
]
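
The loops in EDIT 3 call fetch() once per reference, which is where the "expensive on the DB" concern comes from: a list of N references costs N lookups. A minimal sketch of batching them into one lookup, assuming each reference exposes the `pk` attribute visible in the repr above and each loaded document exposes `pk` and `dump()`. `dump_refs_batched` and `find_by_ids` are hypothetical names; with pymongo, `find_by_ids` would presumably wrap a single `$in` query on `_id`, but here it is a stub over an in-memory dict so the example runs on its own.

```python
def dump_refs_batched(refs, find_by_ids):
    """Dump the documents behind `refs` using one lookup instead of one per ref.

    `find_by_ids` is a hypothetical callable: it takes a list of ids and
    returns the matching documents (objects with `.pk` and `.dump()`).
    """
    ids = [ref.pk for ref in refs]
    by_id = {doc.pk: doc for doc in find_by_ids(ids)}
    # Preserve the original order; skip ids that no longer resolve.
    return [by_id[i].dump() for i in ids if i in by_id]


# --- stand-ins so the sketch runs without umongo or a database ---
class StubDoc:
    def __init__(self, pk, value):
        self.pk, self.value = pk, value

    def dump(self):
        return {"id": self.pk, "value": self.value}


class StubRef:
    def __init__(self, pk):
        self.pk = pk


store = {"p1": StubDoc("p1", 104), "p2": StubDoc("p2", 0)}
calls = []  # records each lookup so the single round trip is visible


def find_by_ids(ids):
    calls.append(list(ids))
    return [store[i] for i in ids if i in store]


positions = dump_refs_batched(
    [StubRef("p2"), StubRef("p1"), StubRef("gone")], find_by_ids
)
```

Here `positions` holds the two resolvable documents in reference order, and `calls` contains a single entry regardless of how many references were passed in.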

Solution

  • It seems like this is a design choice in umongo.

    In Mongoid for example (the Ruby ODM for MongoDB), when an object is referenced it is fetched from the database automatically through associations as needed.

    As an aside, in an ODM the features "define a field structure" and "seamlessly access data through application objects" are quite separate. For example, my experience with Hibernate in Java suggests it is similar to what you are discovering with umongo: once the data is loaded, it provides a way of accessing the data using an application-defined field structure with types etc., but it doesn't really help with loading the data transparently from the application domain.
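
To illustrate what "fetched automatically as needed" can look like in Python, here is a minimal sketch of a descriptor that resolves a reference on first attribute access and caches the result. `LazyRef` is a hypothetical name and not something umongo provides; the sketch assumes only that the reference object has a `fetch()` method and that the owning object allows ordinary attribute storage, which is true of the plain stub class below and has not been checked against umongo Documents.

```python
class LazyRef:
    """Hypothetical descriptor: call `<field>.fetch()` on first access, then cache."""

    def __init__(self, field_name):
        self.field_name = field_name
        self.cache_name = "_lazy_" + field_name

    def __get__(self, obj, objtype=None):
        if obj is None:          # accessed on the class, not an instance
            return self
        cache = obj.__dict__     # assumes a regular instance __dict__
        if self.cache_name not in cache:
            cache[self.cache_name] = getattr(obj, self.field_name).fetch()
        return cache[self.cache_name]


# --- stand-ins so the sketch runs without umongo or a database ---
class StubRef:
    def __init__(self, doc):
        self.doc, self.fetch_count = doc, 0

    def fetch(self):
        self.fetch_count += 1    # count lookups to show the caching
        return self.doc


class Entity:
    account_doc = LazyRef("account")  # resolved view of the `account` reference

    def __init__(self, account):
        self.account = account


ref = StubRef({"user_name": "hello"})
entity = Entity(ref)
first = entity.account_doc    # triggers the fetch
second = entity.account_doc   # served from the cache
```

Nothing is fetched until `account_doc` is read, and the second read reuses the cached document, so `ref.fetch_count` stays at 1.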