pythongoogle-app-enginegoogle-cloud-datastoregqlgquery

Minimize subqueries with IN queries on AppEngine (python)


Is there any clever way to avoid making a costly query with an IN clause in cases like the following one?

I'm using Google App Engine to build a Facebook application and at some point I (obviously) need to query the datastore to get all the entities that belong to any of the facebook friends of the given user.

Suppose I have a couple of entities modeled as such:

class Thing(db.Model):
    owner = db.ReferenceProperty(reference_class=User, required=True)
    owner_id = db.StringProperty(required=True)
    ...

and

class User(db.Model):
    id = db.StringProperty(required=True)
    ...

At some point I query Facebook to get the list of friends of a given user and I need to perform the following query

# get all Thing instances that belong to friends
query = Thing.all()
query.filter('owner_id IN', friend_ids)

If I did that, AppEngine would perform a subquery for each id in friend_ids, probably exceeding the maximum number of subqueries any query can spawn (30).

Is there any better way to do this (i.e. minimizing the number of queries)? I understand that there are no relations and joins using the datastore but, in particular, I would consider adding new fields to the User or Thing class if it helps in making things easier.


Solution

  • I don't think there's an elegant solution, but you could try this:

    On the User model, use Facebook ID as the key name, and store each user's list of things in a ListProperty.

    class Thing(db.Model):
      ...
    
    class User(db.Model):
      things = db.ListProperty(db.Key)
      ...
    

    Entity creation would go like this:

    user = User.get_or_insert(my_facebook_id)
    
    thing = Thing()
    thing.put()
    
    user.things.append(thing.key())
    user.put()
    

    Retrieval takes 2 queries:

    friends = User.get_by_key_name(friend_ids)
    thing_keys = []
    
    for friend in friends:
      thing_keys.extend(friend.things)
    
    things = db.get(thing_keys)