I'm running a keys_only query, which fetches 20 results.
result_keys, cursor, more = ActivityIndex.query(cls.followers == key)\
.order(-cls.date_created)\
.fetch_page(num_results,
start_cursor = cursor,
keys_only=True)
I then get the parents of the activityIndex objects:
keys = []
for k in result_keys:
for pair in k.parent().pairs():
keys.append(ndb.Key(pairs=[pair]))
activities_related = ndb.get_multi(keys)
I thought this would be quick, because I was getting a batch of objects by key. However, the query seems call datastore_v3.Next
, which according to the appstats docs is "bad", and takes up a significant portion of the execution time.
Avoiding unnecessary Next calls may speed up your app!
Appstats for the above query (with get_multi call)
Appstats for the above query but without call get_multi (short time for next() to respond).
Why does the datastore_v3.next()
call take so long to execute when calling get_multi()
? Does it depend on the number of results get_multi will return? Some of the objects returned in get_multi have list properties (with max 10 items in a list), would this play a role in the performance?
To avoid this issue, would it be better to change the design and fetch the required entities in tasklets? Any other suggestions?
Edit:
A bit more information about what I am trying to do:
I have an activity stream in my application, which displays all user activity e.g. Rob commented on Picture etc.
To display this information, I thought I'd need the User object and the Picture Object to build up the activity description and information to display. I set all these objects' keys as parents of ActivityIndex
. So from the above query, ndb.get_mult(keys)
would be fetching the Activity, User and Picture objects. The keys list could contain 50 keys or more, so would this possibly be the cause of the long datastore_v3.Next
call.
Are you doing this in the dev_appserver? (I hope not, because its performance is completely unrelated to that of the production environment, and we'd all be wasting our time discussing this.)
Looks like there's something fishy going on. ISTR that the Next call is scheduled asynchronously, and in some cases is not actually needed. That would explain that even though you are not overlapping the calls, the first chart shows Next overlapping the memcache get call.
But still, there shouldn't be that much of a delay. Are you doing extra stuff you're not showing before calling get_multi()?
BTW what you are doing with the pairs looks over-complicated. If there are any parents that have more than one pair, this will produce invalid keys. I think there is only one level of parents and you just want to get those -- if that's so, this should be sufficient:
keys = [k.parent() for k in result_keys]
Finally, see issue 118: http://code.google.com/p/appengine-ndb-experiment/issues/detail?id=118 -- we measured the trick you seem to be attempting and it is only worth it if your hit rate is really high.