[SOLVED] Aggregate multiple fields in one rethinkdb query

Aggregate multiple fields in one rethinkdb query

How to reduce / aggregate multiple fields in a table? This does not seem efficient:

r.object(
  'favorite_count', 
  r.db('twitterdb').table('tweets').map(tweet => tweet('favorite_count')).avg().round(),
  'retweet_count',
   r.db('twitterdb').table('tweets').map(tweet => tweet('retweet_count')).avg().round()
  )

(Expected) Result:

{

    "favorite_count": 17 ,
    "retweet_count": 156

}

Solution

I'm not sure if it's possible to make RethinkDB work like you want in one go using its built-ins, but you can implement the avg function yourself really easy:

r.db('twitterdb')
  .table('tweets')
  .fold(
    {_: 0, favorite_count: 0, retweet_count: 0},
    (a, tweet) => ({
      _: a('_').add(1),
      favorite_count: a('favorite_count').add(tweet('favorite_count')),
      retweet_count: a('retweet_count').add(tweet('retweet_count'))
    })
  )
  .do(a => ({
    favorite_count: r.branch(a('_').gt(0), a('favorite_count').div(a('_')).round(), null),
    retweet_count: r.branch(a('_').gt(0), a('retweet_count').div(a('_')).round(), null)
  }))

I have quickly-tested the above over a small set of data, and enabling the query profiling showed at least /2 shard accesses and less time to execute. However I'm not sure about the overall profiler output and I don't think I can interpret its details (I believe that native avg is more optimized, but it looks cheaper that accesing the data at least in two rounds). Additionally, this custom avg function implementation is more 0-elements friendly not throwing an error.