jsonnode.jssails.jssails-mongo

How to do a very large query on sails-mongo?


I'm using sails 0.11.2. With the latest sails-mongo adapter. I have a very large database (gigabytes of data) of mainly timestamp and values. And i make queries on it using the blueprint api.

If I query using localhost:1337/datatable?limit=100000000000 the nodejs hangs on 0.12 with a lot of CPU usage, and crashes on v4. It crashes on the toJSON function.

I've finded out that i need to make multiple queries on my API. But I don't how to proceed to make it.

How can i make multiple queries that "don't explode" my server?


Update:

On newer version 0.12.3 with latest waterline and sails-mongo, the queries goes much smoother. The crashes on the cloud was that I didn't had enough RAM to handle sailsjs and mongodb on same T2.micro instance.

I've moved the mongodb server to a M3.Medium instance. And now the server don't crash anymore, but it freezes. I'm using skip limit and it works nicely for sails.js but for mongodb is a great waste of resources!

Mongodb make an internal query using limit = skip + limit. and then moves the cursor to the desired data and returns. When you are making a lot's in pagination you are using lots of internal queries. As the query size will increase.


Solution

  • As this article explains, the way to get around the waste of resources in MongoDB is to avoid using skip and cleverly use _id as part of your query.

    I did not use sails mongo but I did implement the idea above by using mongo driver in nodejs:

    /**
     * Motivation:
     * Wanted to put together some code that used:
     *  - BlueBird (promises)
     *  - MongoDB NodeJS Driver
     *  - and paging that did not rely on skip()
     *
     * References:
     * Based on articles such as:
     * https://scalegrid.io/blog/fast-paging-with-mongodb/
     * and GitHub puclic code searches such as:
     * https://github.com/search?utf8=%E2%9C%93&q=bluebird+MongoClient+_id+find+limit+gt+language%3Ajavascript+&type=Code&ref=searchresults
     * which yielded smaple code hits such as:
     * https://github.com/HabitRPG/habitrpg/blob/28f2e9c356d7053884107d90d04e28dde75fa81b/migrations/api_v3/coupons.js#L71
     */
    
      var Promise = require('bluebird'); // jshint ignore:line
      var _ = require('lodash');
      var MongoClient = require('mongodb').MongoClient;
      var dbHandleForShutDowns;
    
      // option a: great for debugging
      var logger = require('tracer').console();
      // option b: general purpose use
      //var logger = console;
    
      //...
    
        var getPage = function getPage(db, collectionName, query, projection, pageSize, processPage) {
          //console.log('DEBUG', 'filter:', JSON.stringify(query,null,2));
          projection = (projection) ? projection['_id']=true : {'_id':true};
          return db
            .collection(collectionName)
            .find(query)
            .project(projection)
            .sort({'_id':1}).limit(pageSize)
            .toArray() // cursor methods return promises: http://mongodb.github.io/node-mongodb-native/2.1/api/Cursor.html#toArray
            .then(function processPagedResults(documents) {
              if (!documents || documents.length < 1) {
                // stop - no data left to traverse
                return Promise.resolve();
              }
              else {
                if (documents.length < pageSize) {
                  // stop - last page
                  return processPage(documents);
                }
                else {
                  return processPage(documents) // process the results of the current page
                    .then(function getNextPage(){ // then go get the next page
                      var last_id = documents[documents.length-1]['_id'];
                      query['_id'] = {'$gt' : last_id};
                      return getPage(db, collectionName, query, projection, pageSize, processPage);
                    });
                }
              }
            });
        };
    
        //...
    
        return MongoClient
          .connect(params.dbUrl, {
            promiseLibrary: Promise
          })
          .then(function(db) {
            dbHandleForShutDowns = db;
            return getPage(db, collectionName, {}, {}, 5, function processPage(pagedDocs){console.log('do something with', pagedDocs);})
              .finally(db.close.bind(db));
          })
          .catch(function(err) {
            console.error("ERROR", err);
            dbHandleForShutDowns.close();
          });
    

    The following two sections show how the code manipulates _id and makes it part of the query:

     .sort({'_id':1}).limit(pageSize)
     // [...]
    var last_id = documents[documents.length-1]['_id'];
    query['_id'] = {'$gt' : last_id};
    

    Overall code flow:

    1. Let getPage() handle the work, you can set the pageSize and query to your liking:

      return getPage(db, collectionName, {}, {}, 5, function processPage(pagedDocs){console.log('do something with', pagedDocs);})
      
    2. Method signature:

      var getPage = function getPage(db, collectionName, query, projection, pageSize, processPage) {
      
    3. Process pagedResults as soon as they become available:

      return processPage(documents) // process the results of the current page
      
    4. Move on to the next page:

      return getPage(db, collectionName, query, projection, pageSize, processPage);
      
    5. The code will stop when there is no more data left:

      // stop - no data left to traverse
      return Promise.resolve();
      
    6. Or it will stop when working on the last page of data:

      // stop - last page
      return processPage(documents);
      

    I hope this offers some inspiration, even if its not an exact solution for your needs.