I'm using sails 0.11.2. With the latest sails-mongo adapter. I have a very large database (gigabytes of data) of mainly timestamp and values. And i make queries on it using the blueprint api.
If I query using localhost:1337/datatable?limit=100000000000 the nodejs hangs on 0.12 with a lot of CPU usage, and crashes on v4. It crashes on the toJSON function.
I've finded out that i need to make multiple queries on my API. But I don't how to proceed to make it.
How can i make multiple queries that "don't explode" my server?
Update:
On newer version 0.12.3 with latest waterline and sails-mongo, the queries goes much smoother. The crashes on the cloud was that I didn't had enough RAM to handle sailsjs and mongodb on same T2.micro instance.
I've moved the mongodb server to a M3.Medium instance. And now the server don't crash anymore, but it freezes. I'm using skip limit and it works nicely for sails.js but for mongodb is a great waste of resources!
Mongodb make an internal query using limit = skip + limit. and then moves the cursor to the desired data and returns. When you are making a lot's in pagination you are using lots of internal queries. As the query size will increase.
As this article explains, the way to get around the waste of resources in MongoDB is to avoid using skip
and cleverly use _id
as part of your query.
I did not use sails mongo but I did implement the idea above by using mongo driver in nodejs:
/**
* Motivation:
* Wanted to put together some code that used:
* - BlueBird (promises)
* - MongoDB NodeJS Driver
* - and paging that did not rely on skip()
*
* References:
* Based on articles such as:
* https://scalegrid.io/blog/fast-paging-with-mongodb/
* and GitHub puclic code searches such as:
* https://github.com/search?utf8=%E2%9C%93&q=bluebird+MongoClient+_id+find+limit+gt+language%3Ajavascript+&type=Code&ref=searchresults
* which yielded smaple code hits such as:
* https://github.com/HabitRPG/habitrpg/blob/28f2e9c356d7053884107d90d04e28dde75fa81b/migrations/api_v3/coupons.js#L71
*/
var Promise = require('bluebird'); // jshint ignore:line
var _ = require('lodash');
var MongoClient = require('mongodb').MongoClient;
var dbHandleForShutDowns;
// option a: great for debugging
var logger = require('tracer').console();
// option b: general purpose use
//var logger = console;
//...
var getPage = function getPage(db, collectionName, query, projection, pageSize, processPage) {
//console.log('DEBUG', 'filter:', JSON.stringify(query,null,2));
projection = (projection) ? projection['_id']=true : {'_id':true};
return db
.collection(collectionName)
.find(query)
.project(projection)
.sort({'_id':1}).limit(pageSize)
.toArray() // cursor methods return promises: http://mongodb.github.io/node-mongodb-native/2.1/api/Cursor.html#toArray
.then(function processPagedResults(documents) {
if (!documents || documents.length < 1) {
// stop - no data left to traverse
return Promise.resolve();
}
else {
if (documents.length < pageSize) {
// stop - last page
return processPage(documents);
}
else {
return processPage(documents) // process the results of the current page
.then(function getNextPage(){ // then go get the next page
var last_id = documents[documents.length-1]['_id'];
query['_id'] = {'$gt' : last_id};
return getPage(db, collectionName, query, projection, pageSize, processPage);
});
}
}
});
};
//...
return MongoClient
.connect(params.dbUrl, {
promiseLibrary: Promise
})
.then(function(db) {
dbHandleForShutDowns = db;
return getPage(db, collectionName, {}, {}, 5, function processPage(pagedDocs){console.log('do something with', pagedDocs);})
.finally(db.close.bind(db));
})
.catch(function(err) {
console.error("ERROR", err);
dbHandleForShutDowns.close();
});
The following two sections show how the code manipulates _id
and makes it part of the query:
.sort({'_id':1}).limit(pageSize)
// [...]
var last_id = documents[documents.length-1]['_id'];
query['_id'] = {'$gt' : last_id};
Overall code flow:
Let getPage()
handle the work, you can set the pageSize
and query
to your liking:
return getPage(db, collectionName, {}, {}, 5, function processPage(pagedDocs){console.log('do something with', pagedDocs);})
Method signature:
var getPage = function getPage(db, collectionName, query, projection, pageSize, processPage) {
Process pagedResults
as soon as they become available:
return processPage(documents) // process the results of the current page
Move on to the next page:
return getPage(db, collectionName, query, projection, pageSize, processPage);
The code will stop when there is no more data left:
// stop - no data left to traverse
return Promise.resolve();
Or it will stop when working on the last page of data:
// stop - last page
return processPage(documents);
I hope this offers some inspiration, even if its not an exact solution for your needs.