Is there any way to limit the number of Q promises executing concurrently in Node.js?
I am building a web scraper that must request and parse 3000+ pages. Without throttling, some of my requests aren't answered in time, so the connection resets and the response I need (the HTML) becomes unavailable.
To counteract this, I found that limiting the number of concurrent requests makes the problem go away.
I have tried the following methods, but to no avail.
I need to request an array of urls, making only 1 request at a time, and when all urls in the array have completed, return the results in an array.
function processWebsite() {
    // computed by this stage
    var urls = [u1, u2, u3, u4, u5, u6, u7, u8, u9];

    var promises = throttle(urls, 1, myfunction);
    // myfunction returns a Q promise and takes a considerable
    // amount of time to resolve (approximately 2-5 minutes)

    Q.all(promises).then(function(results) {
        // work with the results of the promises array
    });
}
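To make the intent concrete, here is roughly the behaviour I'm after for the limit = 1 case, hand-rolled with plain Q chaining (throttleOne is just a name for this sketch; it assumes fn(url) returns a Q promise):

var Q = require('q');

// Sketch of the limit = 1 semantics I want from throttle():
// each request starts only after the previous one has resolved,
// and the final promise resolves with the results in url order.
function throttleOne(urls, fn) {
    var results = [];
    return urls.reduce(function(chain, url) {
        return chain.then(function() {
            return fn(url).then(function(result) {
                results.push(result);
            });
        });
    }, Q()).then(function() {
        return results;
    });
}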
You can request a new url in a then() block:
myFunction(urls[0]).then(function(result) {
    myFunction(urls[1]).then(function(result) {
        myFunction(urls[2]).then(function(result) {
            // ...
        });
    });
});
Of course, this would be its dynamic behaviour: I'd maintain a queue and dequeue a single url once a promise resolves, then make another request, and perhaps keep a hash object relating urls to results.
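A minimal sketch of that queue idea (the helper names are mine; it assumes myFunction(url) returns a Q promise):

var Q = require('q');

// Dequeue one url at a time; request the next only after the
// current promise resolves. Results are collected in a hash
// keyed by url, and the returned promise resolves with that hash.
function processQueue(urls, myFunction) {
    var queue = urls.slice(); // copy so the caller's array is untouched
    var results = {};

    function next() {
        if (queue.length === 0) {
            return Q(results); // queue drained
        }
        var url = queue.shift();
        return myFunction(url).then(function(result) {
            results[url] = result;
            return next();
        });
    }

    return next();
}

Then processQueue(urls, myFunction).then(function(results) { ... }) hands you every result once the queue is empty.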
A second take:
var urls = [ /* ... */ ];
var limit = 5; // how many urls to request per batch

var dequeue = function() {
    // remove and return up to `limit` urls from the front of the queue
    return urls.splice(0, limit);
};

var processBatch = function() {
    return Q.all(dequeue().map(myFunction));
};

processBatch().then(function(results) {
    processBatch().then(function(results) {
        processBatch().then(function(results) {
            // ...
        });
    });
});
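If you don't want to nest those calls by hand, the same idea can be written recursively so the chain keeps draining the queue until it's empty (a sketch building on the names above):

// Keep processing batches of `limit` urls until the queue is empty,
// accumulating the results of every batch along the way.
function drainQueue() {
    var batch = dequeue();
    if (batch.length === 0) {
        return Q([]); // nothing left to request
    }
    return Q.all(batch.map(myFunction)).then(function(results) {
        return drainQueue().then(function(rest) {
            return results.concat(rest);
        });
    });
}

drainQueue().then(function(allResults) {
    // every url has been processed; allResults is in queue order
});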