Tags: javascript, node.js, concurrency, q

Limit Q promise concurrency in Node.js


Is there any way to limit the number of Q promises executed concurrently in Node.js?

I am building a web scraper, which must request and parse 3000+ pages. Without throttling, some of the requests I make aren't responded to in time, so the connection resets and the needed response (the HTML) becomes unavailable.

To counteract this, I found that limiting the number of concurrent requests makes the problem go away.


I have tried the following methods but to no avail:

I need to request an array of URLs, doing only one request at a time, and when all URLs in the array have completed, return the results in an array.

var Q = require('q');

function processWebsite() {
  // computed by this stage
  var urls = [u1, u2, u3, u4, u5, u6, u7, u8, u9];

  // throttle() is the helper I am looking for: run myfunction over
  // the urls with a concurrency of 1 and give back the promises
  var promises = throttle(urls, 1, myfunction);

  // myfunction returns a Q promise and takes a considerable
  // amount of time to resolve (approximately 2-5 minutes)

  Q.all(promises).then(function (results) {
    // work with the results of the promises array
  });
}
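
In other words, at a concurrency of 1 I'd expect throttle(urls, 1, myfunction) to behave like a plain sequential chain: each request starts only after the previous one resolves, and the results come back in the original order. A rough sketch of that desired behaviour (illustration only; it assumes myfunction(url) returns a Q promise, and the name sequential is just a placeholder):

var Q = require('q');

// Illustration of the behaviour I'm after at concurrency 1:
// chain the requests one after another and collect the results in order.
function sequential(urls, myfunction) {
  var results = [];
  return urls.reduce(function (chain, url) {
    return chain.then(function () {
      return myfunction(url);
    }).then(function (result) {
      results.push(result);
    });
  }, Q()).then(function () {
    return results; // resolves once every url has completed
  });
}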

Solution

  • You can request a new URL in a then() block

    myFunction(urls[0]).then(function(result) {
      myFunction(urls[1]).then(function(result) {
        myFunction(urls[2]).then(function(result) {
          ...
        });
      });
    });
    

    Of course, you wouldn't hard-code this; it would be done dynamically. I'd maintain a queue and dequeue a single URL once a promise resolves, then make another request, and perhaps keep a hash object relating URLs to results (see the sketch at the end of this answer).

    A second take:

    var urls = ...;
    var limit = ...;

    // remove and return up to limit urls from the front of the queue
    var dequeue = function() {
      return urls.splice(0, limit);
    };

    var myFunction = function(dequeue) {
      var batch = dequeue();

      // processUrl stands in for whatever returns a Q promise for one url
      return Q.all(batch.map(processUrl));
    };

    myFunction(dequeue).then(function(result) {
      myFunction(dequeue).then(function(result) {
        myFunction(dequeue).then(function(result) {
          ...
        });
      });
    });
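
    To tie this together, here is a minimal sketch of the dynamic version described above: a shared queue from which limit chains each dequeue one URL at a time, recording results in a hash keyed by URL. It assumes, as in the question, that myfunction(url) returns a Q promise. The throttle name matches the helper the question asks for, but unlike there it returns a single promise for the results hash rather than an array of promises; treat it as a sketch, not the one true implementation.

    var Q = require('q');

    // Keep `limit` requests in flight: dequeue the next url as soon as
    // one resolves, and relate urls to results in a hash object.
    function throttle(urls, limit, myfunction) {
      var queue = urls.slice();   // work on a copy of the url array
      var results = {};           // url -> result

      function next() {
        if (queue.length === 0) {
          return Q();             // queue drained; this chain is done
        }
        var url = queue.shift();  // dequeue a single url
        return myfunction(url).then(function (result) {
          results[url] = result;  // remember the result for this url
          return next();          // then make another request
        });
      }

      // start `limit` independent chains that share the same queue
      var workers = [];
      for (var i = 0; i < limit; i++) {
        workers.push(next());
      }
      return Q.all(workers).then(function () {
        return results;           // resolves once every url has been processed
      });
    }

    With throttle(urls, 1, myfunction) this reproduces the strictly sequential case from the question, and raising limit allows a controlled amount of parallelism without hand-writing the nested then() calls.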