javascript node.js web-scraping node-request

Call a Request function from outside the request


I'm trying to make a web scraper (for educational purposes), and I've gotten really far, but this little issue is bugging me.

I made a request callback function, and I'm trying to get lines 75-78 of my script (the final for loop below) to work. However, for that to work, PDF_LIST and PDF_LINKS need to be initialized to the right values.

I've already tried making them global variables and so on, but for some reason that doesn't work. So my question is: how do I make a callback function that will call that for loop (75-78) and successfully initialize PDF_LIST and PDF_LINKS to the correct values?

(Don't worry, I'm using this on educational content, with the prof's permission.) First time posting here!

// PDF_LINKS has the pdf links of the pages
PDF_LINKS = [];
// PDF_LIST has the names of the pdf links
PDF_LIST = [];

function fillPDF(callback) {
    request(url, function(err, res, body) {
        $ = cheerio.load(body);
        links = $('a'); //jquery get all hyperlinks
        $(links).each(function(i, link) {

            var value = $(link).attr('href');
            // creates objects to hold the file 

            if (value.substring(value.length - 3, value.length) == "pdf") {
                PDF_LINKS[i] = $(link).attr('href');
                PDF_LIST[i] = $(link).text();
            }
        })
    });
}
// must declare the fillPDF function or else the variables won't be initialized

fillPDF() {
    //HERE I WANT PDF_LINKS and PDF_LIST to be initialized to 33.....
}
for (j = 0; j < PDF_LIST.length; j++) {
    request(PDF_LINKS[j]).pipe(fs.createWriteStream(PDF_LIST[j]));
}


Solution

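  • Push your values onto the arrays with the array's push method. Assigning by index (PDF_LINKS[i] = ...) leaves an undefined element at every index whose link is not a PDF.

    Put your final for loop into a function, and pass that function to fillPDF as its callback.

    fillPDF also needs to call that callback once the request is over, because only at that point have the arrays been filled.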

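    var request = require('request');
    var cheerio = require('cheerio');
    var fs = require('fs');
    
    // `url` is the page to scrape, defined elsewhere as in the question.
    var PDF_LINKS = []; // hrefs of the PDF links
    var PDF_LIST = [];  // link texts, used as file names
    
    function fillPDF(callback) {
        request(url, function(err, res, body) {
            if (err) {
                return callback(err);
            }
            var $ = cheerio.load(body);
            $('a').each(function(i, link) {
                var value = $(link).attr('href');
                // Skip anchors without an href, then keep only links ending in "pdf".
                if (value && value.slice(-3) == "pdf") {
                    PDF_LINKS.push(value);
                    PDF_LIST.push($(link).text());
                }
            });
            // The arrays are complete only here, inside the request callback.
            callback();
        });
    }
    
    function writePDF(err) {
        if (err) {
            throw err;
        }
        for (var j = 0; j < PDF_LIST.length; j++) {
            request(PDF_LINKS[j]).pipe(fs.createWriteStream(PDF_LIST[j]));
        }
    }
    
    fillPDF(writePDF);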
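
To see why the loop has to run from the callback, and why push behaves better than index assignment, here is a minimal sketch that runs without any network access. It is not the real request API: fakeRequest is a hypothetical stand-in that calls back after a short timer, and the hrefs, byIndex and byPush are made-up names and data for illustration only.

```javascript
// Hypothetical stand-in for request(url, cb): it calls back later, like real network I/O.
function fakeRequest(url, callback) {
    setTimeout(function () {
        // Made-up hrefs; the undefined entry plays the role of an <a> without an href.
        callback(null, ['index.html', 'notes1.pdf', undefined, 'notes2.pdf']);
    }, 10);
}

var byIndex = []; // filled as in the question: byIndex[i] = href
var byPush = [];  // filled as in the answer: byPush.push(href)

function fillPDF(callback) {
    fakeRequest('http://example.com/', function (err, hrefs) {
        if (err) {
            return callback(err);
        }
        hrefs.forEach(function (href, i) {
            if (href && href.slice(-4) === '.pdf') {
                byIndex[i] = href; // leaves holes at the skipped indexes
                byPush.push(href); // stays dense
            }
        });
        callback(null);
    });
}

fillPDF(function (err) {
    if (err) {
        throw err;
    }
    // Runs second, once the simulated request has finished.
    console.log(byIndex.length, byPush.length); // 4 2
});

// Runs first: fillPDF has returned, but its callback has not fired yet.
console.log(byPush.length); // 0
```

The last line prints 0 because it executes before the timer fires, which is the same reason the for loop in the question sees empty arrays. Inside the callback, byIndex has length 4 with undefined holes at indexes 0 and 2, while byPush holds just the two PDF links.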