javascript node.js puppeteer

How to use Puppeteer to download PDF files from a website?


I've been trying to use Puppeteer to download PDF files from a specific website, but how do I get it to download all of the files? For example:

One file on the website is at example.com/Contents/xxx-1.pdf, and a second file is at example.com/Contents/xxx-2.pdf.

How can I use Puppeteer to download each file automatically by incrementing the number in the URL?


Solution

  • I've written a function that takes two arguments: a function that maps an index to the URL of the PDF, and a count that limits the number of downloads. It then tries to save each file in turn.

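    const puppeteer = require('puppeteer');

    // The callback builds the URL for file number i.
    // page.goto() needs the protocol, so the URL starts with https://.
    downloadFiles((i) => `https://example.com/Contents/xxx-${i}.pdf`, 20)
        .catch((e) => console.error(e));

    async function downloadFiles(url, count) {
        // page.pdf() is documented as headless-only in many Puppeteer
        // versions, so the browser is launched headless here.
        const browser = await puppeteer.launch({
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox']
        });
        const page = await browser.newPage();
        // The files in the question are numbered from 1, not 0.
        for (let i = 1; i <= count; i++) {
            const pageUrl = await url(i);
            try {
                await page.goto(pageUrl);
                // Note: page.pdf() prints whatever the page rendered into
                // a new PDF; it does not copy the original file's bytes.
                await page.pdf({
                    path: `pdf-${i}.pdf`,
                    format: 'A4',
                    printBackground: true
                });
            } catch (e) {
                // Log the failure and carry on with the next number.
                console.log(`Error loading ${pageUrl}: ${e.message}`);
            }
        }
        await browser.close();
    }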
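
Since `page.pdf()` prints the rendered page rather than saving the file the server sent, it may be simpler to fetch the bytes directly when the URLs are already known. The following is a minimal sketch of that alternative, not the answer's method: it does not use Puppeteer at all and assumes Node 18+ (for the global `fetch`), that the files are numbered from 1, and that no login or cookies are needed. `buildUrls` and `downloadPdfs` are hypothetical helper names chosen for illustration.

```javascript
const fs = require('fs/promises');
const path = require('path');

// Hypothetical helper: build the URLs for files numbered 1..count.
function buildUrls(makeUrl, count) {
    return Array.from({ length: count }, (_, k) => makeUrl(k + 1));
}

// Hypothetical helper: fetch each URL and write the response body to disk
// unchanged. fetchImpl defaults to Node's global fetch (assumes Node 18+).
async function downloadPdfs(urls, outDir = '.', fetchImpl = fetch) {
    const saved = [];
    for (const url of urls) {
        try {
            const response = await fetchImpl(url);
            if (!response.ok) {
                // Missing numbers (e.g. a 404) are skipped, not fatal.
                console.log(`Skipping ${url}: HTTP ${response.status}`);
                continue;
            }
            const bytes = Buffer.from(await response.arrayBuffer());
            // Name the local file after the last segment of the URL path.
            const target = path.join(outDir, path.basename(new URL(url).pathname));
            await fs.writeFile(target, bytes);
            saved.push(target);
        } catch (e) {
            console.log(`Error downloading ${url}: ${e.message}`);
        }
    }
    return saved;
}
```

To try it, build the list with `buildUrls((i) => `https://example.com/Contents/xxx-${i}.pdf`, 20)` and pass the result to `downloadPdfs`, which resolves to the paths of the files it managed to save. If the site only serves the PDFs to a logged-in browser session, this plain HTTP approach would not be enough on its own.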