I'm attempting to use Puppeteer to scrape about 300 webpages to PDF, but my loop isn't working. The intent is that Puppeteer loads each page from an array, generates a PDF, and then works through all of the URLs before closing.
Using the code below, Puppeteer successfully scrapes the first URL -- and then stops.
Code (URLs are placeholders):
const puppeteer = require('puppeteer');

(async () => {
  // Create a browser instance
  const browser = await puppeteer.launch({ headless: true });

  // Create a new page
  const page = await browser.newPage();

  // Set viewport width and height
  await page.setViewport({ width: 1280, height: 720 });

  const urlArray = [
    'https://ask.metafilter.com/369890/Patio-furniture-designed-for-the-PNW',
    'https://ask.metafilter.com/369889/Its-the-police-should-I-document-my-concern',
    'https://ask.metafilter.com/369888/Training-my-over-excited-dog'
  ];

  for(var i = 0; i < urlArray.length; i++) {
    const website_url = urlArray[i];

    // Open URL in current page
    await page.goto(website_url, { waitUntil: 'networkidle0' });

    // Download the PDF
    const pdf = await page.pdf({
      path: 'images/page_${i+1}.pdf',
      margin: { top: '100px', right: '50px', bottom: '100px', left: '50px' },
      printBackground: true,
    });
  }

  // Close the browser instance
  await browser.close();
})();
However, if I attempt to create a screenshot, swapping out this:
// Download the PDF
const pdf = await page.pdf({
  path: 'images/page.pdf',
  margin: { top: '100px', right: '50px', bottom: '100px', left: '50px' },
  printBackground: true,
});
For this:
// Capture screenshot
await page.screenshot({
  path: `images/screenshot_full_${i+1}.jpg`,
  fullPage: true
});
It loops fine, and goes through every URL in the array.
What am I missing?
I'm working from these tutorials: https://www.bannerbear.com/blog/how-to-make-a-pdf-from-html-with-node-js-and-puppeteer/, https://www.bannerbear.com/blog/how-to-take-screenshots-with-puppeteer/
As @ggorlen pointed out, I was using single quotation marks where I should have had backticks:
path: 'images/page_${i+1}.pdf'
Should be:
path: `images/page_${i+1}.pdf`
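
For anyone hitting the same thing: with single quotes, `${i+1}` is never interpolated, so every iteration most likely wrote to the same literal file name and overwrote the previous PDF. That would make it look as if the loop stopped after one page. The screenshot version worked because its path already used backticks. Below is a minimal sketch of the difference that runs without Puppeteer; the helper names `literalPath` and `templatePath` are made up for illustration.

```javascript
// Hypothetical helpers, for illustration only (not part of Puppeteer).

// Single quotes: the text ${i+1} stays in the string as-is.
const literalPath = (i) => 'images/page_${i+1}.pdf';

// Backticks (template literal): ${i + 1} is evaluated on each call.
const templatePath = (i) => `images/page_${i + 1}.pdf`;

const indexes = [0, 1, 2];
const literalPaths = indexes.map(literalPath);
const templatePaths = indexes.map(templatePath);

// Count distinct file names produced by each version.
console.log(new Set(literalPaths).size);  // 1 (every iteration targets the same file)
console.log(new Set(templatePaths).size); // 3 (one file per URL)
console.log(templatePaths);
```

A quick way to catch this in the original script is to log the `path` value inside the loop before calling `page.pdf()`.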