I'm trying to get an XHR response from a webpage. I found the
await page.waitForResponse(url);
or
await page.waitForResponse((res) => {
  if (res.url() === myUrl) return true;
});
method, but it always times out for the response I'm trying to get.
However, if I set
page.on('response', (res) => {
  if (res.url() === myUrl) {
    // do what I want with the response
  }
})
the correct response is found and I can retrieve the data.
After some debugging, it seems like waitForResponse() isn't returning any XHR requests/responses.
Any ideas?
EDIT:
Example. For this case, it's required to use the puppeteer-extra-plugin-stealth and puppeteer-extra packages; otherwise, this URL will return status code 403:
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import UserAgent from 'user-agents';
import puppeteer from 'puppeteer-extra';
import { Page } from 'puppeteer';
const wantedUrl = 'https://www.nike.com.br/DataLayer/dataLayer';
const workingFunction = async (page: Page) => {
  let reqCount = 0;
  let resCount = 0;
  page.on('request', req => {
    reqCount++;
    if (req.url() == wantedUrl) {
      console.log('The request I need: ', req.url());
      console.log(reqCount);
    }
  });
  page.on('response', async res => {
    resCount++;
    if (res.url() == wantedUrl) {
      console.log('The response I need:', await res.json());
      console.log(resCount);
    }
  });
  await page.goto('https://www.nike.com.br/tenis-nike-sb-dunk-low-pro-unissex-153-169-229-284741', {
    timeout: 0,
  });
};
const notWorkingFunction = async (page: Page) => {
  let resCount = 0;
  await page.goto('https://www.nike.com.br/tenis-nike-sb-dunk-low-pro-unissex-153-169-229-284741');
  const res = await page.waitForResponse(
    res => {
      resCount++;
      console.log(res.url());
      console.log(resCount);
      if (res.url() === wantedUrl) {
        return true;
      }
      return false;
    },
    { timeout: 0 }
  );
  return res;
};
(async () => {
  puppeteer.use(StealthPlugin());
  const browser = await puppeteer.launch({});
  const page = await browser.newPage();
  const userAgent = new UserAgent({ deviceCategory: 'desktop' });
  await page.setUserAgent(userAgent.random().toString());
  try {
    // workingFunction(page);
    const res = await notWorkingFunction(page);
  } catch (e) {
    console.log(e);
  }
})();
The page.on version works because it sets the request/response handlers before performing navigation. The waitForResponse version, on the other hand, waits until the "load" event fires (page.goto()'s default resolution point), and only then starts tracking responses with the call to page.waitForResponse. MDN says of the load event:
The load event is fired when the whole page has loaded, including all dependent resources such as stylesheets and images. This is in contrast to DOMContentLoaded, which is fired as soon as the page DOM has been loaded, without waiting for resources to finish loading.
Based on this, we can infer that by the time the load event fires and waitForResponse finally starts listening to traffic, it has already missed the desired response, so it waits until the timeout expires (or forever, with timeout: 0).
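The timing problem can be reproduced without a browser. The following is only a simulation, not Puppeteer itself: FakePage is a made-up stand-in built on Node's EventEmitter, its goto() emits a single "response" event before resolving, and its waitForResponse() is a simplified imitation that only sees events emitted after it was called. The URLs and the 100 ms timeout are arbitrary.

```javascript
const {EventEmitter} = require("node:events");

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Made-up stand-in for a Puppeteer page (illustration only).
class FakePage extends EventEmitter {
  // Emits one "response" event midway through, then resolves,
  // imitating an XHR that completes before the "load" event.
  async goto(url) {
    await sleep(10);
    this.emit("response", {url: () => url + "/data"});
    await sleep(10);
  }

  // Simplified imitation of waitForResponse: it starts listening
  // only when called, so earlier events are never seen.
  waitForResponse(predicate, {timeout = 100} = {}) {
    return new Promise((resolve, reject) => {
      const onResponse = res => {
        if (predicate(res)) {
          clearTimeout(timer);
          this.off("response", onResponse);
          resolve(res);
        }
      };
      const timer = setTimeout(() => {
        this.off("response", onResponse);
        reject(new Error("timed out waiting for response"));
      }, timeout);
      this.on("response", onResponse);
    });
  }
}

const isWanted = res => res.url().endsWith("/data");

// Mirrors the failing version: navigation finishes first, then we listen.
async function listenAfterGoto() {
  const page = new FakePage();
  await page.goto("https://example.com");
  return page.waitForResponse(isWanted); // too late, the event is gone
}

// Mirrors the fix: the listener exists before navigation starts.
async function listenBeforeGoto() {
  const page = new FakePage();
  const [res] = await Promise.all([
    page.waitForResponse(isWanted),
    page.goto("https://example.com"),
  ]);
  return res;
}
```

In this sketch, listenBeforeGoto() resolves with the simulated response, while listenAfterGoto() rejects with the timeout error, which is the same shape of failure as in the question.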
The solution is to create the promise for page.waitForResponse before (or at the same time as) the goto call, so that no traffic is missed when you kick off navigation.
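As a rough sketch of the "before" variant, the promise can be created first and awaited only after navigation. The helper name gotoAndCapture and its default timeout are my own choices for illustration; the only Puppeteer calls assumed are page.waitForResponse(predicate, options) and page.goto(url, options).

```javascript
// Hypothetical helper (not part of Puppeteer). `page` is assumed to be a
// Puppeteer Page, or anything exposing waitForResponse() and goto().
async function gotoAndCapture(page, url, wantedUrl, timeout = 30_000) {
  // Start listening first. Do NOT await here, or navigation never begins.
  const responsePromise = page.waitForResponse(
    res => res.url() === wantedUrl,
    {timeout}
  );

  // If goto() throws, the pending wait may reject later with nobody
  // awaiting it; this no-op handler prevents an unhandled rejection.
  responsePromise.catch(() => {});

  await page.goto(url, {waitUntil: "domcontentloaded"});

  // The response may already have arrived; awaiting now is safe either way.
  return responsePromise;
}
```

A caller would then do something like const res = await gotoAndCapture(page, url, wantedUrl) followed by await res.json(). Whether this reads better than Promise.all is mostly a matter of taste; both register the listener before navigation begins.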
I also suggest using "domcontentloaded" on the goto call. "domcontentloaded" is underused in Puppeteer: there's no sense in waiting for all resources to arrive when you're just looking for one. The default "load" or the often-used "networkidleN" settings are better for use cases like screenshotting the page, where you want it to look the way a user would see it. To be clear, this isn't the fix to the problem, just an optimization, and it's not too apparent from the docs which setting is suitable when.
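If you want to see how much the setting matters for a particular site, one option is to time the navigation yourself. This is just a sketch: timeNavigation is a hypothetical helper, and the only Puppeteer call it relies on is page.goto(url, {waitUntil}).

```javascript
// Hypothetical helper: returns how many milliseconds page.goto() takes
// to resolve for a given waitUntil value. `page` is assumed to be a
// Puppeteer Page (or any object with a compatible goto()).
async function timeNavigation(page, url, waitUntil) {
  const start = Date.now();
  await page.goto(url, {waitUntil});
  return Date.now() - start;
}
```

Calling it on fresh pages with "domcontentloaded", "load" and "networkidle2" gives a rough comparison; the numbers depend on the site, caching and network, so treat them as indicative only.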
Here's a minimal example (I used JS, not TS):
const puppeteer = require("puppeteer-extra"); // ^3.2.3
const StealthPlugin = require("puppeteer-extra-plugin-stealth"); // ^2.9.0
const UserAgent = require("user-agents"); // ^1.0.958
puppeteer.use(StealthPlugin());
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const userAgent = new UserAgent({deviceCategory: "desktop"});
  await page.setUserAgent(userAgent.random().toString());
  const url = "https://www.nike.com.br/tenis-nike-sb-dunk-low-pro-unissex-153-169-229-284741";
  const wantedUrl = "https://www.nike.com.br/DataLayer/dataLayer";
  const [res] = await Promise.all([
    page.waitForResponse(res => res.url() === wantedUrl, {timeout: 90_000}),
    page.goto(url, {waitUntil: "domcontentloaded"}),
  ]);
  console.log(await res.json());
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
(Note that the site has changed since this was posted, so the code no longer works, but the fundamental ideas still apply.)