javascript, node.js, puppeteer

Injecting HTML before script evaluation with puppeteer


I want to inject some HTML into a specific element on a page using puppeteer.

The HTML must be injected before any JavaScript is executed.

There are two ways I think I could do this:

  1. Inject HTML using page.evaluateOnNewDocument

According to the docs, this function "is invoked after the document was created", but I can't access DOM elements from it, e.g.:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  page.on('console', consoleObj => console.log(consoleObj.text()));

  await page.evaluateOnNewDocument(
    () => {
      const content = document.querySelector('html');
      console.log(content);
    }
  );

  await page.goto(process.argv[2]);

  await browser.close();
})();

This script just outputs newlines when I visit a page.

  2. Use page.setJavaScriptEnabled to stop the page's JavaScript from executing until I have injected the HTML. As per the docs, though, turning JavaScript back on does not start executing the scripts that were skipped.

My script looks something like this:

const fs = require('fs');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const html = fs.readFileSync('./example.html', 'utf8');

  await page.setJavaScriptEnabled(false);
  await page.goto(process.argv[2]);
  await page.evaluate(
    content => {
      const pageEl = document.querySelector('div.page');
      let node = document.createElement('div');
      node.innerHTML = content;
      pageEl.appendChild(node);
    }, html
  );
  await page.setJavaScriptEnabled(true);

  await browser.close();
})();

Alternatively, it may also be possible to do something like this, though that seems overly complex for what is a fairly simple request.

Is there an easier way to do this that I am overlooking?

Cheers


Solution

  • It appears that this is actually a very popular request and I perhaps should have searched more thoroughly before posting my question.

    Nevertheless, I settled on the solution proposed by aslushnikov here.

    The following code is just what I produced to test the idea, so I'm sure there's significant room for improvement.

    I made a simple function to perform XHRs:

    const requestPage = async (url) => {
      return new Promise(function (resolve, reject) {
        let xhr = new XMLHttpRequest();
        xhr.open('GET', url);
        xhr.setRequestHeader('Ignore-Intercept', 'Value');
        xhr.onload = function () {
          if (this.status >= 200 && this.status < 300) {
            const response = {};
            xhr.getAllResponseHeaders()
              .trim()
              .split(/[\r\n]+/)
              .map(value => value.split(/: /))
              .forEach(keyValue => {
                  response[keyValue[0].trim()] = keyValue[1].trim();
              });
            resolve({ ...response, body: xhr.response });
          } else {
            reject({
                status: this.status,
                statusText: xhr.statusText
            });
          }
        };
        xhr.onerror = function () {
          reject({
              status: this.status,
              statusText: xhr.statusText
          });
        };
        xhr.send();
      });
    };
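As a side note on the header handling in requestPage: the string returned by getAllResponseHeaders() is a block of CRLF-separated "name: value" lines. The sketch below pulls that parsing step into a standalone helper so it can be tried outside a browser. The name parseHeaders and the sample header string are illustrative assumptions, not part of the original code. Unlike the inline version, it splits on the first colon only, so a value that itself contains ": " should be kept whole and an empty value should not throw.

```javascript
// Hypothetical helper: parse the raw string returned by
// xhr.getAllResponseHeaders() into a plain object keyed by header name.
function parseHeaders(rawHeaders) {
  const headers = {};
  rawHeaders
    .trim()
    .split(/[\r\n]+/)
    .filter(line => line.length > 0)
    .forEach(line => {
      // Split on the first colon only, so the value is never truncated.
      const index = line.indexOf(':');
      if (index === -1) return; // skip malformed lines
      const name = line.slice(0, index).trim().toLowerCase();
      const value = line.slice(index + 1).trim();
      headers[name] = value;
    });
  return headers;
}

// Sample input (made up for illustration).
const rawSample =
  'content-type: text/html; charset=utf-8\r\n' +
  'date: Tue, 01 Jan 2019 10:00:00 GMT\r\n';

console.log(parseHeaders(rawSample));
```

Inside requestPage, the chained trim/split/map/forEach calls could be swapped for a single parseHeaders(xhr.getAllResponseHeaders()) call; the rest of requestPage would stay as it is.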
    

    I then exposed requestPage to the page.

    With request interception enabled, I did not let the matching request go ahead. Instead, I performed an XHR with requestPage and used its result as the response to the intercepted request.

    await page.setRequestInterception(true);
    page.on('request', async (request) => {
      if (
        request.url() === url
        && (
          typeof request.headers()['access-control-request-headers'] === 'undefined'
          || !request.headers()['access-control-request-headers'].match(/ignore-intercept/gi)
        ) && typeof request.headers()['ignore-intercept'] === 'undefined'
      ) {
        // Pass the URL as an argument rather than interpolating it into the script text.
        const response = await page.evaluate(targetUrl => requestPage(targetUrl), url);
        response.body += "hello";
        request.respond(response);
      } else {
        request.continue();
      }
    });
    
    await page.goto(`data:text/html,<iframe style='width:100%; height:100%' src=${url}></iframe>`);
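The interception condition in the handler can also be read as a small predicate, and the line that appends "hello" to the body as a placeholder for the real injection. A minimal sketch of both follows, assuming header names arrive lower-cased (as Puppeteer documents for request.headers()). The names shouldIntercept and injectHtml are hypothetical helpers, and inserting before the closing </body> tag is only one possible placement, not what the test code does.

```javascript
// Hypothetical predicate: should this request be answered by the interceptor?
// `headers` is assumed to be a plain object with lower-cased header names.
function shouldIntercept(requestUrl, headers, targetUrl) {
  if (requestUrl !== targetUrl) return false;
  // The XHR issued by requestPage carries the marker header itself...
  if (headers['ignore-intercept'] !== undefined) return false;
  // ...and its CORS preflight names the marker in Access-Control-Request-Headers.
  const preflight = headers['access-control-request-headers'] || '';
  return !/ignore-intercept/i.test(preflight);
}

// Hypothetical helper: insert `snippet` just before the last closing </body>
// tag, or append it when the document has no such tag.
function injectHtml(body, snippet) {
  const match = /<\/body\s*>(?![\s\S]*<\/body\s*>)/i.exec(body);
  if (!match) return body + snippet;
  return body.slice(0, match.index) + snippet + body.slice(match.index);
}

// Example values (made up for illustration).
const targetUrl = 'http://example.com/';
console.log(shouldIntercept(targetUrl, {}, targetUrl));
console.log(injectHtml('<html><body><p>a</p></body></html>', '<div>x</div>'));
```

In the request handler, the long if condition could then become shouldIntercept(request.url(), request.headers(), url), and the body could be rewritten with injectHtml(response.body, html) before calling request.respond.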
    

    Annoyingly, it didn't seem possible to use page.evaluate unless the desired page was loaded in an iframe (hence the await page.goto(`data:text/html... call above).