Tags: html, node.js, puppeteer, webcam, google-chrome-headless

Camera doesn't turn on until the page is manually clicked when using Puppeteer


I wrote a NodeJS application which does the following:

  1. Creates a server capable of serving some static files, using the http.createServer function.
  2. Next, it starts a Puppeteer process to launch the Chrome browser (I had to turn off headless to debug). The browser is instructed to open a localhost URL served by the server created in step 1. Puppeteer then emulates a click on a button present on the page.
  3. The webpage (index.html) served from localhost contains a face-api.js implementation and a button. When Puppeteer clicks the button, the page is expected to turn on the camera, start video streaming and perform some face recognition on the captured stream/image. Finally, it should update a div on the webpage with the results.
  4. Puppeteer then waits for the div selector and, once it is present, reads the contents of the div and prints them in the NodeJS app's console.

The problem I am encountering is that everything goes fine until the step where the camera should turn on. The button click event gets triggered fine, but the execution halts on the await navigator.mediaDevices.getUserMedia() call. The camera doesn't turn on until at least some manual user interaction happens on the opened web page in the browser. So, as soon as I click on any part of the body of the page, the rest of the JS execution resumes (i.e. turning on the camera and doing the face recognition steps) and the process finishes successfully.
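
To confirm the call is hanging rather than being rejected with a permission error, a minimal diagnostic like the sketch below (not part of my actual page code; the helper name is just illustrative) races getUserMedia against a timeout. If the timeout fires instead of a NotAllowedError, the promise is simply pending, which matches what I'm seeing.

function getStreamOrTimeout(ms) {
    // Reject after `ms` milliseconds so a hanging getUserMedia call becomes visible.
    const timeout = new Promise((_, reject) =>
        setTimeout(() => reject(new Error('getUserMedia still pending after ' + ms + ' ms')), ms)
    );
    return Promise.race([
        navigator.mediaDevices.getUserMedia({ audio: false, video: true }),
        timeout
    ]);
}

// If this logs the timeout message instead of a permission error, the call is
// waiting for interaction rather than being denied.
getStreamOrTimeout(5000)
    .then(stream => console.log('got stream:', stream.id))
    .catch(err => console.log('getUserMedia outcome:', err.message));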

Below is the code of my index.js file, which contains the NodeJS code, and of the index.html file, which contains the page layout and the JS code required for face-api.js face recognition. Can someone please tell me why the browser expects some user interaction with the web page before turning on the camera? It seems to be a security feature, but I am wondering whether there is a way to bypass it using Puppeteer. My final goal is to go headless and not show the Chrome window at all when running the NodeJS program.

index.js Code

const puppeteer = require('puppeteer');
const http = require('http');
const fs = require('fs');
require('dotenv').config();


const getViewUrl = (url) => {
    url = url == '/' ? 'index.html' : url;
    url = url.indexOf('/') === 0 ? url.substring(1) : url;
    return `public/${url}`;
};

const getContentType = (url) => {
    if (url.endsWith('.js')) {
        return 'text/javascript';
    } else if (url.endsWith('.json')) {
        return 'application/json';
    } else if (url.endsWith('.html')) {
        return 'text/html';
    }
    return 'application/octet-stream';
}

var server = null;
const PORT = process.env.PORT || 55193;

function startServer() {
    server = http.createServer((request, response) => {
        let viewUrl = getViewUrl(request.url);
        fs.readFile(viewUrl, (error, data) => {
            if (error) {
                response.writeHead(404);
                response.write("<h1>File Not Found</h1>")
            } else {
                response.writeHead(200, {
                    'Content-type': getContentType(viewUrl)
                });
                response.write(data);
            }
            response.end();
        })
    });

    server.listen(PORT);
}

(async() => {
    startServer();
    // Launch the browser and open a new blank page
    const browser = await puppeteer.launch({
        headless: false,
        dumpio: true,
        args: ['--no-sandbox',
            '--use-file-for-fake-video-capture=C:/Users/adm/Downloads/test.mjpeg'
        ]
    });
    const context = browser.defaultBrowserContext();
    await context.clearPermissionOverrides();
    await context.overridePermissions("http://localhost:" + PORT + "/", ['camera', 'microphone']);
    const page = await context.newPage();

    page
        .on('console', message =>
            console.log(`${message.type().substr(0, 3).toUpperCase()} ${message.text()}`))
        .on('pageerror', ({ message }) => console.log(message))
        .on('response', response =>
            console.log(`${response.status()} ${response.url()}`))
        .on('requestfailed', request =>
            console.log(`${request.failure().errorText} ${request.url()}`));

    // Navigate the page to a URL
    await page.goto('http://localhost:' + PORT + '/', { waitUntil: 'load' });

    // Set the viewport size
    await page.setViewport({ width: 1080, height: 1024 });

    // Wait for the run button and click it
    await page.waitForSelector('#runBtn');
    await page.click('#runBtn');

    // Wait for the results div and read its text content
    const resultHandle = await page.waitForSelector('.positionDiv');
    const resultText = await resultHandle.evaluate(el => el.textContent);
    await browser.close();
    console.log(resultText);
    server.close();
})();

index.html Code

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no" />
    <meta name="author" content="Prithwiraj Bose <sribasu.com>" />
    <title>FaceAPI</title>
    <script src="js/face-api.min.js" type="text/javascript"></script>
</head>

<body>

    <div id="content">
        <div id="myDiv01">...</div><br>

        <input type="button" value="run" id="runBtn" onclick="run();"><br><br>

        <video onplay="onPlay(this)" id="inputVideo" autoplay muted width="640" height="480" style=" border: 1px solid #ddd;"></video><br>
        <canvas id="overlay" width="640" height="480" style="position:relative; top:-487px; border: 1px solid #ddd;"></canvas><br>
    </div>
    <!-- Core theme JS-->
    <script type="text/javascript">
        function resizeCanvasAndResults(dimensions, canvas, results) {
            const {
                width,
                height
            } = dimensions instanceof HTMLVideoElement
                ?
                faceapi.getMediaDimensions(dimensions) :
                dimensions
            canvas.width = width
            canvas.height = height

            return results
        }

        async function onPlay() {
            const videoEl = document.getElementById('inputVideo')
            const options = new faceapi.TinyFaceDetectorOptions({
                inputSize: 128,
                scoreThreshold: 0.3
            })


            const result = await faceapi.detectSingleFace(videoEl, options).withFaceLandmarks(true)
            if (result) {

                // Use the x coordinate of a nose landmark to decide which side of the frame the face is on
                var nose = result.landmarks.getNose();
                var x = nose[3]._x;
                document.getElementById('myDiv01').innerHTML = x > (640 / 2) ? (x > 500 ? 'Extreme Right' : 'Right') : (x < 100 ? 'Extreme Left' : 'Left');
                document.getElementById('myDiv01').classList.add('positionDiv');


            }

            setTimeout(() => onPlay())
        }

        async function run() {
            await faceapi.loadTinyFaceDetectorModel('models/')
            await faceapi.loadFaceLandmarkTinyModel('models/')
            console.log("Step 1");
            const stream = await navigator.mediaDevices.getUserMedia({
                audio: false,
                video: true
            })
            console.log("Step 2");
            const videoEl = document.getElementById('inputVideo')
            videoEl.srcObject = stream
        }
    </script>
</body>

</html>

NodeJS Console Output (until the page is manually clicked)

C:\Program Files\nodejs\node.exe .\index.js
200 http://localhost:8080/
index.js:67
200 http://localhost:8080/js/face-api.min.js
index.js:67
200 http://localhost:8080/favicon.ico
index.js:67
200 http://localhost:8080/models/tiny_face_detector_model-weights_manifest.json
index.js:67
200 http://localhost:8080/models/tiny_face_detector_model-shard1
index.js:67
200 http://localhost:8080/models/face_landmark_68_tiny_model-weights_manifest.json
index.js:67
200 http://localhost:8080/models/face_landmark_68_tiny_model-shard1
index.js:67
LOG Step 1

NodeJS Console Output (after the page is manually clicked)

LOG Step 2
index.js:64
Right

I've tried forcefully simulating a click on the page body using Puppeteer, but nothing works until a real human interaction happens on the web page!
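
To illustrate what I mean by forcefully simulating a click, the attempts were along these lines (just a sketch, placed right after page.goto in the existing script; neither variant unblocks the getUserMedia call):

// Input-level click dispatched by Puppeteer at arbitrary page coordinates
await page.mouse.click(200, 200);
// DOM-level click dispatched inside the page itself
await page.evaluate(() => document.body.click());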

A short video of how it looks


Solution

  • I finally got it working. Yes, I learnt that it's due to a security feature of Chrome: the real UI (here, the camera permission prompt) can't be interacted with programmatically without real human intervention. However, there is an argument supported by Chrome (ref here) which can be passed to Puppeteer, called --use-fake-ui-for-media-stream, that auto-accepts the media-stream permission prompt. So my browser launch code now looks like this (see also the note after the snippet); everything else in the original code given in my question worked as expected.

        const browser = await puppeteer.launch({
          headless: "new",
          args: [
            '--no-sandbox',
            '--use-fake-ui-for-media-stream'
          ]
        });
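
    One more note: if you also want to feed a prerecorded clip instead of the real webcam (as with the --use-file-for-fake-video-capture flag from my original launch code), my understanding is that the file flag only takes effect when --use-fake-device-for-media-stream is passed as well. A combined launch sketch (the file path is just the placeholder from my question) would look like this:

        const browser = await puppeteer.launch({
          headless: "new",
          args: [
            '--no-sandbox',
            // auto-accept the camera/microphone prompt without a user gesture
            '--use-fake-ui-for-media-stream',
            // serve a prerecorded clip as the "camera"; the file flag below
            // only takes effect together with the fake-device flag
            '--use-fake-device-for-media-stream',
            '--use-file-for-fake-video-capture=C:/Users/adm/Downloads/test.mjpeg'
          ]
        });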