node.js phantomjs casperjs aws-lambda spookyjs

How to deploy a phantomjs node app on AWS Lambda?


I threw together a small Lambda function to crawl a website using the SpookyJS, CasperJS, and PhantomJS toolchain for headless browsing. The task is quite simple, and a few months ago it was working in Lambda. I recently had to change a few things and wanted to work on the project again, but after starting fresh I could not get it to run in Lambda without errors. My question is: how can I run PhantomJS in Lambda?

The example code I am running is:

spooky.start('http://en.wikipedia.org/wiki/Spooky_the_Tuff_Little_Ghost');
spooky.then(function () {
    this.emit('hello', 'Hello, from ' + this.evaluate(function () {
        return document.title;
    }));
});
spooky.run();

The error I am getting in Lambda is:

{ [Error: Child terminated with non-zero exit code 1] details: { code: 1, signal: null } }

I have followed a variety of procedures to make sure everything can run on Lambda. The steps I took:

  1. Run locally using node index.js and confirm it works
  2. Upload package.json and the JS file to an Amazon Linux EC2 instance so that dependencies are built there, as recommended for npm install calls and described here
  3. Run npm install on the EC2 instance, and run node index.js again to confirm the correct output
  4. Zip everything up and deploy to AWS using the CLI

My package.json is:

{
  "name": "lambda-spooky-test",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "casperjs": "^1.1.3",
    "phantomjs-prebuilt": "^2.1.10",
    "spooky": "^0.2.5"
  }
}

I have also attempted the following (most of which work locally and on the AWS EC2 instance, but fail with the same error on Lambda):

  1. Trying the non -prebuilt version of phantom
  2. Ensuring casperjs and phantomjs are accessible from the path with:

    process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT'] + ':' + process.env['LAMBDA_TASK_ROOT'] + '/node_modules/.bin';
    console.log( 'PATH: ' + process.env.PATH );

  3. Inspecting spawn calls by wrapping child_process's .spawn() call, and got the following:

    { '0': 'casperjs',
      '1': 
       [ '/var/task/node_modules/spooky/lib/bootstrap.js',
         '--transport=http',
         '--command=casperjs',
         '--port=8081',
         '--spooky_lib=/var/task/node_modules/spooky/lib/../',
         '--spawnOptions=[object Object]' ],
      '2': {} }
    
  4. Calling .exec('casperjs') and .exec('phantomjs --version') directly. Both work locally and on EC2 but fail in Lambda. The command:

    require('child_process').exec('casperjs', (error, stdout, stderr) => {
        if (error) { console.error('error: ' + error); }
        console.log('out: ' + stdout);
        console.log('err: ' + stderr);
    });

Both commands fail with the following result:

err: Error: Command failed: /bin/sh -c casperjs
module.js:327
    throw err;
    ^

Error: Cannot find module '/var/task/node_modules/lib/phantomjs'
    at Function.Module._resolveFilename (module.js:325:15)
    at Function.Module._load (module.js:276:25)
    at Module.require (module.js:353:17)
    at require (internal/module.js:12:17)
    at Object.<anonymous> (/var/task/node_modules/.bin/phantomjs:16:15)
    at Module._compile (module.js:409:26)
    at Object.Module._extensions..js (module.js:416:10)
    at Module.load (module.js:343:32)
    at Function.Module._load (module.js:300:12)
    at Function.Module.runMain (module.js:441:10)

2016-08-07T15:36:37.349Z    b9a1b509-5cb4-11e6-ae82-256a0a2817b9    sout: 
2016-08-07T15:36:37.349Z    b9a1b509-5cb4-11e6-ae82-256a0a2817b9    serr: module.js:327
    throw err;
    ^

Error: Cannot find module '/var/task/node_modules/lib/phantomjs'
    at Function.Module._resolveFilename (module.js:325:15)
    at Function.Module._load (module.js:276:25)
    at Module.require (module.js:353:17)
    at require (internal/module.js:12:17)
    at Object.<anonymous> (/var/task/node_modules/.bin/phantomjs:16:15)
    at Module._compile (module.js:409:26)
    at Object.Module._extensions..js (module.js:416:10)
    at Module.load (module.js:343:32)
    at Function.Module._load (module.js:300:12)
    at Function.Module.runMain (module.js:441:10)
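
For reference, the spawn inspection mentioned in item 3 can be done with a wrapper along these lines. This is a minimal sketch, not the exact code used for the output above; logSpawnCalls and spawnCalls are illustrative names. It assumes the wrapper is installed before the library that spawns processes is loaded, in case that library keeps its own reference to spawn.

```javascript
const childProcess = require('child_process');

// Replace target.spawn with a version that records and prints its
// arguments, then delegates to the original implementation.
// Returns the array that collects the recorded calls.
function logSpawnCalls(target) {
  const originalSpawn = target.spawn;
  const calls = [];
  target.spawn = function loggedSpawn() {
    const args = Array.prototype.slice.call(arguments);
    calls.push(args);
    console.log('spawn called with:', args);
    return originalSpawn.apply(this, args);
  };
  return calls;
}

// Install the wrapper on the real module. Doing this before
// require('spooky') is an assumption about load order, not a documented
// requirement of the library.
const spawnCalls = logSpawnCalls(childProcess);
```

Each entry in spawnCalls holds the command, its argument array, and the options object, which is the same information shown in the dump in item 3.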

Solution

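  • I found the issue to be that adding node_modules/.bin to the path works on both the local and EC2 machines because the files there are simply symlinks to the actual bin folders of each library. This breaks when code in those files uses relative paths: the stack trace shows /var/task/node_modules/.bin/phantomjs looking for /var/task/node_modules/lib/phantomjs, which suggests that in the deployed package the .bin entries are regular files rather than symlinks, so a relative path is resolved against node_modules/.bin instead of the library's own folder. The symlinks on EC2: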

    [ec2-user@ip-172-31-32-87 .bin]$ ls -lrt
    total 0
    lrwxrwxrwx 1 ec2-user ec2-user 35 Aug  7 00:52 phantomjs -> ../phantomjs-prebuilt/bin/phantomjs
    lrwxrwxrwx 1 ec2-user ec2-user 24 Aug  7 00:52 casperjs -> ../casperjs/bin/casperjs
    

    I worked around this by adding each library's own bin directory to the PATH in the Lambda handler function:

    process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT'] 
            + ':' + process.env['LAMBDA_TASK_ROOT'] + '/node_modules/phantomjs-prebuilt/bin'
            + ':' + process.env['LAMBDA_TASK_ROOT'] + '/node_modules/casperjs/bin';
    

    With this change, PhantomJS, CasperJS, and SpookyJS run correctly in Lambda.
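
To make the path problem concrete, here is a small sketch of both halves of the workaround. It assumes the phantomjs launcher resolves '../lib/phantomjs' relative to its own directory, which is what the stack trace in the question suggests but which I have not verified against the package source. resolveFromLauncher and buildLambdaPath are hypothetical helper names, not part of any library.

```javascript
const path = require('path');

// Where a launcher located in binDir ends up when it resolves
// '../lib/phantomjs' relative to its own directory (assumed behavior).
function resolveFromLauncher(binDir) {
  return path.posix.resolve(binDir, '..', 'lib', 'phantomjs');
}

// Append the task root and each package's own bin directory to an
// existing PATH string, mirroring the concatenation shown above.
function buildLambdaPath(currentPath, taskRoot) {
  return [
    currentPath,
    taskRoot,
    path.posix.join(taskRoot, 'node_modules', 'phantomjs-prebuilt', 'bin'),
    path.posix.join(taskRoot, 'node_modules', 'casperjs', 'bin')
  ].join(':');
}

// A launcher copied into .bin looks in the wrong place:
console.log(resolveFromLauncher('/var/task/node_modules/.bin'));
// prints /var/task/node_modules/lib/phantomjs

// The launcher in the package's own bin directory finds its lib folder:
console.log(resolveFromLauncher('/var/task/node_modules/phantomjs-prebuilt/bin'));
// prints /var/task/node_modules/phantomjs-prebuilt/lib/phantomjs
```

In the handler this would be used as process.env.PATH = buildLambdaPath(process.env.PATH, process.env.LAMBDA_TASK_ROOT); before any call that spawns casperjs or phantomjs.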