
Using a shared npm node_modules/ for multiple workspaces on the same server


Our Jenkins/CI server runs hundreds of builds a day for our Node.js project, and I would like to run each build in a completely clean workspace. However, the npm install step can take more than 10 minutes, which is far too slow. Our npm dependencies change in only a small fraction of builds (approximately 1%). So I'd like to run npm install only when npm-shrinkwrap.json changes, detected by checking its md5sum on every build. If the shrinkwrap file hasn't changed, the build should use a cached node_modules/ directory.
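Here is a minimal Node.js sketch of that cache check. It assumes a hypothetical layout where each cached install lives in `<cacheRoot>/<md5 of npm-shrinkwrap.json>`. The names `cacheRoot`, `shrinkwrapHash` and `cacheStatus` are illustrative and are not part of npm.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// md5 of the shrinkwrap file, used as the cache key (hypothetical scheme).
function shrinkwrapHash(shrinkwrapPath) {
  const contents = fs.readFileSync(shrinkwrapPath);
  return crypto.createHash('md5').update(contents).digest('hex');
}

// Returns the cache directory for this shrinkwrap and whether it already
// exists. On a miss, the build would populate it by running `npm install`.
function cacheStatus(cacheRoot, shrinkwrapPath) {
  const dir = path.join(cacheRoot, shrinkwrapHash(shrinkwrapPath));
  return { dir, hit: fs.existsSync(dir) };
}
```

On a miss, one option is to copy package.json and npm-shrinkwrap.json into `dir` and run npm install there. npm then places the packages in `dir/node_modules`, as it does for any project directory.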

This plan works well enough if I copy the cached node_modules/; however, even the copy can take up to a minute. To optimize our build times further, I would like to symlink to the cached node_modules/ instead, which should drastically improve our overall build performance.

ln -s /path/to/cache/ /path/to/workspace/node_modules
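The same symlink step can be written as a small Node.js helper. This is for illustration only: `linkCachedModules` is a hypothetical name. It reproduces the naive layout, which the rest of this question shows breaks resolution when the cache directory is not itself named node_modules.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper equivalent to
//   ln -s /path/to/cache/ /path/to/workspace/node_modules
// It replaces any existing node_modules entry in the workspace first.
function linkCachedModules(cacheDir, workspaceDir) {
  const link = path.join(workspaceDir, 'node_modules');
  let stat = null;
  try {
    stat = fs.lstatSync(link); // lstat inspects the link itself, not its target
  } catch (err) {
    if (err.code !== 'ENOENT') throw err;
  }
  if (stat && stat.isSymbolicLink()) {
    fs.unlinkSync(link); // removes only the link, never the cache behind it
  } else if (stat) {
    fs.rmSync(link, { recursive: true }); // a real directory from an earlier build
  }
  // The 'dir' type is only used on Windows; POSIX systems ignore it.
  fs.symlinkSync(path.resolve(cacheDir), link, 'dir');
  return link;
}
```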

However, simply symlinking to the cache path doesn't work when a dependency exists at multiple levels of the dependency tree. For example, our top-level project depends on both gulp and gulp-util, and gulp itself also depends on gulp-util. After an npm install, gulp-util is installed in the top-level node_modules/ but not in node_modules/gulp/node_modules/.

If the dependencies exist in the local workspace (i.e. a real directory /path/to/workspace/node_modules/), then any instance of require('gulp-util') within node_modules/gulp will (I think) recurse up the dependency tree until it finds an appropriate gulp-util module. That is, it starts off looking in /path/to/workspace/node_modules/gulp/node_modules/gulp-util, doesn't find anything, then looks in /path/to/workspace/node_modules/gulp-util, finds an appropriate module, imports it and moves on.
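The upward walk described above can be sketched as a function that lists the candidate directories in order. This is a simplified re-implementation for POSIX paths, not Node's own code, and `nodeModulesLookupPaths` is a hypothetical name. It ignores core modules, NODE_PATH and the global folders.

It does reproduce one detail of Node's behavior. A directory already named node_modules gets no nested node_modules/node_modules entry. At every other level, the candidate is `<dir>/node_modules`, never `<dir>` itself.

```javascript
const path = require('path');

// Lists, in search order, the node_modules directories that a bare
// require('x') issued from a file in `fromDir` would check.
function nodeModulesLookupPaths(fromDir) {
  const dirs = [];
  let current = path.resolve(fromDir);
  while (true) {
    // Skip levels that are themselves node_modules directories.
    if (path.basename(current) !== 'node_modules') {
      dirs.push(path.join(current, 'node_modules'));
    }
    const parent = path.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return dirs;
}

// Example: the lookup from gulp's directory inside the workspace.
// nodeModulesLookupPaths('/path/to/workspace/node_modules/gulp') returns
//   ['/path/to/workspace/node_modules/gulp/node_modules',
//    '/path/to/workspace/node_modules',
//    '/path/to/node_modules', '/path/node_modules', '/node_modules']
```

From gulp's directory, gulp-util is therefore found via the second candidate, /path/to/workspace/node_modules.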

However, when this is a symlink I'll get an error like:

module.js:339
    throw err;
    ^

Error: Cannot find module 'gulp-util'
    at Function.Module._resolveFilename (module.js:337:15)
    at Function.Module._load (module.js:287:25)
    at Module.require (module.js:366:17)
    at require (module.js:385:17)
    at Object.<anonymous> (/path/to/cache/gulp/index.js:4:15)
    at Module._compile (module.js:435:26)
    at Object.Module._extensions..js (module.js:442:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:311:12)
    at Module.require (module.js:366:17)
    at require (module.js:385:17)

I assume that this tries to do the same as the other version, but I can't see why it fails to find gulp-util. Whether it looks in /path/to/workspace/node_modules/gulp-util or /path/to/cache/gulp-util, it should find the module and be able to import it.

I've tried resolving this by manually installing gulp-util into gulp/node_modules/, but I run into dozens of such errors, and dealing with them manually on a build server is infeasible. Writing code to find dependencies of this kind and install them is possible, but it feels like the wrong approach.

npm must have some way of supporting such a workflow, right? Am I missing something obvious? Have I glossed over something in the documentation?


Solution

  • Thanks to an answer posted by @amol-m-kulkarni here (helpfully referenced by @darko-rodic above), I realised my mistake.

    If the given module is not a core module (e.g. http, fs, etc.), Node.js will then begin to search for a directory named node_modules.

    It will start in the current directory (relative to the currently-executing file in Node.JS) and then work its way up the folder hierarchy, checking each level for a node_modules folder. Once Node.JS finds the node_modules folder, it will then attempt to load the given module either as a (.js) JavaScript file or as a named sub-directory; if it finds the named sub-directory, it will then attempt to load the file in various ways. So, for example

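    My mistake was in the name I gave the cache path. require() never looks for a module directly inside an ancestor directory; at each level it only checks that directory's node_modules/ subfolder. Because Node resolves the symlink, gulp was loaded from its real path, /path/to/cache/gulp/index.js (see the stack trace). The search therefore checked /path/to/cache/gulp/node_modules, then /path/to/cache/node_modules (which didn't exist), then /path/to/node_modules and so on. /path/to/cache/gulp-util was never a candidate.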

    I fixed this by renaming the cache to /path/to/cache/node_modules. gulp now lives at /path/to/cache/node_modules/gulp, so the search reaches /path/to/cache/node_modules and finds /path/to/cache/node_modules/gulp-util.
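Here is a self-contained reproduction of both layouts, as a sketch. It builds throwaway gulp and gulp-util stand-ins (not the real packages) in a temporary directory. It then symlinks a workspace node_modules to each cache and tries to load gulp through the link.

It assumes Node's default symlink handling (no --preserve-symlinks), under which a module is loaded from its real path. That is why the stack trace above points at /path/to/cache/gulp/index.js rather than into the workspace. The helper names are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a file, creating parent directories as needed.
function writeFile(file, contents) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, contents);
}

// Put stand-in gulp (which requires gulp-util) and gulp-util into
// `cacheModulesDir`, then symlink <root>/workspace/node_modules to it.
function buildLayout(root, cacheModulesDir) {
  writeFile(path.join(cacheModulesDir, 'gulp-util', 'index.js'),
    "module.exports = 'gulp-util loaded';\n");
  writeFile(path.join(cacheModulesDir, 'gulp', 'index.js'),
    "module.exports = require('gulp-util');\n");
  const workspace = path.join(root, 'workspace');
  fs.mkdirSync(workspace, { recursive: true });
  fs.symlinkSync(cacheModulesDir, path.join(workspace, 'node_modules'), 'dir');
  return workspace;
}

// Load gulp through the workspace symlink; return its export or the error code.
function tryLoadGulp(workspace) {
  try {
    return require(path.join(workspace, 'node_modules', 'gulp'));
  } catch (err) {
    return err.code; // 'MODULE_NOT_FOUND' when gulp-util cannot be resolved
  }
}

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'shared-modules-'));

// Original layout: the cache directory is named "cache".
const flatWs = buildLayout(path.join(root, 'flat'),
  path.join(root, 'flat', 'cache'));
// Fixed layout: the cache directory is itself named node_modules.
const nestedWs = buildLayout(path.join(root, 'nested'),
  path.join(root, 'nested', 'cache', 'node_modules'));

console.log('flat cache:', tryLoadGulp(flatWs));
console.log('nested cache:', tryLoadGulp(nestedWs));
```

On a typical system, the flat layout should report MODULE_NOT_FOUND, and the nested layout should return the string exported by the gulp-util stand-in.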

    I'm going to refer to the docs again to see if this should have been clear to me or not.