For research purposes, I'd like to list all the packages that are available on npm. How can I do this?
Some old docs at https://github.com/npm/registry/blob/master/docs/REGISTRY-API.md#get-all mention an /-/all endpoint that presumably once worked, but http://registry.npmjs.org/-/all now just returns {"message":"deprecated"}.
If you're happy to use data that's up to 24 hours out of date and provided by a third party, you can use all-the-package-names. This npm package, updated daily, literally just exports a giant flat array of package names. (The same org and maintainer also publish all-the-package-repos, which additionally has links to each package's GitHub repo. Their other packages for analysing the npm registry have been unmaintained and dead for years as far as I can tell, which is sad.)
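If you go that route, usage is trivial. A minimal sketch, assuming the package still works the way it does today and simply exports a flat array (check its README if this doesn't match):

import allThePackageNames from 'all-the-package-names';

// The default export is just a huge array of package name strings.
console.log(allThePackageNames.length); // around 3.5 million as of May 2025
console.log(allThePackageNames.includes('lodash')); // true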
If you want to do it yourself, that is possible too. At https://replicate.npmjs.com is an API that is kind of like a CouchDB API (once, it really was a CouchDB API) but with loads of stuff disabled. We need the _all_docs endpoint. It was once possible (back when I wrote the original version of this answer) to simply hit this endpoint with no query string and get back a single giant response with all the packages in the registry, but no longer; if you try that today, the API will terminate the connection well before you get a complete list of packages. Instead, we need to paginate using the limit and startkey parameters. Here is a simple Node.js script to do that:
import fs from 'node:fs';

// Max value according to https://github.com/orgs/community/discussions/152515
const LIMIT = 10000;

const initialResponse = await fetch(
  `https://replicate.npmjs.com/registry/_all_docs?limit=${LIMIT}`,
  // Header below only needed temporarily until May 29th 2025.
  // See https://github.com/orgs/community/discussions/152515
  // Essentially Microsoft/GitHub/npm is doing an API migration (mostly
  // disabling stuff that used to work), and is periodically browning out the
  // old, more fully-featured API. This header opts into the new,
  // less-fully-featured API that at least isn't offline half the time.
  // From May 29th, this will be the default (and the only option).
  {headers: {'npm-replication-opt-in': 'true'}}
);
const result = (await initialResponse.json()).rows;
console.log(`Fetched initial ${result.length} packages`);

while (true) {
  const lastKey = result[result.length - 1].key;
  const params = new URLSearchParams({
    limit: LIMIT,
    startkey: JSON.stringify(lastKey),
  });
  const resp = await fetch(
    `https://replicate.npmjs.com/registry/_all_docs?${params}`,
    // Again, remove this line after May 29th 2025:
    {headers: {'npm-replication-opt-in': 'true'}}
  );
  const respJson = await resp.json();
  const respRows = respJson.rows;
  // The startkey parameter is inclusive, so the first row we get should be the
  // same as the last row from the previous page. If the replicate.npmjs.com
  // API were a real CouchDB API, we could pass skip=1 to skip that duplicate
  // row, but it isn't, and doesn't support that parameter, so we just need to
  // ignore the first row. We sanity-check it's as expected, first:
  if (respRows[0].key !== lastKey) {
    throw new Error(
      `Expected first row of request to have key ${lastKey} but it was ${respRows[0].key}`
    );
  }
  if (respRows.length === 1) {
    // We're done! There are no more packages.
    break;
  }
  for (const row of respRows.slice(1)) {
    result.push(row);
  }
  console.log(
    `Reached offset ${respJson.offset} of ${respJson.total_rows} total rows.`
  );
}

console.log("Finished! Writing to allpackages.json ...");
fs.writeFileSync("allpackages.json", JSON.stringify(result));
The JSON file output by the script above will be an array of objects like this, where the id and key are both the package name:
{"id":"lodash","key":"lodash","value":{"rev":"634-9273a19c245f088da22a9e4acbabc213"}},
At the moment in time that I am rewriting this answer, on 14th May 2025, there are 3,542,583 packages in that response and, on my fibre internet in the UK, each request for a batch of 10,000 takes around 5 seconds, so the roughly 355 requests needed take around half an hour in total. The resulting file is 377MB.
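If you only need the bare names (like what all-the-package-names gives you), you can flatten that file down afterwards. A minimal sketch, assuming allpackages.json was produced by the script above:

import fs from 'node:fs';

// Each row is {id, key, value}; id and key are both the package name.
const rows = JSON.parse(fs.readFileSync('allpackages.json', 'utf8'));
const names = rows.map((row) => row.key);
fs.writeFileSync('allnames.json', JSON.stringify(names));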
The 2025 rework of the API removes the ability to pass the include_docs parameter to the _all_docs API in order to retrieve metadata about packages in bulk. Instead, for most metadata, you have to make one request per package. For instance, for metadata about react, like its description and release history, you'd hit https://registry.npmjs.org/react. There are some unofficial docs about the https://registry.npmjs.org API at https://www.edoardoscibona.com/exploring-the-npm-registry-api.
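As a quick illustration of what those per-package responses look like, here's a sketch; description, dist-tags and versions are fields the registry currently returns, but check the unofficial docs above for the full shape:

const resp = await fetch('https://registry.npmjs.org/react');
const meta = await resp.json();
console.log(meta.description); // short description string
console.log(meta['dist-tags'].latest); // the current latest version
console.log(Object.keys(meta.versions).length); // number of published versions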
Yes, this does imply that to download metadata about all packages in the registry, you need to make (as of May 2025) 3.5 million requests. Below is a script to do it. It's somewhat crude (and doesn't handle every imaginable error scenario gracefully), but probably good enough:
import fs from "node:fs";
import path from "node:path";

const MAX_SIMULTANEOUS_REQUESTS = 50;

const packages = JSON.parse(fs.readFileSync("allpackages.json").toString()).reverse();

async function startFetcherThread() {
  while (packages.length > 0) {
    // Log progress every 1000 packages:
    if (packages.length % 1000 == 1000 - MAX_SIMULTANEOUS_REQUESTS) {
      console.log(new Date(), `${packages.length + MAX_SIMULTANEOUS_REQUESTS} packages to go`);
    }
    const pkg = packages.pop();
    const packageName = pkg.key;
    // Guard against names that would escape the metadata/ directory when used
    // as a file path:
    if (packageName.split('/').includes('.') || packageName.split('/').includes('..')) {
      console.log(`Skipping ${packageName} because it is playing silly buggers in its package name`);
      continue;
    }
    const outputPath = "metadata/" + packageName;
    if (fs.existsSync(outputPath)) {
      // Presumably we downloaded this on a previous run that we aborted. Skip.
      continue;
    }
    let resp;
    try {
      resp = await fetch(`https://registry.npmjs.org/${packageName}`);
    } catch (e) {
      console.error(`Failed to fetch ${packageName}`, e);
      continue;
    }
    if (resp.status !== 200) {
      console.error(`Got ${resp.status} when trying to get ${packageName}`);
      continue;
    }
    const respJson = await resp.json();
    await fs.promises.mkdir(path.dirname(outputPath), {recursive: true});
    await fs.promises.writeFile(outputPath, JSON.stringify(respJson));
  }
}

for (let i = 0; i < MAX_SIMULTANEOUS_REQUESTS; i++) {
  startFetcherThread();
}
I haven't tried other values of MAX_SIMULTANEOUS_REQUESTS; as such, I don't know if this is optimal in any sense, nor if a bigger number will lead to hitting a rate limit. I can say that when I ran this script on my machine, for the first few thousand packages I was chewing through 1000 every 11 seconds or so, but later it dropped to a speed of about 35 seconds per 1000 packages. At that rate, 3.5 million packages works out to roughly 34 hours, so you can expect a total runtime of over a day.
If you want download counts, for instance because you want to target some analysis at the top 100 or top 1000 most-downloaded packages, you can get those from an API sort-of-documented at https://github.com/npm/registry/blob/main/docs/download-counts.md. (Note that that entire repo of documentation is officially an archive, and much of what is in REGISTRY-API.md and REPLICATE-API.md is obsolete, but at the time of me writing this, the docs about download counts appear to me to still be correct and up to date.)
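For example, per those docs, you can ask api.npmjs.org for a point value over a period; treat the sketch below (and the exact response shape) as something to check against the docs rather than gospel:

const resp = await fetch('https://api.npmjs.org/downloads/point/last-week/react');
console.log(await resp.json());
// e.g. {"downloads": 12345678, "start": "2025-05-07", "end": "2025-05-13", "package": "react"}

The same docs also describe a bulk variant taking up to 128 comma-separated (non-scoped) package names, like https://api.npmjs.org/downloads/point/last-week/react,lodash, which is what you'd want when ranking lots of packages.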