It seems like many of the Cheerio functions are unavailable. I have been able to use filter() and find(), but not extract(), even when following the tutorial exactly.
npm install cheerio
import * as cheerio from 'cheerio';
const $ = cheerio.load(`
<ul>
<li>One</li>
<li>Two</li>
<li class="blue sel">Three</li>
<li class="red">Four</li>
</ul>
`);
// Extract the text content of the first .red element
const data = $.extract({
red: '.red',
});
console.log(data)
Result:
const data = $.extract({
^
TypeError: $.extract is not a function
at scrape.js:13:16
at ModuleJob.run (node:internal/modules/esm/module_job:218:25)
at async ModuleLoader.import (node:internal/modules/esm/loader:329:24)
at async loadESM (node:internal/process/esm_loader:28:7)
at async handleMainPromise (node:internal/modules/run_main:113:12)
Node.js v20.11.1
I'm not sure what's up with Cheerio's documentation, but apparently it includes features that haven't been released yet. See this comment and the relevant thread.
So, yeah, this makes no sense, but that's the answer: $.extract() doesn't exist in the released package, even though it's in the docs. Maybe it'll be added for real at some point, possibly in the next major release.
In the meantime, consider using x-ray, scrape-it, or muninn if you want this type of high-level, declarative scraping syntax. I haven't used any of these libraries myself, though.
The Cheerio version I'm discussing is 1.0.0-rc.12, for posterity.
By the way, .fromURL() is also documented but isn't supported, so this isn't an isolated occurrence. This comment confirms:
The website is the right place — the only missing bits in the current release are (1) the extract method, and (2) loading methods besides .load.