Given an HTML string with multiple nested elements, how can I remove all font elements using cheerio while maintaining the internal contents of the innermost font element?
For example, this is the HTML before:
<body>
<font>
<font>
<font>
<p>Three fonts deep</p>
</font>
</font>
<font>
Two fonts deep
</font>
</font>
<font>
One font deep
</font>
No fonts deep
</body>
and this is the desired result after:
<body>
<p>Three fonts deep</p>
Two fonts deep
One font deep
No fonts deep
</body>
I have tried unwrap:
$('font').each((i, el) => {
  $(el).unwrap();
});
and replaceWith:
$('*').each((i, el) => {
  if (el.name === 'font') {
    $(el).replaceWith($(el).html());
  }
});
Both of these only remove the outer layer of font elements. I suspect that altering the HTML while inside the each loop causes the issue. If I run the replacement in a do...while loop, it does work:
let foundFonts;
do {
  foundFonts = 0;
  $('font').each((i, el) => {
    $(el).replaceWith($(el).html());
    foundFonts++;
  });
} while (foundFonts > 0);
I'm wondering, is there an efficient way to get rid of the font elements in a single pass?
Try iterating in reverse in your replaceWith attempt:
const cheerio = require("cheerio"); // ^1.0.0-rc.12
const html = `<body>
<font>
<font>
<font>
<p>Three fonts deep</p>
</font>
</font>
<font>
Two fonts deep
</font>
</font>
<font>
One font deep
</font>
No fonts deep
</body>`;
const $ = cheerio.load(html);
[...$("font")]
  .reverse()
  .forEach(el => $(el).replaceWith($(el).html()));
console.log($.html());
Output:
<html><head></head><body>
<p>Three fonts deep</p>
Two fonts deep
One font deep
No fonts deep
</body></html>
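Forward looping (root to leaf in the HTML tree) doesn't work because replaceWith($(el).html()) serializes the element's contents to a string and parses that string into brand-new nodes. Once you process the outermost <font>, its subtree is freshly created, and the nested <font> elements captured by the original selection are no longer part of the tree being output. Looping in reverse (leaf to root) replaces each inner <font> while it is still attached to the document.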
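To make the ordering issue concrete, here is a minimal sketch that uses a toy tree model rather than cheerio itself. All helper names (node, reparse, replaceWithCopies, findAll, buildTree, countFontsAfter) are made up for illustration, and reparse is only an assumed stand-in for the serialize-and-parse round trip that replaceWith($(el).html()) performs. It is not how cheerio is implemented.

```javascript
// Toy tree model (hypothetical helpers, not cheerio's implementation).
function node(name, children = []) {
  const n = { name, children, parent: null };
  for (const child of children) child.parent = n;
  return n;
}

// Deep copy, standing in for "serialize with .html(), then parse again".
function reparse(n) {
  return node(n.name, n.children.map(reparse));
}

// Replace `el` in its parent's child list with fresh copies of its children.
function replaceWithCopies(el) {
  const siblings = el.parent.children;
  const copies = el.children.map(reparse);
  for (const copy of copies) copy.parent = el.parent;
  siblings.splice(siblings.indexOf(el), 1, ...copies);
}

// Collect every node with the given name, in document (root-to-leaf) order.
function findAll(root, name, out = []) {
  for (const child of root.children) {
    if (child.name === name) out.push(child);
    findAll(child, name, out);
  }
  return out;
}

// Same nesting as the example HTML, with text nodes left out.
function buildTree() {
  return node("body", [
    node("font", [node("font", [node("font", [node("p")])]), node("font")]),
    node("font"),
  ]);
}

// Run one pass over a snapshot of the font nodes and count what is left.
function countFontsAfter(order) {
  const body = buildTree();
  const fonts = findAll(body, "font"); // snapshot, like $("font")
  if (order === "reverse") fonts.reverse();
  fonts.forEach(replaceWithCopies);
  return findAll(body, "font").length;
}

console.log(countFontsAfter("forward")); // 3
console.log(countFontsAfter("reverse")); // 0
```

In the forward pass, the three nested font nodes in the snapshot end up being edited inside a detached subtree, so their copies survive in the body. In the reverse pass, every node is still attached when its turn comes. If you'd rather avoid the round trip altogether, passing the existing child nodes, e.g. replaceWith($(el).contents()), might let a forward pass work too, but I haven't verified that here.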