
Unwrapping multiple nested HTML elements with Cheerio


Given an HTML string with multiple nested <font> elements, how can I use Cheerio to remove every <font> element while keeping its contents, including the contents of the innermost <font>?

For example, this is the input:

<body>
  <font>
    <font>
      <font>
        <p>Three fonts deep</p>
      </font>
    </font>
    <font>
      Two fonts deep
    </font>
  </font>
  <font>
    One font deep
  </font>
  No fonts deep  
</body>

and this is the desired output:

<body>
  <p>Three fonts deep</p>
  Two fonts deep
  One font deep
  No fonts deep
</body>

I have tried unwrap():

$('font').each((i, el) => {
  $(el).unwrap();
});

and replaceWith():

$('*').each((i, el) => {
  if (el.name === 'font') {
    $(el).replaceWith($(el).html());
  }
});

Both of these only remove the outer layer of <font>. I suspect that modifying the document inside the .each() loop is causing the problem.

If I repeat the replacement in a do...while loop until no <font> elements are left, it does work:

let foundFonts;
do {
  foundFonts = 0;
  $('font').each((i, el) => {
    $(el).replaceWith($(el).html());
    foundFonts++;
  });
} while (foundFonts > 0);

Is there an efficient way to get rid of all the <font> elements in a single pass?


Solution

  • Try iterating in reverse in your replaceWith attempt:

    const cheerio = require("cheerio"); // ^1.0.0-rc.12
    
    const html = `<body>
      <font>
        <font>
          <font>
            <p>Three fonts deep</p>
          </font>
        </font>
        <font>
          Two fonts deep
        </font>
      </font>
      <font>
        One font deep
      </font>
      No fonts deep  
    </body>`;
    
    const $ = cheerio.load(html);
    
    [...$("font")]
      .reverse()
      .forEach(el => $(el).replaceWith($(el).html()));
    
    console.log($.html());
    

    Output:

    <html><head></head><body>
      
        
          
            <p>Three fonts deep</p>
          
        
        
          Two fonts deep
        
      
      
        One font deep
      
      No fonts deep  
    
    </body></html>
    

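    Forward looping (root to leaf in the HTML tree) doesn't work because $(el).html() serializes the outermost <font>'s contents to a string, and replaceWith() parses that string into brand-new nodes. The inner <font> elements still in the selection are the original nodes, which now live only inside the detached outer <font>, so replacing them has no effect on the document. Iterating in reverse (leaf to root) unwraps the innermost <font> first, so by the time an outer <font> is serialized, it no longer contains any <font> elements.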