javascript html dom html-parsing

Parse an HTML string with JS


I want to parse a string that contains HTML, using JavaScript.

I tried the Pure JavaScript HTML Parser library, but it seems to parse the HTML of my current page rather than a standalone string. When I run the code below, it changes the title of my page:

var parser = new HTMLtoDOM("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>", document);

My goal is to extract the links from an external HTML page that I have read in as a string.

Do you know an API to do it?


Solution

  • Create a dummy DOM element and assign the string to its innerHTML. You can then query and manipulate its contents like any other DOM element.

    var el = document.createElement( 'html' );
    el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
    
    el.getElementsByTagName( 'a' ); // Live HTMLCollection of your anchor elements
    

    Edit: adding a jQuery answer to please the fans!

    var el = $( '<div></div>' );
    el.html("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>");
    
    $('a', el) // All the anchor elements
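A possible alternative, not part of the original answer: the browser's built-in `DOMParser` parses the string into a separate document, so the current page (including its title) is left untouched and scripts in the string are not executed. The sketch below assumes a browser environment; the function name `extractLinks` and its optional `Parser` parameter are hypothetical. The parameter exists only so that a DOMParser implementation from elsewhere (for example, one exposed by a jsdom window, if that is part of your setup) can be passed in outside the browser.

```javascript
// Hypothetical helper: returns the raw href attribute of every <a href> in `html`.
// Assumes a DOMParser implementation is available (browsers provide one globally).
function extractLinks(html, Parser = globalThis.DOMParser) {
  if (typeof Parser !== 'function') {
    throw new Error('No DOMParser implementation available');
  }
  // Parse into a separate Document; the current page is not modified.
  const doc = new Parser().parseFromString(html, 'text/html');
  // getAttribute returns the value as written in the markup, whereas the
  // .href property would resolve it against the document's base URL.
  return Array.from(doc.querySelectorAll('a[href]'), (a) => a.getAttribute('href'));
}

// Usage in a browser:
// extractLinks("<a href='test0'>test01</a><a href='test1'>test02</a>")
// returns ['test0', 'test1']
```

Unlike the live HTMLCollection returned by `getElementsByTagName`, this returns a plain array of strings, which is usually what you want when the goal is just to collect the links.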