html ruby nokogiri

Ruby Nokogiri::XML::SyntaxError Tag figure invalid, but tag appears to be free of errors


Ruby's Nokogiri keeps complaining about HTML that looks perfectly fine to me. Below is an example of the affected HTML, followed by the full error message.

<div class="media">
<figure id="post-image-figure-8" class="post-image-figure" style="max-width: 50%; ">
<img src="/file8-thumbl.jpg" class="post-image img-no-border" id="post-image-8" style="cursor: pointer; max-width: 100%;" onclick="openPostIMG(8)" data-url="/file8.jpg" data-width="960" data-height="540" data-id="8">
</figure>
</div>

/usr/lib/x86_64-linux-gnu/rubygems-integration/3.1.0/gems/nokogiri-1.13.10/lib/nokogiri/html4/document.rb:220:in `read_memory': Parser without recover option encountered error or warning: 4:107: ERROR: Tag figure invalid (Nokogiri::XML::SyntaxError)

I checked the page with my browser's developer tools, but they didn't report any malformed HTML.


Solution

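  • I fixed the issue by replacing node = Nokogiri::HTML.fragment(html) with node = Nokogiri::HTML5.fragment(html). Nokogiri::HTML is the HTML4 parser (the backtrace points into html4/document.rb), and it does not know HTML5 elements such as figure, so it reports the tag as invalid even though the markup is fine. Browsers parse the page as HTML5, which is why the developer tools had nothing to complain about.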