The Nokogiri #content method does not convert block elements into paragraphs; for example:
fragment = 'hell<span>o</span><p>world<p>I am Josh</p></p>'
Nokogiri::HTML(fragment).content
=> "helloworldI am Josh"
I would expect the output to be:
=> "hello\n\nworld\n\nI am Josh"
How can I convert HTML to text so that block elements produce line breaks and inline elements are joined to the surrounding text with no added space?
You can use #before and #after to add newlines:
doc = Nokogiri::HTML(fragment)
doc.search('p,div,br').each { |e| e.after "\n" }