ruby-on-rails ruby-on-rails-3 html-to-text

Convert HTML to proper plain text?


Is there any way I can convert HTML into proper plain text? I tried everything from raw to sanitize, and even the Mail gem with its text_part method, which is supposed to do exactly that but doesn't work for me.

My best attempt so far was strip_tags(strip_links(resource.body)), but <p>, <ul>, etc. were not converted correctly: the tags are removed, but the line breaks and list markers they imply are lost.

This is more or less what I have in HTML:

Hello

This is some text. Blah blah blah.

Address:
John Doe
10 ABC Street
Whatever City

New Features
- Feature A
- Feature B
- Feature C
Check this out: http://www.google.com

Best,
Admin

which converts to something like

Hello
This is some text. Blah blah blah.
Address: John Doe 10 ABC Street Whatever City

New Features Feature A Feature B Feature C
Check this out: http://www.google.com

Best, Admin

Any ideas?


Solution

  • Found the solution here: https://github.com/alexdunae/premailer/blob/master/lib/premailer/html_to_plain_text.rb

    Works like a charm!
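
    For anyone who just wants the general idea without pulling in the gem, here is a minimal sketch of the same kind of conversion. It is NOT the Premailer code, only a simplified regex-based approximation; the method name html_to_plain_text is made up for illustration, and it assumes reasonably well-formed HTML using only simple tags (<p>, <br>, <ul>/<li>, headings). The key difference from strip_tags is that structural tags are turned into newlines and "- " markers before the remaining tags are dropped.

```ruby
# Hypothetical helper: a simplified sketch, not Premailer's implementation.
def html_to_plain_text(html)
  text = html.dup

  # Each list item starts on its own line with a "- " marker.
  text.gsub!(/<li\b[^>]*>/i, "\n- ")
  # <br> and <br/> become single line breaks.
  text.gsub!(/<br\s*\/?>/i, "\n")
  # Closing block-level tags become paragraph breaks.
  text.gsub!(/<\/(p|div|h[1-6]|ul|ol)>/i, "\n\n")
  # Drop every remaining tag.
  text.gsub!(/<[^>]+>/, "")

  # Decode a few common entities (&amp; last, to avoid double decoding).
  text = text.gsub("&nbsp;", " ").gsub("&lt;", "<").gsub("&gt;", ">").gsub("&amp;", "&")

  # Trim trailing spaces on each line and collapse runs of blank lines.
  text.gsub!(/[ \t]+\n/, "\n")
  text.gsub!(/\n{3,}/, "\n\n")
  text.strip
end

html = "<p>Hello</p><p>Address:<br>John Doe<br>10 ABC Street</p>" \
       "<h2>New Features</h2><ul><li>Feature A</li><li>Feature B</li></ul>" \
       "<p>Best,<br/>Admin</p>"
puts html_to_plain_text(html)
```

    This keeps the address on separate lines and the features as a dashed list. The linked file handles much more (links, word wrapping, more entities), so prefer it for real emails.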