
Cleaning up HTML5 pages with Java: Is it possible?


I need to clean up HTML5 pages inside my Java project.

So I need either a Java library or a command-line program that works on both Linux and Windows.

JTidy doesn't work well (I tested it). HTML Tidy for HTML5 is a C++ library, and its command-line version works only on Linux.

Do you know if Validator.nu HTML Parser also cleans up (I didn't find any information about it)?

Do you have any ideas?

Thanks


Solution

  • Use jsoup. It is well supported, has no native components (so it should run everywhere Java does), and is released under a very liberal open-source license (MIT). It also parses HTML5.
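
A minimal sketch of what the cleanup could look like with jsoup, assuming the jsoup jar is on the classpath. The class name `HtmlCleaner` and the method `tidy` are made up for illustration; only `Jsoup.parse`, `outputSettings()`, `prettyPrint`, `indentAmount` and `outerHtml` come from jsoup itself. Parsing builds a full document tree the way a browser would, so unclosed tags get closed and missing `html`/`head`/`body` elements get added when the tree is serialized again.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HtmlCleaner {

    /**
     * Hypothetical helper: parses possibly malformed HTML and returns
     * a normalized, well-formed serialization of the resulting tree.
     */
    public static String tidy(String dirtyHtml) {
        // jsoup's parser follows the HTML5 parsing rules, so it recovers
        // from unclosed or misnested tags much like a browser does.
        Document doc = Jsoup.parse(dirtyHtml);

        // Indent the output to make it readable.
        doc.outputSettings().prettyPrint(true).indentAmount(2);

        // Serialize the whole document, including <html>, <head>, <body>.
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        // Two unclosed <p> elements and an unclosed <b>.
        String dirty = "<p>First<p>Second <b>bold";
        System.out.println(tidy(dirty));
    }
}
```

Note that this normalizes markup rather than validating it: jsoup will not report which errors it repaired, so if you need a list of problems you would still need a separate validator.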