php html ms-word htmlcleaner

Remove MS Word "HTML" using PHP


Possible Duplicate:
What is the best free way to clean up Word HTML?
PHP to clean-up pasted Microsoft input

I allow clients to enter notes in a rich text editor, and I only recently upgraded to CKEditor 3.x, which by default strips MS Word classes, styles, and comments when users paste into the editor. So going forward, I'm all set.

Now I need to clean up five years' worth of existing notes, some of which contain embedded MS Word-generated HTML. I need to loop through this body of text and clean each note.

I don't need to strip out all span tags, only the ones generated by Microsoft Word.
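One way to make "generated by Microsoft Word" concrete is to treat `Mso*` class names, `mso-*` style declarations, `<o:p>` tags and comments (which include Word's `<!--[if gte mso 9]>` blocks) as Word's fingerprints, and to unwrap a `<span>` only when it carried nothing but that markup. The sketch below does this with PHP's bundled DOM extension (PHP 7.4+). The function name `cleanWordHtml` is hypothetical, and the heuristics are assumptions about typical Word paste output rather than a complete list; note that it removes every comment, not only Word's.

```php
<?php
// Hypothetical helper (not from any library): strips markup MS Word typically
// leaves behind. Heuristics are assumptions: Mso* classes, mso-* styles,
// <o:p> tags, and all comments.
function cleanWordHtml(string $html): string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // Word tags like <o:p> raise parser warnings
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div id="word-clean-root">' . $html . '</div></body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="word-clean-root"]')->item(0);

    // 1. Remove all comments, including Word's conditional-comment blocks.
    foreach (iterator_to_array($xpath->query('.//comment()', $root)) as $comment) {
        $comment->parentNode->removeChild($comment);
    }

    // 2. Remove Word's <o:p> placeholder elements.
    foreach (iterator_to_array($root->getElementsByTagName('*')) as $el) {
        if ($el->nodeName === 'o:p') {
            $el->parentNode->removeChild($el);
        }
    }

    // 3. Strip Mso* classes and mso-* style declarations, remembering spans
    //    that carried only Word formatting so they can be unwrapped.
    $unwrap = [];
    foreach (iterator_to_array($root->getElementsByTagName('*')) as $el) {
        $touched = false;
        if ($el->hasAttribute('class')) {
            $classes = preg_split('/\s+/', trim($el->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
            $kept = array_filter($classes, fn($c) => stripos($c, 'mso') !== 0);
            if (count($kept) !== count($classes)) {
                $touched = true;
                if ($kept) {
                    $el->setAttribute('class', implode(' ', $kept));
                } else {
                    $el->removeAttribute('class');
                }
            }
        }
        if ($el->hasAttribute('style')) {
            $decls = array_filter(array_map('trim', explode(';', $el->getAttribute('style'))), 'strlen');
            $kept = array_filter($decls, fn($d) => stripos($d, 'mso-') !== 0);
            if (count($kept) !== count($decls)) {
                $touched = true;
                if ($kept) {
                    $el->setAttribute('style', implode('; ', $kept));
                } else {
                    $el->removeAttribute('style');
                }
            }
        }
        if ($touched && $el->nodeName === 'span' && !$el->hasAttributes()) {
            $unwrap[] = $el;
        }
    }

    // 4. Unwrap the now attribute-less Word spans, keeping their contents.
    foreach ($unwrap as $span) {
        while ($span->firstChild) {
            $span->parentNode->insertBefore($span->firstChild, $span);
        }
        $span->parentNode->removeChild($span);
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Spans the user styled without Word (for example `<span class="note">`) are left alone. Running this over a copy of the old notes first would show whether the heuristics fit your data.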

I've tried HTMLCleaner, but it does not remove the Word-generated HTML. http://word2cleanhtml.com does exactly what I want; however, its developers are not currently offering the API for public use (as of July 9, 2012).

I've looked for such a class off and on for the last few weeks and am not having much luck. Have any of you found a useful class you'd like to share?


Solution

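  • HTML Purifier: http://htmlpurifier.org/

    This should do what you want. HTML Purifier is a PHP library that parses the input and rebuilds it against a configurable whitelist of tags and attributes, so anything outside that whitelist, including Word's proprietary markup, is stripped.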
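A minimal sketch of the HTML Purifier route, assuming the library was installed with Composer (package `ezyang/htmlpurifier`). The whitelist passed to `HTML.Allowed` is an illustrative guess; leaving `class` and `style` out of it is what removes Word's `Mso*` classes and inline `mso-*` styles. `$notes` is a hypothetical stand-in for however the old notes are loaded.

```php
<?php
// Assumes: composer require ezyang/htmlpurifier
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
// Illustrative whitelist: no class/style attributes, so Word's Mso* classes
// and mso-* inline styles are stripped; tags not listed (e.g. <o:p>) are removed.
$config->set('HTML.Allowed', 'p,br,b,strong,i,em,u,ul,ol,li,a[href],span');
// Drop <span> wrappers that end up with no attributes.
$config->set('AutoFormat.RemoveSpansWithoutAttributes', true);
$purifier = new HTMLPurifier($config);

// Hypothetical stand-in for notes loaded from the database (id => HTML).
$notes = [
    1 => '<p class="MsoNormal"><span style="mso-bidi-font-weight:bold">Call back</span> on Monday<o:p></o:p></p>',
];

foreach ($notes as $id => $html) {
    $notes[$id] = $purifier->purify($html);
    // ...write $notes[$id] back to the database here.
}
```

Creating the purifier once and reusing it inside the loop avoids rebuilding the configuration for every note. Widen the whitelist if the notes legitimately use other tags.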