
What problem does XHTML strict solve?


I really don't understand the fascination with XHTML Strict. Inline JavaScript typically requires a rat's nest of escapes to make it compatible with XHTML and semi-backwards compatible with MSIE 5 & 6. Then there is the issue of not being OCD enough on user input to make sure you don't miss any illegal characters. It just seems like more effort than it's worth. Never mind that almost every developer I've worked alongside keeps forgetting to change the content-type the server returns for XHTML pages from text/html to application/xhtml+xml.

I wish I could remember the blogger's name, but someone pointed out that the majority of supposedly XHTML-compliant websites and open source packages are not actually compliant because of that last issue: they never set the content-type header correctly.

I'm looking to understand why XHTML is useful, or to build enough of an arsenal of arguments to keep it out of future projects that I have influence over.


Solution

  • XHTML1 vs HTML4 and Strict vs Transitional are completely orthogonal issues.

    XML might not give any huge advantage to browsers today, but on the server end it's an order of magnitude easier to process documents using XML than trying to parse the mess that is old-school-SGML-except-not-really HTML4.

    Restricting yourself to [X]HTML Strict doesn't achieve anything in itself; it simply discourages the use of old, less maintainable techniques you shouldn't be using anyway.

    “Inline JavaScript typically requires a rat's nest of escapes to make it compatible with XHTML”

    You can get away without any escapes as long as you don't use the characters < or &. And ‘//<![CDATA[’ isn't really much worse than ‘<!--’ was in the old days.

    In any case, keeping the scripting external is much more manageable; you don't want to be doing anything significant inline.

    “Then there is the issue of not being OCD enough on user input to make sure you don't miss any illegal characters.”

    Out-of-band characters are exactly as invalid in HTML4 Transitional as in XHTML1 Strict.

    If you're accepting user-submitted HTML and not checking/escaping it with a fine enough comb to prevent well-formedness errors, you have much bigger problems than just complying with a doctype. You'll be letting injection hacks through and leaving your site vulnerable to cross-site scripting holes.

    “forgetting to ensure the content-type returned from the server is reset for XHTML pages from text/html to application/xhtml+xml.”

    It's not ‘forgetting’, it's deliberate: there is not really much point in serving application/xhtml+xml today. To account for IE you have to sniff the user agent, and then make sure you understand the CSS and JavaScript differences that pop up between the two parsing modes... you can do it to prove your technical prowess, but it doesn't really get you anything.

    Serving XHTML as legacy HTML may not be ideal, but it lets you keep the simpler, more processable syntax of XML (and potential interoperability with other XML languages like SVG) whilst still being browser-friendly.

    People complain about the pickiness of the well-formedness errors, but having those errors picked up straight away for you to fix them is way better than leaving them there silently, ready to trip up some future browser.
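
To make the server-side points above concrete, here is a minimal sketch in Python using only the standard library. It is not from the original answer: the helper names (`extract_links`, `render_comment`, `build_page`, `is_well_formed`) and the sample page are made up for illustration. It shows three things under those assumptions: well-formed XHTML can be read with a stock XML parser, escaping user input keeps the page well-formed, and unescaped input is rejected immediately rather than slipping through silently.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import List

XHTML_NS = "http://www.w3.org/1999/xhtml"


def extract_links(xhtml_source: str) -> List[str]:
    """Return the href of every <a> element in a well-formed XHTML document."""
    root = ET.fromstring(xhtml_source)
    # Elements live in the XHTML namespace, so the tag is "{namespace}a".
    anchors = root.iter("{%s}a" % XHTML_NS)
    return [a.get("href") for a in anchors if a.get("href")]


def render_comment(user_text: str) -> str:
    """Wrap untrusted text in a paragraph, escaping <, >, & and quotes."""
    return "<p>%s</p>" % escape(user_text)


def build_page(comment_markup: str) -> str:
    """Hypothetical page template: one link plus a user-supplied comment."""
    return (
        '<html xmlns="%s"><head><title>Demo</title></head>'
        '<body><a href="/home">Home</a>%s</body></html>'
    ) % (XHTML_NS, comment_markup)


def is_well_formed(source: str) -> bool:
    """True if the XML parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    hostile = "Fish & chips <script>alert(1)</script> <b>unclosed"
    safe_page = build_page(render_comment(hostile))
    raw_page = build_page("<p>%s</p>" % hostile)  # no escaping at all

    print(is_well_formed(safe_page))
    print(is_well_formed(raw_page))
    print(extract_links(safe_page))
```

The unescaped page fails to parse because of the bare `&` and the unclosed `<b>`, which is the "pickiness" people complain about; here it doubles as an early warning that raw user input reached the output. Note that this sketch does not handle HTML named entities such as `&nbsp;`, which a plain XML parser will reject unless the DTD is loaded.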