Just yesterday, neonDragon, a few others and I were chatting in #evolution on irc.neondragon.net about whether we should use XHTML 1.0 or HTML 4.01.
I caught the XHTML bug around two years ago, so I haven’t written a line of HTML in about that long. I use XHTML 1.0 Transitional almost exclusively across my sites, and occasionally XHTML 1.0 Strict. I’m quite happy with XHTML, but it’s true that it hasn’t brought many benefits.
If you visited this website several weeks ago, you probably didn’t notice that it was sent using the application/xhtml+xml MIME type, provided your browser supported it. Internet Explorer users got the text/html MIME type instead. This worked fine, but occasionally a single unencoded ampersand or similar slip could cause problems. Before I added the WYSIWYG editor, I stopped sending application/xhtml+xml because I couldn’t guarantee bulletproof XHTML.
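The site’s own negotiation code isn’t shown here, but the usual approach is to look at the browser’s Accept header and only send application/xhtml+xml to browsers that explicitly list it (IE doesn’t, so it falls through to text/html). A minimal sketch, assuming a hypothetical helper called choose_mime_type; it only checks for the media type and a non-zero q-value, and ignores wildcards and preference ordering:

```python
def choose_mime_type(accept_header: str) -> str:
    """Pick a Content-Type for an XHTML page from the request's Accept header.

    Simplified sketch: serves application/xhtml+xml only if the browser lists
    it explicitly with q > 0; wildcards like */* are deliberately ignored.
    """
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml; charset=utf-8"
    # Everyone else (including IE) gets the same markup labelled as HTML.
    return "text/html; charset=utf-8"
```

For example, a Firefox-style header such as "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" gets application/xhtml+xml, while an IE-style "image/gif, image/jpeg, */*" gets text/html.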
The big problem is this: I can’t guarantee that users submit valid XHTML. Markdown did a pretty good job of generating XHTML, but it wasn’t perfect. With the application/xhtml+xml MIME type, the browser parses the page as XML and refuses to display it if it isn’t well-formed, so all it takes for a user to break a page is to submit content containing an unencoded ampersand. If you want to be dramatic, call it a "denial of service".
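To see how little it takes, here is a small sketch using Python’s standard-library XML parser as a stand-in for a browser’s (the page and comment strings are made up for illustration). One bare ampersand in user-submitted text is a fatal well-formedness error for the whole document:

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if markup parses as XML, False on the first fatal error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# A page template with one slot for a user's comment.
page = "<div><p>Fish &amp; chips</p><p>{}</p></div>"

good_comment = "Salt &amp; vinegar"
bad_comment = "Salt & vinegar"  # bare ampersand, as a user might type it

print(is_well_formed(page.format(good_comment)))  # True
print(is_well_formed(page.format(bad_comment)))   # False
```

Served as text/html, browsers shrug off the second version; served as application/xhtml+xml, it is exactly the kind of error that stops the page from rendering.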
I do wonder whether XML_HTMLSax could ensure bulletproof XHTML, but I’ve never really tried it. As a side effect, the SafeHTML library I use does make sure tags are closed and so on, but it misses things such as unencoded ampersands.
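The ampersand gap in particular is easy to patch as a final pass over user content. A minimal sketch, assuming a hypothetical encode_bare_ampersands step run after sanitising (this is not part of SafeHTML): it escapes every & that doesn’t already start an entity or character reference. It only checks the shape of the reference, not whether the entity name is actually defined.

```python
import re

# An "&" that does NOT begin a named entity (&amp;), a decimal character
# reference (&#38;) or a hexadecimal one (&#x26;).
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")

def encode_bare_ampersands(html: str) -> str:
    """Escape bare ampersands, leaving existing references untouched."""
    return _BARE_AMP.sub("&amp;", html)
```

For example, `<a href="?a=1&b=2">` becomes `<a href="?a=1&amp;b=2">`, while text that already contains &amp; or &#38; is left alone.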
In the last year or so, HTML 4 has become more popular again. Hixie’s article on MIME types, "Sending XHTML as text/html Considered Harmful", is probably one of the most frequently quoted pieces in the HTML vs. XHTML debate. Today Anne van Kesteren writes on the rise of HTML:
I guess people have noticed that XHTML 1 is not forwards compatible. At least not with XHTML 2 which is not backwards compatible. HTML 5 on the other hand has a clear migration path. Or perhaps it is none of that bullshit and it came out quite clearly that there were zero benefits.
One reason I’m seriously thinking about going back to HTML 4 is that HTML 5 is really exciting, and it looks like it will be finished and supported by browsers well before XHTML 2. Whether IE will support it is a different matter (to be honest, I wouldn’t be surprised if it decided to support XHTML 2 at the same time as Web Forms 2), but at least HTML 5 is more backwards-compatible than XHTML 2.
What’s the real difference between valid XHTML and good HTML anyway? A few /> to close empty tags, and making sure entities are encoded correctly?
For the moment, I think I’ll keep using XHTML. TinyMCE produces XHTML; it might not always be perfect, but it’s XHTML, not HTML. Returning to HTML would mean modifying TinyMCE to produce valid HTML, for example without the self-closing tags. The same applies to other scripts I use, such as SafeHTML, and to PHP itself: a PHP error message is technically invalid HTML because it contains a self-closing br tag.
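The mechanical part of going back, rewriting XHTML’s self-closing empty elements in HTML 4.01 form, could in principle be done as an output filter instead of patching every script. A naive sketch, assuming a hypothetical xhtml_to_html_empty_tags filter; it uses a regular expression rather than a real parser, so it will misbehave on a ">" inside an attribute value, in comments or in CDATA sections:

```python
import re

# HTML 4.01 empty elements that XHTML writes in self-closing form.
_EMPTY = "area|base|br|col|hr|img|input|link|meta|param"
_SELF_CLOSING = re.compile(r"<(" + _EMPTY + r")\b([^>]*?)\s*/>", re.IGNORECASE)

def xhtml_to_html_empty_tags(markup: str) -> str:
    """Rewrite <br />, <img ... /> and friends as <br>, <img ...>."""
    return _SELF_CLOSING.sub(r"<\1\2>", markup)
```

For example, `line one<br />line two` becomes `line one<br>line two`, and `<img src="a/b.png" alt="" />` becomes `<img src="a/b.png" alt="">`. Everything else that makes XHTML tick, such as the doctype and the xmlns attribute, would still need handling separately.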
Hopefully, switching to XHTML 1.0 Strict with application/xhtml+xml will become more practical soon. And what’s to stop someone from implementing HTML 5 in an XML format? XHTML 1.2, perhaps?