SgmlReader exposes any SGML document through the XmlReader API, with built-in support for HTML. A command-line utility is also provided that writes out the resulting well-formed XML.
Download the zip file, which includes the standalone executable and the full source code: SgmlReader.zip
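SgmlReader itself is a .NET library, so the following is not its code or its algorithm. As a rough illustration of the kind of conversion the command-line utility performs, here is a minimal Python sketch built on the standard library's html.parser. It self-closes HTML void elements, closes elements that were left open, drops stray end tags, and escapes text and attribute values. It is a deliberate simplification: unlike a parser that consults a DTD, it knows nothing about which elements may contain which, so it guarantees well-formed output but not the tree a browser would build. Comments and doctypes are simply dropped. The names TagSoupToXml and html_to_xml are hypothetical.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML void elements: they never have content or an end tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TagSoupToXml(HTMLParser):
    """Hypothetical helper: re-serializes loose HTML as well-formed XML.

    Comments, doctypes and processing instructions are silently dropped
    because the corresponding HTMLParser handlers are not overridden.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive as plain text
        self.out = []    # serialized XML fragments
        self.stack = []  # names of currently open (non-void) elements

    def handle_starttag(self, tag, attrs):
        seen = set()
        attr_text = ""
        for name, value in attrs:
            if name in seen:
                continue  # XML forbids duplicate attributes; keep the first
            seen.add(name)
            # Minimized attributes such as <input disabled> get their name as value.
            attr_text += f" {name}={quoteattr(name if value is None else value)}"
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{attr_text} />")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # HTML ignores the "/" in "<b />" on non-void elements, so treat it
        # as an ordinary start tag (void elements are self-closed anyway).
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no open element to match: drop it
        # Close any elements still open inside this one, then the element itself.
        while True:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # escapes &, < and >

    def close(self):
        super().close()
        while self.stack:  # close whatever is still open at end of input
            self.out.append(f"</{self.stack.pop()}>")


def html_to_xml(html):
    """Hypothetical entry point: returns a well-formed XML fragment."""
    parser = TagSoupToXml()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(html_to_xml("<div><b>bold <i>both</b> plain<br>Tom & Jerry</div><p>unclosed"))
# <div><b>bold <i>both</i></b> plain<br />Tom &amp; Jerry</div><p>unclosed</p>
```

The output is a fragment that may have several top-level elements; wrapping it in a single root element turns it into a complete XML document.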
The relationships among HTML, XML and XHTML are a source of considerable confusion on the web. We often see questions on the webkit-dev mailing list from people who wonder why their seemingly XHTML documents end up being processed as HTML, or why an XML construct like <b /> doesn't actually close the bold tag.
This article will attempt to clear up some of that confusion.
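To make the <b /> question concrete, the sketch below feeds the same markup to two readers. Python's xml.etree.ElementTree applies XML rules, under which <b /> is a complete, empty element, so the text after it lies outside b. The second reader is a hypothetical, heavily simplified emulation of the HTML rule, built on html.parser: the trailing slash is ignored on non-void elements, so <b /> merely opens a b element and the following text ends up bold. It is not the full HTML tree-construction algorithm that browsers implement; the function and class names are made up for this illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements: the only ones for which "<x />" really means "empty".
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

MARKUP = "<p><b />Is this bold?</p>"


def bold_text_as_xml(markup):
    """Text directly inside <b> elements when the markup is parsed as XML."""
    root = ET.fromstring(markup)
    return "".join(b.text or "" for b in root.iter("b"))


class BoldTextCollector(HTMLParser):
    """Hypothetical, simplified emulation of the HTML rule: a trailing '/'
    is ignored unless the element is void, so '<b />' opens a b element."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # names of currently open elements
        self.bold = []           # text seen while a b element is open

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)  # ignore the slash, as HTML does

    def handle_endtag(self, tag):
        if tag in self.open_elements:
            # Pop up to and including the matching element.
            while self.open_elements.pop() != tag:
                pass

    def handle_data(self, data):
        if "b" in self.open_elements:
            self.bold.append(data)


def bold_text_as_html(markup):
    """Text that ends up inside a b element under the simplified HTML rule."""
    parser = BoldTextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.bold)


print(repr(bold_text_as_xml(MARKUP)))   # ''
print(repr(bold_text_as_html(MARKUP)))  # 'Is this bold?'
```

The same bytes yield different trees because the two parsers follow different rules; which rules a browser applies depends on how the document is served, not on what it looks like.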