I'm trying to convert an HTML stream to XML using SgmlReader for further parsing. This conversion is part of an app I'm developing for the Windows 8 Store. Below is the method that converts the HTML to XML:
public static void ConvertToXml(string webResponse)
{
    StringWriter sWriter = new StringWriter();
    XmlWriter xmlWriter = XmlWriter.Create(sWriter);
    SgmlReader sgmlReader = new SgmlReader();
    sgmlReader.DocType = "HTML";
    sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
    sgmlReader.CaseFolding = CaseFolding.ToLower;
    sgmlReader.InputStream = new StringReader(webResponse);
    sgmlReader.IgnoreDtd = true;
    while (!sgmlReader.EOF)
    {
        xmlWriter.WriteNode(sgmlReader, true);
    }
    xmlWriter.Flush();
    XmlString = sWriter.ToString();
}
The line sgmlReader.WhitespaceHandling = WhitespaceHandling.All; is the problem, because Xml.WhitespaceHandling is not present. Is there any other way to do this?
After a lot of reading and testing/debugging, I found that sgmlReader.WhitespaceHandling = WhitespaceHandling.All is not needed, at least in my case, as sgmlReader.WhitespaceHandling is set to All by default. However, I also removed sgmlReader.IgnoreDtd = true; and now my XML file looks normal ;)
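For reference, the method with those two lines removed might look roughly like this. It is only a sketch: it assumes the SgmlReader types live in the Sgml namespace, and the HtmlConverter class name and returning the string (instead of assigning the XmlString field) are illustrative choices, not part of the original code.

```csharp
using System.IO;
using System.Xml;
using Sgml; // assumption: namespace used by the SgmlReader library

public static class HtmlConverter // hypothetical wrapper class
{
    public static string ConvertToXml(string webResponse)
    {
        using (StringWriter sWriter = new StringWriter())
        using (XmlWriter xmlWriter = XmlWriter.Create(sWriter))
        {
            SgmlReader sgmlReader = new SgmlReader();
            sgmlReader.DocType = "HTML";
            // WhitespaceHandling is left at its default (All, per the finding above).
            sgmlReader.CaseFolding = CaseFolding.ToLower;
            sgmlReader.InputStream = new StringReader(webResponse);
            // IgnoreDtd is no longer set, so the reader keeps its default DTD behavior.

            // Copy every node from the SGML reader into the XML writer.
            while (!sgmlReader.EOF)
            {
                xmlWriter.WriteNode(sgmlReader, true);
            }

            // Flush before reading the buffer so no output is left pending.
            xmlWriter.Flush();
            return sWriter.ToString();
        }
    }
}
```

A call such as string xml = HtmlConverter.ConvertToXml(webResponse); would then hand the result to whatever XML parsing comes next.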
Hope this helps someone.