xml xelement

Workaround for "undeclared prefix" error on XElement.Load()


I'm pulling the source of a website. I then want to extract a specific part of it. My intention is to do this with LINQ-to-XML.

However, I get errors when I parse the source:

XElement source = XElement.Load(reader);

The problem seems to be references to namespaces I don't have. I get the error "'addthis' is an undeclared prefix. Line 130, position 51." because of this line:

<div class="addthis_toolbox addthis_pill_combo" addthis:url="http://www.foo.com/foo">

And if I delete that one, others occur.

Thing is, I only care about one piece of this XML file - I don't need to be able to parse the whole file. I just want it in an XElement so I can find that one piece of it. Is there a way for me to hack around the parsing error? And I need a generic solution - I want to parse the file regardless of ANY undeclared prefix errors.

Thanks


Solution

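  • This XML is not valid: it uses a namespace prefix that is never declared, so a namespace-aware parser such as XElement.Load() rejects it.

    In order to use a namespace prefix (such as addthis:), the namespace must be declared on that element or one of its ancestors by writing xmlns:addthis="some URI".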

    In general, you shouldn't parse HTML with an XML parser. HTML is often not well-formed XML, for this reason and several others (undeclared entities, unescaped JavaScript, unclosed tags).
    Instead, use an HTML parser such as HTML Agility Pack.
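
As a rough illustration of the HTML Agility Pack route, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is installed. The sample markup and the XPath query are made up for the example, so swap in your downloaded source and a query for the piece you actually want.

```csharp
// Requires the HtmlAgilityPack NuGet package (assumed to be installed).
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical stand-in for the downloaded page source. It contains an
        // undeclared prefix and an unclosed <p>, both of which an XML parser rejects.
        string html =
            "<html><body>" +
            "<div class=\"addthis_toolbox addthis_pill_combo\" addthis:url=\"http://www.foo.com/foo\">" +
            "<p>some text" +
            "</div></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant HTML parser; no namespace declarations required

        // Find the one element of interest with an XPath query.
        HtmlNode div = doc.DocumentNode.SelectSingleNode(
            "//div[contains(@class, 'addthis_toolbox')]");

        if (div != null) // SelectSingleNode returns null when nothing matches
        {
            // Attributes are looked up by their literal name, prefix included.
            Console.WriteLine(div.GetAttributeValue("addthis:url", ""));
            Console.WriteLine(div.InnerText);
        }
    }
}
```

Because the parser does not enforce XML namespace rules, this approach should not depend on which prefixes happen to appear in the page, which is the generic behaviour the question asks for.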