Consider the following code:
require 'nokogiri' # v1.5.2
doc = Nokogiri.XML('<body><a name="foo">ick</a></body>')
puts doc.to_html
#=> <body><a name="foo">ick</a></body>
puts doc.to_xml
#=> <?xml version="1.0"?>
#=> <body>
#=> <a name="foo">ick</a>
#=> </body>
puts doc.to_xhtml
#=> <body>
#=> <a name="foo" id="foo">ick</a>
#=> </body>
Notice the new id attribute that has been created. The output now has an id and a name attribute with the same value.
How can I avoid this when using the to_xhtml method on input that may have <a name="foo">?
This problem arises because the input I am parsing has an id attribute on one element and a name attribute on a separate element, and the two values happen to conflict.
Apparently it's a feature of libxml2. In http://www.w3.org/TR/xhtml1/#h-4.10 we find:
In XML, fragment identifiers are of type ID, and there can only be a single attribute of type ID per element. Therefore, in XHTML 1.0 the id attribute is defined to be of type ID. In order to ensure that XHTML 1.0 documents are well-structured XML documents, XHTML 1.0 documents MUST use the id attribute when defining fragment identifiers on the elements listed above.
[...]
Note that in XHTML 1.0, the name attribute of these elements is formally deprecated, and will be removed in a subsequent version of XHTML.
The best 'workaround' I've come up with is:
# Destroy all <a name="..."> elements, replacing with children
# if another element with a conflicting id already exists in the document
doc.xpath('//a[@name][not(@id)][not(@href)]').each do |a|
  a.replace(a.children) if doc.at_css("##{a['name']}")
end
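One caveat with the workaround above: it interpolates the name value straight into a CSS selector, so a name such as "section.1" would be read as an id plus a class rather than as a single id. A minimal sketch of one way around that is to look up the conflicting id with XPath instead, quoting the value as an XPath string literal. The helper names xpath_literal, id_conflict_xpath and unwrap_conflicting_anchors are hypothetical, not part of Nokogiri, and the last method assumes doc is a Nokogiri document parsed as in the question (with nokogiri already required).

```ruby
# Hypothetical helper: quote an arbitrary string as an XPath 1.0 string literal.
# XPath 1.0 has no escape character, so mixed quotes need concat().
def xpath_literal(str)
  return "'#{str}'" unless str.include?("'")
  return "\"#{str}\"" unless str.include?('"')
  parts = str.split("'", -1).map { |part| "'#{part}'" }
  "concat(#{parts.join(%q{, "'", })})"
end

# Hypothetical helper: XPath that matches any element whose id equals +name+.
def id_conflict_xpath(name)
  "//*[@id=#{xpath_literal(name)}]"
end

# Same idea as the workaround above, but using XPath for the id lookup.
# Assumes +doc+ is a Nokogiri document (nokogiri must already be required).
def unwrap_conflicting_anchors(doc)
  doc.xpath('//a[@name][not(@id)][not(@href)]').each do |a|
    # Replace the anchor with its children only when its name collides
    # with an id that already exists elsewhere in the document.
    a.replace(a.children) if doc.at_xpath(id_conflict_xpath(a['name']))
  end
  doc
end
```

With a document parsed as in the question, this would be called as unwrap_conflicting_anchors(doc) before doc.to_xhtml.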