I have the following sample XML. If I feed it to libxml2 without any formatting or whitespace in between, it pretty-prints fine when calling xmlNodeDump() with the format argument set to 1:
const char *xml = "<root><a>test</a></root>";
However, if I preformat it, or have spaces in between, then libxml2 refuses to pretty-print it. For example:
const char *xml =
"<root>"
" <a>"
" test"
" </a>"
"</root>";
Then I call the function to read it like this:
#define MY_PARSER_OPTIONS (XML_PARSE_RECOVER | XML_PARSE_NOENT | XML_PARSE_DTDLOAD | XML_PARSE_DTDATTR | XML_PARSE_HUGE)
...
doc = xmlReadDoc((const xmlChar *) xml, NULL, NULL, MY_PARSER_OPTIONS);
...
xmlNodePtr root = xmlDocGetRootElement(doc);
xmlBufferPtr buf = xmlBufferCreate();
xmlNodeDump(buf, doc, root, 0, 1);
The output is not formatted.
When the input contains spaces, the output is:
<root> <a> test </a></root>
When the input contains no spaces, the output is:
<root>
<a>test</a>
</root>
Is this a bug in libxml2? How can I get it to pretty-print/format correctly? Neither input produces an error.
UPDATE:
With minimal reproducible example:
https://github.com/totszwai/libxml2-troubleshoot1
As we can see, when the input already contains whitespace between elements, libxml2 does not reformat it, for some reason.
Try including the XML_PARSE_NOBLANKS option when calling xmlReadDoc(). Per the libxml2 documentation:
Remove some text nodes containing only whitespace from the result document. Which nodes are removed depends on DTD element declarations or a conservative heuristic. The reindenting feature of the serialization code relies on this option to be set when parsing.