I have an HTML input string that needs to be parsed and written out as DITA-compatible XML.
Input:
<p>Line with following newline<br>Line with two following newlines<br><br>Line with no following newline</p>
Desired Output:
<p>Line with following newline<?linebreak?>Line with two following newlines<?linebreak?><?linebreak?>Line with no following newline</p>
package require tdom
set xml {<p>Line with following newline<br>Line with two following newlines<br><br>Line with no following newline</p>}
puts "Input:"
puts "$xml"
set doc [dom parse -html -keepEmpties $xml]
set root [$doc documentElement]
foreach node [$root getElementsByTagName br] {
    $node delete
    #$node appendXML "<?linebreak?>"
}
puts "Output:"
puts [$doc asXML -indent none]
If I uncomment the line #$node appendXML "<?linebreak?>", the script fails. I'm new to tdom but not Tcl. Or maybe someone has a different idea on how to preserve line breaks in XML, specifically DITA.
Once you call delete on a tdom node, it no longer exists, so naturally you get an error if you try to use it afterwards.
One approach: for each br node, create a new processing instruction node and replace the br node with it (which first requires getting the node's parent). Your loop would then look like:
foreach node [$root getElementsByTagName br] {
    set lb [$doc createProcessingInstruction linebreak ""]
    [$node parentNode] replaceChild $lb $node
    # replaceChild moves the old node to the document fragment list;
    # just get rid of it completely since we're not going to reuse it
    $node delete
}
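The modified program prints the following. Note that the HTML parser wraps the fragment in an html element, and that the empty processing instruction is serialized as <?linebreak ?>, with a space before the closing ?>, which XML parsers treat the same as <?linebreak?>: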
Input:
<p>Line with following newline<br>Line with two following newlines<br><br>Line with no following newline</p>
Output:
<html><p>Line with following newline<?linebreak ?>Line with two following newlines<?linebreak ?><?linebreak ?>Line with no following newline</p></html>
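
If the html wrapper is unwanted in the DITA output, one option is to serialize only the p element rather than the whole document. The following is a minimal sketch of the complete script under that assumption; the variable names html, p and result are illustrative, and it assumes the fragment contains a single p element, as in the question.

```tcl
package require tdom

set html {<p>Line with following newline<br>Line with two following newlines<br><br>Line with no following newline</p>}

set doc  [dom parse -html -keepEmpties $html]
set root [$doc documentElement]

# Swap every <br> for an empty <?linebreak?> processing instruction.
foreach node [$root getElementsByTagName br] {
    set lb [$doc createProcessingInstruction linebreak ""]
    [$node parentNode] replaceChild $lb $node
    # The replaced node is no longer needed, so free it.
    $node delete
}

# Serialize only the first <p> element instead of the whole document,
# which leaves out the <html> wrapper added by the HTML parser.
set p [lindex [$root selectNodes {//p}] 0]
set result [$p asXML -indent none]
puts $result

# Free the DOM tree once the output has been produced.
$doc delete
```

For input with several top-level elements, the same idea should carry over by looping over the children of the root node and serializing each one in turn.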