I'm hacking on a tool that should copy an XML file and edit a small part of it. The editing part works now, but, interestingly enough, the copying turns out to be quite tricky. This is essentially «reverse engineering» work, and I now know that I have to preserve the closing tags of some elements (even if those elements are empty or contain only white space). The problem is that when HXT reads something like
<tag>
</tag>
it then prints it as
<tag/>
I can tell it to always use an explicit closing tag (or whatever you call it) by passing the withOutputXHTML option to the writeDocument function; however, there are elements that are written as
<tag/>
that should be copied «as is».
So, essentially, my problem boils down to: «How do I copy this file while preserving the closing tags of some specific elements?»:
<foo>
  <bar>
  </bar>
  <baz/>
</foo>
A simple copying program for reference and experimentation:
module Main (main) where

import Control.Monad (void)
import Text.XML.HXT.Core

main :: IO ()
main = void $ runX $
  readDocument [ withValidate no ] "test.xml" >>>
  writeDocument [ withIndent yes
                , withOutputEncoding isoLatin1
                , withOutputXHTML ] "result.xml"
After a long, frustrating search, I decided to try every option in Text.XML.HXT.Arrow.XmlState. Some options have no doc strings at all, so it's a guessing game.
Finally, I've found this marvel:
withNoEmptyElemFor :: [String] -> SysConfig
Although it has no doc string, its name sounds quite promising. Indeed, with the help of this option we can specify the names of elements which «cannot be empty», i.e. which are always written with an explicit closing tag.
This option can be used with writeDocument or configSysVars.
I like the second arrow better because I can use it locally, which is useful if
you have several arrows that process slightly different documents, each with
its own collection of tags that shouldn't be empty (that's my case).
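For illustration, here is a minimal sketch of the configSysVars variant. The helper name copyKeeping is made up for this example and is not part of HXT; the file names are the ones from the question. It assumes that an option set with configSysVars is still in effect when writeDocument runs later in the same arrow chain, which matches what I see.

```haskell
module Main (main) where

import Control.Monad (void)
import Text.XML.HXT.Core

-- Hypothetical helper (not part of HXT): copy src to dst, writing the
-- elements named in `names` with an explicit closing tag.
copyKeeping :: [String] -> FilePath -> FilePath -> IOSArrow XmlTree XmlTree
copyKeeping names src dst =
  -- Set the option in the system state instead of passing it to writeDocument.
  configSysVars [ withNoEmptyElemFor names ] >>>
  readDocument [ withValidate no ] src >>>
  writeDocument [ withIndent yes
                , withOutputEncoding isoLatin1 ] dst

main :: IO ()
main = void $ runX $ copyKeeping ["bar"] "test.xml" "result.xml"
```

Each kind of document can then get its own arrow with its own list of element names, e.g. copyKeeping ["bar"] for one and copyKeeping ["qux", "quux"] for another (those last two names are placeholders).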
So, returning to my example, we can fix it by writing:
module Main (main) where

import Control.Monad (void)
import Text.XML.HXT.Core

main :: IO ()
main = void $ runX $
  readDocument [ withValidate no ] "test.xml" >>>
  writeDocument [ withIndent yes
                , withOutputEncoding isoLatin1
                , withNoEmptyElemFor ["bar"] ] "result.xml"