I'm trying to come to terms with Haskell's XML Toolbox (HXT) and I'm hitting a wall somewhere, because I don't seem to fully grasp arrows as a computational tool.
Here's my problem, which I hoped to illustrate a little better using a GHCi session:
> let parse p = runLA (xread >>> p) "<root><a>foo</a><b>bar</b><c>baz</c></root>"
> :t parse
parse :: LA XmlTree b -> [b]
So Parse is a small helper function that applies whatever arrow I give it to the trivial XML document
<root>
<a>foo</a>
<b>bar</b>
<c>baz</c>
</root>
I define another helper function, this time to extract the text below a node with a given name:
> let extract s = getChildren >>> isElem >>> hasName s >>> getChildren >>> getText
> :t extract
extract :: (ArrowXml cat) =>
String -> cat (Data.Tree.NTree.TypeDefs.NTree XNode) String
> parse (extract "a" &&& extract "b") -- extract two nodes' content.
[("foo","bar")]
With the help of this function, it's easy to use the &&&
combinator to pair up the text of two different nodes, and then, say, pass it to a constructor, like this:
> parse (extract "a" &&& extract "b" >>^ arr (\(a,b) -> (b,a)))
[("bar","foo")]
Now comes the part I don't understand: I want to left-factor! extract
calls getChildren
on the root-node twice. Instead, I'd like it to only call it once! So I first get the child of the root node
> let extract' s = hasName s >>> getChildren >>> getText
> :t extract'
extract' :: (ArrowXml cat) => String -> cat XmlTree String
> parse (getChildren >>> isElem >>> (extract' "a" &&& extract' "b"))
[]
Note, that I've tried to re-order the calls to, say, isElem, etc. in order to find out if that's the issue. But as it stands, I just don't have any idea why this isn't working. There is an arrow 'tutorial' on the Haskell wiki and the way I understood it, it should be possible to do what I want to do that way — namely use &&&
in order to pair up the results of two computations.
It does work, too — but only at the start of the arrow-chain, not mid-way trough, when I have some results already, that I want to keep 'shared.' I have the feeling that I'm just not being able to wrap my head around a difference in ideas between normal function composition and arrow notation. I'd be very appreciative of any pointers! (Even if it is just to some generic arrow-tutorial that goes a little more in-depth than the on the Haskell-wiki.)
Thank you!
If you convert the arrow to (and then from) a deterministic version this works as expected:
> let extract' s = unlistA >>> hasName s >>> getChildren >>> getText
> parse (listA (getChildren >>> isElem) >>> (extract' "a" &&& extract' "b"))
[("foo","bar")]
This isn't really satisfactory, though, and I can't remember off the top of my head why (&&&)
behaves this way with a nondeterministic arrow (I'd personally use the proc/do
notation for anything much more complicated than this).
UPDATE: There seems to be something weird going on here with runLA
and xread
. If you use runX
and readString
everything works as expected:
> let xml = "<root><a>foo</a><b>bar</b><c>baz</c></root>"
> let parse p = runX (readString [] xml >>> p)
> let extract' s = getChildren >>> hasName s >>> getChildren >>> getText
> parse (getChildren >>> isElem >>> (extract' "a" &&& extract' "b"))
[("foo","bar")]
This means you have to run the parser in the IO
monad, but there are advantages to using runX
anyway (better error messages, etc.).