I’m trying to parse some XML files with Haskell. For this job I’m using HXT, to get some hands-on experience with arrows in a real-world application, so I’m quite new to arrows.
In XPath (and HaXml) it’s possible to select a node by position, for example: /root/a[2]/b
I can’t figure out how to do something like that with HXT, even after reading the documentation again and again.
Here is some sample code I’m working with:
module Main where
import Text.XML.HXT.Core
testXml :: String
testXml = unlines
    [ "<?xml version=\"1.0\"?>"
    , "<root>"
    , " <a>"
    , " <b>first element</b>"
    , " <b>second element</b>"
    , " </a>"
    , " <a>"
    , " <b>third element</b>"
    , " </a>"
    , " <a>"
    , " <b>fourth element</b>"
    , " <b>enough...</b>"
    , " </a>"
    , "</root>"
    ]
selector :: ArrowXml a => a XmlTree String
selector = getChildren /> isElem >>> hasName "a" -- how to select second <a>?
        /> isElem >>> hasName "b"
        /> getText
main :: IO ()
main = do
    let doc = readString [] testXml
    nodes <- runX $ doc >>> selector
    mapM_ putStrLn nodes
The desired output would be:
third element
Thanks in advance!
Here is a solution which, I believe, selects /root/a[2]/b (all <b> elements inside the second <a> element):
selector :: ArrowXml a => Int -> a XmlTree String
selector nth =
    (getChildren /> isElem >>> hasName "a") -- the parentheses are required!
    >. (!! nth)
    /> isElem >>> hasName "b" /> getText
(with nth = 1 the result is ["third element"]; (!!) counts from zero, so index 1 is the second <a>).
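A minimal sketch of a complete program that runs this selector on the sample document from the question. It assumes the hxt package is installed and repeats testXml from the question so that it is self-contained; selector 1 picks the second <a>.

```haskell
module Main where

import Text.XML.HXT.Core

-- Sample document from the question.
testXml :: String
testXml = unlines
    [ "<?xml version=\"1.0\"?>"
    , "<root>"
    , " <a>"
    , " <b>first element</b>"
    , " <b>second element</b>"
    , " </a>"
    , " <a>"
    , " <b>third element</b>"
    , " </a>"
    , " <a>"
    , " <b>fourth element</b>"
    , " <b>enough...</b>"
    , " </a>"
    , "</root>"
    ]

-- Text of all <b> children of the nth <a> element (zero-based index).
selector :: ArrowXml a => Int -> a XmlTree String
selector nth =
    (getChildren /> isElem >>> hasName "a")  -- collect all <a> elements first
    >. (!! nth)                              -- then pick one from the whole list
    /> isElem >>> hasName "b" /> getText

main :: IO ()
main = do
    let doc = readString [] testXml
    nodes <- runX $ doc >>> selector 1  -- index 1 is the second <a>, like XPath a[2]
    mapM_ putStrLn nodes
```

Running it should print the single line third element. Note that (!! nth) throws an exception when nth is out of range; >>. (take 1 . drop nth) is a non-failing alternative that simply yields no results.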
Explanation: as far as I can see, ArrowXml is declared as class (..., ArrowList a, ...) => ArrowXml a, so ArrowXml is a subclass of ArrowList. Looking through the ArrowList interface:
(>>.) :: a b c -> ([c] -> [d]) -> a b d
(>.) :: a b c -> ([c] -> d) -> a b d
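so >>. lifts a function of type [c] -> [d] that receives the complete list of results produced for one input, which lets it select a subset of that list, and >. lifts a function of type [c] -> d that reduces that complete list to a single item. So, after the children are selected and filtered down to the "a" elements, we can use (!! nth) :: [XmlTree] -> XmlTree to pick one of them.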
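A toy model in plain Haskell (no HXT needed) can make this concrete. It assumes a list arrow behaves like a function b -> [c], which is how HXT's own LA arrow is represented; ListArrow, collectA, collect1A and aElements are hypothetical stand-ins for the arrow type, >>., >. and an element selection, not part of HXT's API.

```haskell
-- Hypothetical toy model, not HXT's implementation: a list arrow from b
-- to c is a function that returns zero or more results for one input.
newtype ListArrow b c = ListArrow { runListArrow :: b -> [c] }

-- Plays the role of (>>.): post-process the complete result list at once.
collectA :: ListArrow b c -> ([c] -> [d]) -> ListArrow b d
collectA (ListArrow f) h = ListArrow (h . f)

-- Plays the role of (>.): reduce the complete result list to one value.
collect1A :: ListArrow b c -> ([c] -> d) -> ListArrow b d
collect1A a h = collectA a (\xs -> [h xs])

-- Stand-in for "select the <a> elements": one input, three results.
aElements :: ListArrow String String
aElements = ListArrow (const ["a1", "a2", "a3"])

main :: IO ()
main = do
  print (runListArrow aElements "root")                      -- ["a1","a2","a3"]
  print (runListArrow (aElements `collectA` take 2) "root")  -- ["a1","a2"]
  print (runListArrow (aElements `collect1A` (!! 1)) "root") -- ["a2"]
  print (runListArrow (aElements `collect1A` length) "root") -- [3]
```

Because the lifted function sees every result for the input at once, (!! 1) behaves like a positional predicate, and length would count the matches.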
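There's an important thing to note, namely the fixities of the operators involved:
infixr 1 >>>
infixl 5 />
infixl 8 >>., >.
Because >. binds more tightly than both /> and >>>, without parentheses it attaches only to hasName "a". That arrow sees one node at a time, so (!! nth) is applied to a one-element list instead of to the list of all <a> elements (I had a hard time figuring out why >. without parentheses did not work as expected). Thus, getChildren /> isElem >>> hasName "a" must be wrapped in parentheses.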
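The effect of the grouping can be reproduced with a toy model in plain Haskell. Without the parentheses, the expression is read as (getChildren /> isElem) >>> (((hasName "a" >. (!! nth)) /> isElem) >>> (hasName "b" /> getText)). The sketch below is hypothetical, not HXT code: ListArrow, seqA, collectA, allChildren, hasNameA, pickAt and the tiny Node tree only mimic >>>, >>., getChildren and hasName. It uses the non-failing take 1 . drop i instead of (!! i), so the wrong grouping shows up as an empty result rather than an exception.

```haskell
-- Hypothetical toy model, not HXT's implementation: a list arrow from b
-- to c is simply a function returning zero or more results.
newtype ListArrow b c = ListArrow { runListArrow :: b -> [c] }

-- Plays the role of (>>>): run the second arrow on every result of the
-- first one and concatenate the outputs.
seqA :: ListArrow a b -> ListArrow b c -> ListArrow a c
seqA (ListArrow f) (ListArrow g) = ListArrow (concatMap g . f)

-- Plays the role of (>>.): transform the complete result list at once.
collectA :: ListArrow b c -> ([c] -> [d]) -> ListArrow b d
collectA (ListArrow f) h = ListArrow (h . f)

-- Minimal element tree standing in for XmlTree.
data Node = Node { name :: String, kids :: [Node] }

-- Plays the role of getChildren.
allChildren :: ListArrow Node Node
allChildren = ListArrow kids

-- Plays the role of hasName: keep the input node only if the name matches.
hasNameA :: String -> ListArrow Node Node
hasNameA n = ListArrow (\x -> [x | name x == n])

-- Zero-based positional pick that yields nothing instead of failing.
pickAt :: Int -> [c] -> [c]
pickAt i = take 1 . drop i

root :: Node
root = Node "root"
  [ Node "a" [Node "b1" [], Node "b2" []]
  , Node "a" [Node "b3" []]
  , Node "a" [Node "b4" [], Node "b5" []]
  ]

-- Parenthesized grouping: the pick sees the list of all three <a> nodes.
good :: ListArrow Node Node
good = ((allChildren `seqA` hasNameA "a") `collectA` pickAt 1) `seqA` allChildren

-- Default-fixity grouping: the pick is attached to the per-node name
-- test, so each call sees a one-element list and drop 1 empties it.
bad :: ListArrow Node Node
bad = allChildren `seqA` ((hasNameA "a" `collectA` pickAt 1) `seqA` allChildren)

main :: IO ()
main = do
  print (map name (runListArrow good root))  -- ["b3"]
  print (map name (runListArrow bad root))   -- []
```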