haskell hxt arrow-abstraction

HXT: Select a node by position with HXT in Haskell?


I’m trying to parse some XML files with Haskell. I’m using HXT for this, partly to learn how arrows are used in real-world applications, so I’m quite new to arrows.

In XPath (and HaXml) it’s possible to select a node by position, let’s say: /root/a[2]/b

I can’t figure out how to do something like that with HXT, even after reading the documentation again and again.

Here is some sample code I’m working with:

module Main where

import Text.XML.HXT.Core

testXml :: String
testXml = unlines
    [ "<?xml version=\"1.0\"?>"
    , "<root>"
    , "    <a>"
    , "        <b>first element</b>"
    , "        <b>second element</b>"
    , "    </a>"
    , "    <a>"
    , "        <b>third element</b>"
    , "    </a>"
    , "    <a>"
    , "        <b>fourth element</b>"
    , "        <b>enough...</b>"
    , "    </a>"
    , "</root>"
    ]

selector :: ArrowXml a => a XmlTree String
selector = getChildren /> isElem >>> hasName "a" -- how to select second <a>?
                       /> isElem >>> hasName "b"
                       /> getText

main :: IO ()
main = do
    let doc = readString [] testXml
    nodes <- runX $ doc >>> selector
    mapM_ putStrLn nodes

The desired output would be:

third element

Thanks in advance!


Solution

  • I believe the following selects "/root/a[2]/b" (all "b" elements inside the second "a" element):

    selector :: ArrowXml a => Int -> a XmlTree String
    selector nth =
        (getChildren /> isElem >>> hasName "a")   -- the parentheses are required!
        >. (!! nth)
        /> isElem >>> hasName "b" /> getText
    

    (With nth = 1 the result is ["third element"]; (!! nth) is zero-based, so index 1 corresponds to a[2] in XPath.)

    Explanation: ArrowXml is declared as class (..., ArrowList a, ...) => ArrowXml a, so every ArrowXml instance is also an ArrowList. The ArrowList interface includes:

    (>>.) :: a b c -> ([c] -> [d]) -> a b d
    (>.) :: a b c -> ([c] -> d) -> a b d
    

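    So >>. can select a subset of the result list using a plain function of type [c] -> [d], while >. can collapse the list to a single item using a function of type [c] -> d. Both operate on the complete list of results the arrow produces for one input. Once the children are selected and filtered down to the "a" elements, (!! nth) :: [a] -> a picks one of them.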
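    Since (!! nth) is partial, an index past the end of the list raises an exception. A minimal sketch of a total variant follows, assuming that an empty result is acceptable for an out-of-range index. The names selectorSafe and runSafe are hypothetical, the sample document is a compacted copy of the one in the question, and passing withValidate False to readString is my own choice rather than something the question uses.

```haskell
module Main where

import Text.XML.HXT.Core

-- Compact version of the question's sample document.
testXml :: String
testXml = concat
    [ "<root>"
    , "<a><b>first element</b><b>second element</b></a>"
    , "<a><b>third element</b></a>"
    , "<a><b>fourth element</b><b>enough...</b></a>"
    , "</root>"
    ]

-- Hypothetical variant of the answer's selector: (>>.) with take/drop
-- keeps at most one <a>, so an out-of-range index yields no results
-- instead of the exception that (!!) would raise.
selectorSafe :: ArrowXml a => Int -> a XmlTree String
selectorSafe nth =
    (getChildren /> isElem >>> hasName "a")
    >>. (take 1 . drop nth)
    /> isElem >>> hasName "b" /> getText

-- Hypothetical helper: parse the sample and run the selector on it.
runSafe :: Int -> IO [String]
runSafe nth =
    runX (readString [withValidate False] testXml >>> selectorSafe nth)

main :: IO ()
main = do
    runSafe 1 >>= print   -- zero-based: index 1 is the second <a>
    runSafe 7 >>= print   -- out of range: prints []
```

    The parentheses around the first line of the selector are still needed, for the same precedence reason that applies to >. in the answer.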

    One important thing to note is operator precedence:

    infixr 1 >>>
    infixl 5 />
    infixl 8 >.
    

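    Because >. binds tighter than both /> and >>>, an unparenthesized ... >>> hasName "a" >. (!! nth) applies (!! nth) only to the result of hasName "a" for a single node, not to the list of all "a" elements (which is why I had a hard time figuring out why >. without parentheses did not work as expected). Thus, getChildren /> isElem >>> hasName "a" must be wrapped in parentheses.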
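    To make the precedence issue concrete, here is a sketch that spells out both groupings with explicit parentheses. The names intended, asParsed and run are hypothetical, the sample document is a compacted copy of the one in the question, and withValidate False is my own choice of parser option. Index 0 is used because, in the unparenthesized grouping, (!! nth) only ever sees a list of at most one node, so any larger index would fail.

```haskell
module Main where

import Text.XML.HXT.Core

-- Compact version of the question's sample document.
testXml :: String
testXml = concat
    [ "<root>"
    , "<a><b>first element</b><b>second element</b></a>"
    , "<a><b>third element</b></a>"
    , "<a><b>fourth element</b><b>enough...</b></a>"
    , "</root>"
    ]

-- The answer's selector with its grouping written out: all <a> elements
-- are collected first, then (!! nth) indexes that complete list.
intended :: ArrowXml a => Int -> a XmlTree String
intended nth =
    ((((getChildren /> isElem) >>> hasName "a") >. (!! nth)) /> isElem)
    >>> (hasName "b" /> getText)

-- How the same expression groups without the parentheses: (>.) attaches
-- to hasName "a" alone, so (!! nth) is applied once per candidate node.
asParsed :: ArrowXml a => Int -> a XmlTree String
asParsed nth =
    (getChildren /> isElem)
    >>> ((hasName "a" >. (!! nth)) /> isElem)
    >>> (hasName "b" /> getText)

-- Hypothetical helper: parse the sample and run a selector on it.
run :: IOSArrow XmlTree String -> IO [String]
run sel = runX (readString [withValidate False] testXml >>> sel)

main :: IO ()
main = do
    run (intended 0) >>= print   -- texts of the <b>s in the first <a> only
    run (asParsed 0) >>= print   -- texts of the <b>s in every <a>
```

    With index 0, intended should return only the two "b" texts of the first "a", while asParsed returns all five, since it keeps every "a" instead of choosing one.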