I recently followed through A Taste of Curry, and afterwards decided to put the trivial arithmetic parser example to test, by writing a somewhat more substantial parser: a primitive but correct and functional HTML parser.
I ended up with a working node2string
function to operate on Node
(with attributes and children), which I then inverse
d to obtain a parse
function, as exemplified in the article.
The first naive implementation had the mistake that it parsed anything but e.g. the trivial <input/>
HTML snippet into exactly one Node
representation; everything else nondeterministically yielded invalid things like
Node { name = "input", attrs = [Attr "type" "submit"] }
Node { name = "input type=\"submit\"", attrs = [] }
and so on.
After some initial naive attempts to fix that from within node2string
, I realized the point, which I believe all seasoned logic programmers see instantaneously, that parse = inverse node2string
was right more right and insightful about the sitatution than I was: the above 2 parse results of <input type="submit"/>
indeed were exactly the 2 valid and constructible values of Node
that would lead to HTML representations.
I realized I had to constrain Node
to only allow passing in alphabetic — well not really but let's keep it simple — names (and of course same for Attr
). In a less fundamental setting than a logic program (such as regular Haskell with much more hand written and "instructional" as opposed to purely declarative programming), I would simply have hidden the Node
constructor behind e.g. a mkNode
sentinel function, but I have the feeling this wouldn't work well in Curry due to how the inference engine or constraint solver work (I might be wrong on this, and in fact I hope I am).
So I ended up instead with the following. I think Curry metaprogramming (or Template Haskell, if Curry supported it) could be used ot clean up the manual boielrplate, but cosmetically dealing is only one way out of the situation.
data Name = Name [NameChar] -- newtype crashes the compiler
data NameChar = A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
name2char :: NameChar -> Char
name2char c = case c of A -> 'a'; B -> 'b'; C -> 'c'; D -> 'd'; E -> 'e'; F -> 'f'; G -> 'g'; H -> 'h'; I -> 'i'; J -> 'j'; K -> 'k'; L -> 'l'; M -> 'm'; N -> 'n'; O -> 'o'; P -> 'p'; Q -> 'q'; R -> 'r'; S -> 's'; T -> 't'; U -> 'u'; V -> 'v'; W -> 'w'; X -> 'x'; Y -> 'y'; Z -> 'z'
name2string :: Name -> String
name2string (Name s) = map name2char s
-- for "string literal" support
nameFromString :: String -> Name
nameFromString = inverse name2string
data Node = Node { nodeName :: Name, attrs :: [Attr], children :: [Node] }
data Attr = Attr { attrName :: Name, value :: String }
attr2string :: Attr -> String
attr2string (Attr name value) = name2string name ++ "=\"" ++ escape value ++ "\""
where escape = concatMap (\c -> if c == '"' then "\\\"" else [c])
node2string :: Node -> String
node2string (Node name attrs children) | null children = "<" ++ name' ++ attrs' ++ "/>"
| otherwise = "<" ++ name' ++ attrs' ++ ">" ++ children' ++ "</" ++ name' ++ ">"
where name' = name2string name
attrs' = (concatMap ((" " ++) . attr2string) attrs)
children' = intercalate "" $ map (node2string) children
inverse :: (a -> b) -> (b -> a)
inverse f y | f x =:= y = x where x free
parse :: String -> Node
parse = inverse node2string
This, in fact, works perfectly (in my judgement):
Parser> parse "<input type=\"submit\"/>"
(Node [I,N,P,U,T] [(Attr [T,Y,P,E] "submit")] [])
Parser> parse "<input type=\"submit\" name=\"btn1\"/>"
(Node [I,N,P,U,T] [(Attr [T,Y,P,E] "submit"),(Attr [N,A,M,E] "btn1")] [])
(Curry doesn't have type classes so I wouldn't know yet how to make [NameChar]
print more nicely)
However, my question is:
is there a way to use something like isAlpha
(or a function more true to the actual HTML spec, of course) to achieve a result equivalent to this, without having to go through the verbose boilerplate that NameChar
and its "supporting members" are? There seems to be no way to even place the "functional restriction" anywhere within the ADT.
In a Dependently Typed Functional Logic Programming language, I would just express the constraint at the type level and let the inference engine or constraint solver deal with it, but here I seem to be at a loss.
You can achieve the same results using just Char
. As you've already pointed out, you can use isAlpha
to define name2char
as a partial identity. I changed the following lines of your code.
type NameChar = Char
name2char :: NameChar -> Char
name2char c | isAlpha c = c
The two exemplary expressions then evaluate as follows.
test> parse "<input type=\"submit\" name=\"btn1\"/>"
(Node (Name "input") [(Attr (Name "type") "submit"),(Attr (Name "name") "btn1")] [])
test> parse "<input type=\"submit\"/>"
(Node (Name "input") [(Attr (Name "type") "submit")] [])
As a side-effect, names with non-alpha characters silently fail with nameFromString
.
test> nameFromString "input "
Edit: Since you seem to be a fan of function patterns, you can define generators for Node
s and Attr
s and use them in your conversion function.
attr :: Name -> String -> Attr
attr name val
| name `elem` ["type", "src", "alt", "name"] = Attr name val
node :: String -> [Attr] -> [Node] -> Node
node name [] nodes
| name `elem` ["a", "p"] = Node name [] nodes
node name attrPairs@(_:_) nodes
| name `elem` ["img", "input"] = Node name attrPairs nodes
node2string :: Node -> String
node2string (node name attrs children)
| null children = "<" ++ name ++ attrs' ++ "/>"
| otherwise = "<" ++ name ++ attrs' ++ ">"
++ children' ++ "</" ++ name' ++ ">"
where
name' = name
attrs' = concatMap ((" " ++) . attr2string) attrs
children' = intercalate "" $ map (node2string) children
attr2string :: Attr -> String
attr2string (attr name val) = name ++ "=\"" ++ escape val ++ "\""
where
escape = concatMap (\c -> if c == '"' then "\\\"" else [c])
This approach has its disadvantages; it works quite well for a specific set of valid names, but fails miserably when you use a predicate like before (e.g., all isAlpha name
).
Edit2:
Besides the fact that the solution with the isAlpha
condition is quite "prettier" than your verbose solution, it is also defined in a declarative way.
Without your comments, it doesn't become clear (that easily) that you are encoding alphabetic characters with your NameChar
data type. The isAlpha
condition on the other hand is a good example for a declarative specification of the wanted property.
Does this answer your question? I'm not sure what you are aiming at.