Say I have the following free monad:
data ExampleF a
= Foo Int a
| Bar String (Int -> a)
deriving Functor
type Example = Free ExampleF -- this is the free monad want to discuss
I know how I can work with this monad, eg. I could write some nice helpers:
foo :: Int -> Example ()
foo i = liftF $ Foo i ()
bar :: String -> Example Int
bar s = liftF $ Bar s id
So I can write programs in haskell like:
fooThenBar :: Example Int
fooThenBar =
do
foo 10
bar "nice"
I know how to print it, interpret it, etc. But what about parsing it?
Would it be possible to write a parser that could parse arbitrary programs like:
foo 12
bar nice
foo 11
foo 42
So I can store them, serialize them, use them in cli programs etc.
The problem I keep running into is that the type of the program depends on which program is being parsed. If the program ends with a foo
it's of
type Example ()
if it ends with a bar
it's of type Example Int
.
I do not feel like writing parsers for every possible permutation (it's simple here because there are only two possibilities, but imagine we add
Baz Int (String -> a)
, Doo (Int -> a)
, Moz Int a
, Foz String a
, .... This get's tedious and error-prone).
Perhaps I'm solving the wrong problem?
To run the above examples, you need to add this to the beginning of the file:
{-# LANGUAGE DeriveFunctor #-}
import Control.Monad.Free
import Text.ParserCombinators.Parsec
Note: I put up a gist containing this code.
Not every Example
value can be represented on the page without reimplementing some portion of Haskell. For example, return putStrLn
has a type of Example (String -> IO ())
, but I don't think it makes sense to attempt to parse that sort of Example
value out of a file.
So let's restrict ourselves to parsing the examples you've given, which consist only of calls to foo
and bar
sequenced with >>
(that is, no variable bindings and no arbitrary computations)*. The Backus-Naur form for our grammar looks approximately like this:
<program> ::= "" | <expr> "\n" <program>
<expr> ::= "foo " <integer> | "bar " <string>
It's straightforward enough to parse our two types of expression...
type Parser = Parsec String ()
int :: Parser Int
int = fmap read (many1 digit)
parseFoo :: Parser (Example ())
parseFoo = string "foo " *> fmap foo int
parseBar :: Parser (Example Int)
parseBar = string "bar " *> fmap bar (many1 alphaNum)
... but how can we give a type to the composition of these two parsers?
parseExpr :: Parser (Example ???)
parseExpr = parseFoo <|> parseBar
parseFoo
and parseBar
have different types, so we can't compose them with <|> :: Alternative f => f a -> f a -> f a
. Moreover, there's no way to know ahead of time which type the program we're given will be: as you point out, the type of the parsed program depends on the value of the input string. "Types depending on values" is called dependent types; Haskell doesn't feature a proper dependent type system, but it comes close enough for us to have a stab at making this example work.
Let's start by forcing the expressions on either side of <|>
to have the same type. This involves erasing Example
's type parameter using existential quantification.†
data Ex a = forall i. Wrap (a i)
parseExpr :: Parser (Ex Example)
parseExpr = fmap Wrap parseFoo <|> fmap Wrap parseBar
This typechecks, but the parser now returns an Example
containing a value of an unknown type. A value of unknown type is of course useless - but we do know something about Example
's parameter: it must be either ()
or Int
because those are the return types of parseFoo
and parseBar
. Programming is about getting knowledge out of your brain and onto the page, so we're going to wrap up the Example
value with a bit of GADT evidence which, when unwrapped, will tell you whether a
was Int
or ()
.
data Ty a where
IntTy :: Ty Int
UnitTy :: Ty ()
data (a :*: b) i = a i :&: b i
type Sig a b = Ex (a :*: b)
pattern Sig x y = Wrap (x :&: y)
parseExpr :: Parser (Sig Ty Example)
parseExpr = fmap (\x -> Sig UnitTy x) parseFoo <|>
fmap (\x -> Sig IntTy x) parseBar
Ty
is (something like) a runtime "singleton" representative of Example
's type parameter. When you pattern match on IntTy
, you learn that a ~ Int
; when you pattern match on UnitTy
you learn that a ~ ()
. (Information can be made to flow the other way, from types to values, using classes.) :*:
, the functor product, pairs up two type constructors ensuring that their parameters are equal; thus, pattern matching on the Ty
tells you about its accompanying Example
.
Sig
is therefore called a dependent pair or sigma type - the type of the second component of the pair depends on the value of the first. This is a common technique: when you erase a type parameter by existential quantification, it usually pays to make it recoverable by bundling up a runtime representative of that parameter.
Note that this use of Sig
is equivalent to Either (Example Int) (Example ())
- a sigma type is a sum, after all - but this version scales better when you're summing over a large (or possibly infinite) set.
Now it's easy to build our expression parser into a program parser. We just have to repeatedly apply the expression parser, and then manipulate the dependent pairs in the list.
parseProgram :: Parser (Sig Ty Example)
parseProgram = fmap (foldr1 combine) $ parseExpr `sepBy1` (char '\n')
where combine (Sig _ val) (Sig ty acc) = Sig ty (val >> acc)
The code I've shown you is not exemplary. It doesn't separate the concerns of parsing and typechecking. In production code I would modularise this design by first parsing the data into an untyped syntax tree - a separate data type which doesn't enforce the typing invariant - then transform that into a typed version by type-checking it. The dependent pair technique would still be necessary to give a type to the output of the type-checker, but it wouldn't be tangled up in the parser.
*If binding is not a requirement, have you thought about using a free applicative to represent your data?
†Ex
and :*:
are reusable bits of machinery which I lifted from the Hasochism paper