haskell megaparsec

How to properly parse an indented block with megaparsec?


I'm writing an indentation-based programming language, and I'm trying to parse something like:

expr1 :
  expr2
  expr3

Here : marks the start of a new indentation block, so what expr1 is does not matter. The : can follow anything on the line, but it must be the last token of that line.

I have this code, which more or less works:

block :: Parser Value
block = dbg "block" $ do
  void $ symbol ":"
  void $ eol
  space1
  (L.indentBlock spaceConsumer indentedBlock)
  where
    indentedBlock = do
      e <- expr
      pure (L.IndentMany Nothing (\exprs -> pure $ Block () (e : exprs)) expr)

The problem is that, for the example above, only the first expression of the block is accepted at the expected indentation. The following expressions must be indented further than the first one, like this:

expr1 :
  expr2
   expr3
   expr4
   expr5

Solution

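  • I ended up parsing expr1 in the same parser as the :.

    Apparently indentBlock takes its reference indentation level from the column where the parser passed as its last argument begins. In the original code that parser started at expr2, so every following item had to be indented further than expr2. The idea is to start that parser at the beginning of the line (relative to the current indentation level), so that it covers expr1 and the :. It ended up like this: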

    block :: Parser Value
    block =
      L.indentBlock spaceConsumer indentedBlock
      where
        indentedBlock = do
          caller <- callerExpression
          args <- parseApplicationArgs
          pure (L.IndentSome Nothing (exprsToAppBlock caller args) parse)
        exprsToAppBlock caller args exprs =
          pure (Application () caller (args <> [Block () exprs]))
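The same idea can be shown in a smaller, self-contained form. The following is only a sketch under assumptions: the Value type, the constructors Ident and Block, and the helpers sc, scn, ident and item are hypothetical names made up for this example, not the types from the code above. The header parser consumes the leading expression and the : itself, so indentBlock records the column of the header as its reference level and the block items only need to be indented further than the header.

```haskell
module Main (main) where

import Control.Applicative (empty)
import Control.Monad (void)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Hypothetical AST for this sketch (not the Value type from the post).
data Value
  = Ident String
  | Block Value [Value]
  deriving (Eq, Show)

-- Whitespace consumer that also eats newlines; indentBlock uses it
-- to move from one block item to the next.
scn :: Parser ()
scn = L.space space1 empty empty

-- Whitespace consumer for spaces and tabs only, so a lexeme never
-- swallows the end of its own line.
sc :: Parser ()
sc = L.space (void (some (char ' ' <|> char '\t'))) empty empty

ident :: Parser Value
ident = Ident <$> L.lexeme sc (some letterChar)

-- The header (identifier plus ':') is parsed inside the reference
-- parser, so the reference indentation is the header's column.
block :: Parser Value
block = L.indentBlock scn header
  where
    header = do
      h <- ident
      void (L.symbol sc ":")
      pure (L.IndentSome Nothing (pure . Block h) item)

-- An item is either a nested block or a plain identifier.
item :: Parser Value
item = try block <|> ident

main :: IO ()
main = do
  parseTest (block <* eof) "foo :\n  bar\n  baz\n"
  parseTest (block <* eof) "a :\n  b :\n    c\n  d\n"
```

With this layout the first input should parse to Block (Ident "foo") [Ident "bar",Ident "baz"], with bar and baz at the same column. Note that the header parser must stop before the newline, because indentBlock itself expects the end of line right after it; that is why the lexemes here use sc rather than scn.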