I'm a Haskell beginner, using Attoparsec to find some color expressions in a text. I want to be able to match, for example, "light blue-green" and "light blue green" in a text. But of course I need a generalized solution for any string like that. So I've been thinking that it would be something like
"light" >> sep >> "blue" >> sep >> "green"
where sep = inClass "\n\r- "
In other words, I think I need a way to intercalate >> sep >>
to a list of words. Something like:
import qualified Data.Text as T
import Data.Attoparsec.Text
-- | Makes a parser from a list of words, accepting
-- spaces, newlines, and hyphens as separators.
wordListParser :: [T.Text] -> Parser
wordListParser wordList = -- Some magic here
Or maybe I'm thinking about this the wrong way entirely, and there's an easier way?
Edit: this minimal non-working example feels like it's almost there:
{-# LANGUAGE OverloadedStrings #-}
import Replace.Attoparsec.Text
import Data.Attoparsec.Text as AT
import qualified Data.Text as T
import Control.Applicative (empty)
wordListParser :: [T.Text] -> Parser T.Text
wordListParser (w:ws) = string w >> satisfy (inClass " -") >> wordListParser ws
wordListParser [w] = string w
wordListParser [] = empty -- or whatever the empty parser is
main :: IO ()
main = parseTest (wordListParser (T.words "light green blue")) "light green-blue"
which I think can be run with something like
stack runhaskell ThisFile.hs --package attoparsec replace-attoparsec text
Here is what I would do, assuming that you have a data type for your colours; if you don't, just substitute it for what you're using. The function parseColourGen
takes any Text
that is space-separated, and generates a parser that accepts a colour where each word is separated by one or more legal separators.
import Prelude hiding (concat, words)
import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import Data.List (intersperse)
import Data.Text (concat, pack, singleton, Text, words)
data Colour = LightBlue | DarkBlue | VibrantRed deriving Show
parseColourGen :: Text -> Parser [Text]
parseColourGen = sequence . intersperse (mempty <$ many1 legalSep) .
fmap string . words
parseColour :: [(Text, Colour)] -> Parser Colour
parseColour = foldl1 (<|>) . fmap (\(text, colour) ->
colour <$ parseColourGen text)
legalSep :: Parser Text
legalSep = singleton <$> satisfy (inClass "\n\r- ")
You can then feed your wordList
to the parser; however, it needs to be an association list:
wordList :: [(Text, Colour)]
wordList = [("light blue", LightBlue), ("dark blue", DarkBlue), ("vibrant red", VibrantRed)]
This way, you can configure all of your colours and their corresponding colour names in one place, and you can then run the parser like so:
> parse (parseColour wordList) $ pack "vibrant-red"
Done "" VibrantRed
EDIT
After the edit in your question, I think I understand what you want a little bit better. FWIW, I would still prefer the solution above, but here is how to fix your last block of code:
(w:ws)
and [w]
overlap, so if you want the runtime to catch the single-element pattern, you have to place it on top.a >> b
means "run action a
, discard its result, then run action b
and use that result". Which is why your parser (with the fix above) will output Done "" "blue"
. A simple way to fix this is to use do
notation to bind the result of all three computations, and return their concatenation.Here is what your code now looks like:
wordListParser :: [Text] -> Parser Text
wordListParser [w] = string w
wordListParser (w:ws) = do
a <- string w
b <- satisfy (inClass " -")
c <- wordListParser ws
return (a `append` (singleton b) `append` c) -- singleton :: Char -> Text
wordListParser [] = empty
One last thing: your current implementation will not parse Windows line breaks (\n\r
). I don't know if you dropped \n
and \r
from your separator characters, but if you haven't and Windows files are a possibility for you, it's something to keep in mind.