First of all, I'd like to mention that I'm fairly new to Haskell and I'm trying to understand how parsers work in Haskell. I'm trying to parse this e-book from http://www.gutenberg.org/files/57071/57071-0.txt and analyze the text, for example by outputting the number of English words, sentences, and paragraphs. Here's my code:
{-# LANGUAGE OverloadedStrings #-}

import Control.Exception (catch, SomeException)
import System.Environment (getArgs)
import Data.Attoparsec.Text
import Data.Char
import Control.Applicative ((<*>), (*>), (<$>), (<|>), pure)

data Prose = Prose {
    word :: String
} deriving Show

prose :: Parser Prose
prose = do
    word <- many' $ satisfy isAlphaNum
    return $ Prose word

main :: IO ()
main = do
    input <- readFile "small.txt"
    print $ parse prose input
This is my error message:
I have used "OverloadedStrings" to try to fix this issue, but it doesn't seem to work. Also, any guidance on examples or tutorials for getting started with attoparsec would be greatly appreciated!
-XOverloadedStrings only changes the type of string literals from String to the more general IsString a => a (which can be unified with String, Text, ByteString and more). In your code, there's just one literal: the file name "small.txt".
But file names are always String anyway! Well, FilePath, but that's just a synonym for String. (Even the Data.Text.IO functions take file names as such plain-old-list strings.) So the overloaded string literal actually makes no difference here at all.
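To make the point about literals concrete, here is a minimal sketch, assuming the text package is installed. The names asString and asText are hypothetical and chosen only for illustration. The same literal is accepted at both types, but the extension has no effect on what readFile returns.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical name: the literal is used at type String,
-- which would also compile without the extension.
asString :: String
asString = "small.txt"

-- Hypothetical name: the same literal at type Text.
-- This line type-checks only because OverloadedStrings is on.
asText :: Text
asText = "small.txt"

main :: IO ()
main = do
    putStrLn asString
    print (T.length asText)
```

Only literals are affected. A value produced at run time, such as the String returned by Prelude's readFile, keeps its type.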
But the parser processes file contents, not file names, so what you need to do is use IO routines that obtain this content as Text.
import qualified Data.Text.IO as Txt

main :: IO ()
main = do
    input <- Txt.readFile "small.txt"
    print $ parse prose input
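As a possible next step toward the word count mentioned in the question, here is a rough sketch rather than a definitive implementation. The names proseWords and countWords are hypothetical, and treating a "word" as a maximal run of alphabetic characters is an assumption. It uses parseOnly instead of parse, because parse returns a Result that may be Partial when the parser could still accept more input, whereas parseOnly treats the given Text as the complete input.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, many', parseOnly, skipWhile, takeWhile1)
import Data.Char (isAlpha)
import Data.Text (Text)
import qualified Data.Text.IO as Txt

-- Hypothetical helper: skip any leading non-letters, then collect
-- runs of letters, skipping the non-letters that follow each run.
proseWords :: Parser [Text]
proseWords =
    skipWhile (not . isAlpha)
        *> many' (takeWhile1 isAlpha <* skipWhile (not . isAlpha))

-- Hypothetical helper: run the parser on a complete Text and count the words.
countWords :: Text -> Either String Int
countWords = fmap length . parseOnly proseWords

main :: IO ()
main = do
    -- Same file name as in the question; the content is read as Text.
    input <- Txt.readFile "small.txt"
    print (countWords input)
```

Keeping each word as Text via takeWhile1 avoids building a String one character at a time, which is what many' $ satisfy isAlphaNum does.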