Is there some "easy" way (e.g. something I am missing in Attoparsec or some other library) to convert a defined Attoparsec parser that parses from ByteString to the one that parses from Text?
For example I have:
import Data.Attoparsec.ByteString.Char8
myTypeByteStringParser :: Parser MyType
What's the way to transform it into:
import Data.Attoparsec.Text
myTypeTextParser :: Parser MyType
It does look like contramap
(from hoogling type signature) but it is probably not possible to define Contravariant for Parser?
This is possible in general and you don't need to fork attoparsec
. Inconsiderately attoparsec
doesn't expose enough of its internals, but don't let that stop us:
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE QuasiQuotes #-}
module Parsers where
import qualified Data.Attoparsec.ByteString as AB
import qualified Data.Attoparsec.Internal.Types as AIT
import qualified Data.Attoparsec.Text as AT
import Data.ByteString (ByteString)
import qualified Data.ByteString.Internal as BI
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8, encodeUtf8)
import qualified Data.Text.Internal as TI
import Unsafe.TrueName
bsToTextState :: AIT.State ByteString -> AIT.State Text
bsToTextState = bufferText . decodeUtf8 . unbufferBS where
unbufferBS :: AIT.State ByteString -> ByteString
unbufferBS [truename| ''AIT.State
Data.Attoparsec.ByteString.Buffer.Buffer
Buf | fp off len _ _ |] = BI.PS fp off len
bufferText :: Text -> AIT.State Text
bufferText (TI.Text arr off len) = [truename| ''AIT.State
Data.Attoparsec.Text.Buffer.Buffer
Buf |] arr off len len 0
textToBSState :: AIT.State Text -> AIT.State ByteString
textToBSState = bufferBS . encodeUtf8 . unbufferText where
unbufferText :: AIT.State Text -> Text
unbufferText [truename| ''AIT.State
Data.Attoparsec.Text.Buffer.Buffer
Buf | arr off len _ _ |] = TI.Text arr off len
bufferBS :: ByteString -> AIT.State ByteString
bufferBS (BI.PS fp off len) = [truename| ''AIT.State
Data.Attoparsec.ByteString.Buffer.Buffer
Buf |] fp off len len 0
mapIResult :: (i -> j) -> (j -> i) -> AIT.IResult i a -> AIT.IResult j a
mapIResult f g = go where
go = \case
AIT.Fail i ctx msg -> AIT.Fail (f i) ctx msg
AIT.Partial k -> AIT.Partial (go . k . g)
AIT.Done i r -> AIT.Done (f i) r
mapFailure :: (i -> j) -> (j -> i) -> (AIT.State j -> AIT.State i) ->
AIT.Failure i (AIT.State i) r -> AIT.Failure j (AIT.State j) r
mapFailure f g h k st p m ctx msg = mapIResult f g $ k (h st) p m ctx msg
mapSuccess :: (i -> j) -> (j -> i) -> (AIT.State j -> AIT.State i) ->
AIT.Success i (AIT.State i) a r -> AIT.Success j (AIT.State j) a r
mapSuccess f g h k st p m a = mapIResult f g $ k (h st) p m a
bsToTextParser :: AB.Parser a -> AT.Parser a
bsToTextParser (AIT.Parser bsP) = AIT.Parser textP where
textP st p m f s = mapIResult decodeUtf8 encodeUtf8 $ bsP
(textToBSState st) p m
(mapFailure encodeUtf8 decodeUtf8 bsToTextState f)
(mapSuccess encodeUtf8 decodeUtf8 bsToTextState s)
textToBSParser :: AT.Parser a -> AB.Parser a
textToBSParser (AIT.Parser textP) = AIT.Parser bsP where
bsP st p m f s = mapIResult encodeUtf8 decodeUtf8 $ textP
(bsToTextState st) p m
(mapFailure decodeUtf8 encodeUtf8 textToBSState f)
(mapSuccess decodeUtf8 encodeUtf8 textToBSState s)
{,un}buffer{BS,Text}
are adapted from the respective internal modules Data.Attoparsec.{ByteString,Text}.Buffer
.
Was a good excuse for me to update true-name
to work with more recent GHC though. Depending on how up-to-date you are, you may need the WIP from GitHub.
It's probably not terrible for performance, as long as you keep in mind that each time you use textToBSParser
, the entire input gets fed through encodeUtf8
with any leftover converted back via decodeUtf8
, and vice versa for bsToTextParser
. If you only convert a Parser
once at the top-level, it shouldn't be too different from simply converting the input as the other answer suggests.
PS: I haven't tested this beyond
$ ghci -XOverloadedStrings parsers.hs
*Parsers> textToBSParser AT.scientific `AB.parseTest` "123 "
Done " " 123.0
PPS: for your own parsers, you might be able to leverage OverloadedStrings
and write p :: IsString s => AIT.Parser s a
instead, with {-# SPECIALISE p :: AT.Parser a #-}
pragmas. I've not explored how workable this idea is.