Currently I have a parser:
pScientific :: Parser Scientific
pScientific = lexeme L.scientific
This is able to easily parse something like 4087.00
but fails when the number is written as 4,087.00
Is there a way to make megaparsec parse numbers with commas?
PS: I am very new to Haskell, so apologies if this is a stupid question.
This is not parsed because the scientific parser is mainly designed for JSON parsing. JSON does not allow commas inside a number, since a comma separates the elements of arrays and objects.
We can take a look at the implementation of scientific [src]:
    -- | Parse a JSON number.
    scientific :: Parser Scientific
    scientific = do
      sign <- A.peekWord8'
      let !positive = not (sign == W8_MINUS)
      when (sign == W8_PLUS || sign == W8_MINUS) $
        void A.anyWord8
      n <- decimal0
      let f fracDigits = SP (B.foldl' step n fracDigits)
                            (negate $ B.length fracDigits)
          step a w = a * 10 + fromIntegral (w - W8_0)
      dotty <- A.peekWord8
      SP c e <- case dotty of
        Just W8_DOT -> A.anyWord8 *> (f <$> A.takeWhile1 isDigit_w8)
        _           -> pure (SP n 0)
      let !signedCoeff | positive  = c
                       | otherwise = -c
      (A.satisfy (\ex -> case ex of W8_e -> True; W8_E -> True; _ -> False) *>
          fmap (Sci.scientific signedCoeff . (e +)) (signed decimal)) <|>
        return (Sci.scientific signedCoeff e)
    {-# INLINE scientific #-}
The main thing to change is the decimal0 part, which captures the run of decimal digits that forms the integral part. A comma-tolerant variant could, for example, be implemented as:
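    import qualified Data.ByteString as B
    import qualified Data.ByteString.Unsafe as B (unsafeHead)

    -- bsToInteger is the helper that the original decimal0 uses to
    -- convert a run of ASCII digits to an Integer.
    decimal0' :: Parser Integer
    decimal0' = do
      -- accept digits and commas (byte 44), then drop the commas
      digits <- B.filter (\x -> x /= 44) <$> A.takeWhile1 (\x -> isDigit_w8 x || x == 44)
      if B.length digits > 1 && B.unsafeHead digits == 48  -- 48 is '0'
        then fail "leading zero"
        else return (bsToInteger digits)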
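The bsToInteger helper is not defined in this answer. If it is not in scope, a simple stand-in could look like the sketch below. This is an assumption for illustration, not the library's own implementation, and it expects every byte to be an ASCII digit (which holds once the commas have been filtered out).

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Hypothetical stand-in for the bsToInteger helper used by decimal0'.
-- Assumes every byte is an ASCII digit (48..57); no validation is done.
bsToInteger :: B.ByteString -> Integer
bsToInteger = B.foldl' step 0
  where
    -- shift the accumulator one decimal place and add the next digit
    step :: Integer -> Word8 -> Integer
    step acc w = acc * 10 + fromIntegral (w - 48)
```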
decimal0' can then be used in a modified copy of scientific:
    -- Needs {-# LANGUAGE BangPatterns #-} at the top of the module.
    import Control.Applicative ((<|>))
    import Control.Monad (void, when)
    import qualified Data.Attoparsec.ByteString as A
    import qualified Data.Scientific as Sci
    import Data.Attoparsec.ByteString.Char8 (Parser, decimal, isDigit_w8, signed)
    import Data.Scientific (Scientific)

    -- Strict pair of coefficient and exponent, as used by the original source.
    data SP = SP !Integer {-# UNPACK #-} !Int

    -- | Parse a number whose integral part may contain commas.
    scientific' :: Parser Scientific
    scientific' = do
      sign <- A.peekWord8'
      let !positive = not (sign == 45)            -- 45 is '-'
      when (sign == 43 || sign == 45) $           -- 43 is '+'
        void A.anyWord8
      n <- decimal0'
      let f fracDigits = SP (B.foldl' step n fracDigits)
                            (negate $ B.length fracDigits)
          step a w = a * 10 + fromIntegral (w - 48)   -- 48 is '0'
      dotty <- A.peekWord8
      SP c e <- case dotty of
        Just 46 -> A.anyWord8 *> (f <$> A.takeWhile1 isDigit_w8)  -- 46 is '.'
        _       -> pure (SP n 0)
      let !signedCoeff | positive  = c
                       | otherwise = -c
      (A.satisfy (\ex -> ex == 101 || ex == 69) *>    -- 'e' or 'E'
          fmap (Sci.scientific signedCoeff . (e +)) (signed decimal)) <|>
        return (Sci.scientific signedCoeff e)
    {-# INLINE scientific' #-}
This does not check that a comma appears only after every three digits; that would require extra logic. It is, however, a basic implementation that accepts commas in the integral part of the Scientific.
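The three-digit grouping rule could also be sketched directly in megaparsec, the library used in the question. The following is a minimal sketch under some assumptions: the names Parser, groupedDigits and pCommaScientific are made up for illustration, the input type is String (the question's Parser may well be over Text), and signs and exponents are left out to keep it short.

```haskell
import Data.Maybe (fromMaybe)
import Data.Scientific (Scientific)
import qualified Data.Scientific as Sci
import Data.Void (Void)
import Text.Megaparsec
  (Parsec, count, count', notFollowedBy, optional, parseMaybe, some, try, (<|>))
import Text.Megaparsec.Char (char, digitChar)

-- Hypothetical parser type for this sketch; adapt it to your own.
type Parser = Parsec Void String

-- One to three digits followed by at least one ",ddd" group, e.g. "4,087".
groupedDigits :: Parser String
groupedDigits = do
  first <- count' 1 3 digitChar
  rest  <- some (char ',' *> count 3 digitChar)
  notFollowedBy digitChar            -- reject inputs such as "1,2345"
  pure (concat (first : rest))

-- Integral part (grouped or plain) with an optional fractional part.
-- Signs and exponents are deliberately not handled here.
pCommaScientific :: Parser Scientific
pCommaScientific = do
  intDigits  <- try groupedDigits <|> some digitChar
  fracDigits <- optional (char '.' *> some digitChar)
  let frac  = fromMaybe "" fracDigits
      coeff = read (intDigits ++ frac) :: Integer   -- digits only, so read is safe
  pure (Sci.scientific coeff (negate (length frac)))
```

With these definitions, parseMaybe pCommaScientific "4,087.00" and parseMaybe pCommaScientific "4087.00" should both give Just a value equal to 4087, while a badly grouped input such as "1,23" gives Nothing. Wrapping the parser as lexeme pCommaScientific, with lexeme defined as in the question, would make it a drop-in alternative to pScientific.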