I'm just trying to skip over an SOS_MT block in a JPEG file. I don't want to use the data for anything; I just want to know where it ends. As far as I understand from the Wikipedia article on JPEG, all other blocks in a JPEG file start with a few bytes that give the block's length, whereas an SOS_MT block is ... well, an evil swamp that you have no option but to parse byte by byte until you reach the end of it.
So I came up with the following code to do just that:
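import Control.Applicative ((<|>))
import Control.Monad (foldM)
import Data.Attoparsec.ByteString

entropyCoded :: Parser Int
entropyCoded = do
    list_of_lengths <- many' $
            -- any byte other than 0xFF: one byte of entropy-coded data
            (do _ <- notWord8 0xFF
                return 1)
        <|> -- 0xFF 0x00: a stuffed 0xFF data byte
            (do _ <- word8 0xFF
                _ <- word8 0
                return 2)
        <|> -- restart marker RST0..RST7 (0xD0..0xD7 inclusive), possibly after 0xFF fill bytes
            (do l <- many1 (word8 0xFF)
                _ <- satisfy (\x -> x >= 0xD0 && x <= 0xD7)
                return $ 1 + length l)
        <|> -- 0xFF followed by another 0xFF: consume a single fill byte
            (do _ <- word8 0xFF
                maybe_ff <- peekWord8'
                if maybe_ff == 0xFF
                    then return 1
                    else fail "notthere")
    -- sum the per-step lengths strictly
    foldM (\nn n -> nn `seq` return (nn + n)) 0 list_of_lengths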
This code uses Attoparsec and, as far as I have been able to verify, it is correct. It is just slow. Any tips on how to improve this parser's performance?
If you want to skip over an SOS marker, just look for the next marker that is not a restart marker.
Read bytes until you find an FF. If the next value is 00, the pair is an FF value in the compressed data, so skip over it. If the next value makes it a restart marker (D0 through D7), skip over it too. Otherwise, the FF should start the next block.
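The scan described above can be sketched directly over a strict `ByteString`, without Attoparsec. This is only a minimal sketch: the names `entropyCodedLength` and `isRestart` are made up for illustration, and it assumes the input starts at the first byte of the entropy-coded data (just after the SOS header). It jumps from one FF to the next with `B.elemIndex` instead of running a parser alternative for every byte, and it treats a run of FF bytes as fill bytes, as the parser in the question does.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- | True for the second byte of a restart marker, RST0..RST7 (0xD0..0xD7).
isRestart :: Word8 -> Bool
isRestart b = b >= 0xD0 && b <= 0xD7

-- | Offset of the 0xFF that begins the next non-restart marker, which is
-- also the length in bytes of the entropy-coded data before it.
-- Returns Nothing if the input ends before such a marker is found
-- (an assumption of this sketch; the parser in the question stops instead).
entropyCodedLength :: B.ByteString -> Maybe Int
entropyCodedLength bs = go 0
  where
    go off = case B.elemIndex 0xFF (B.drop off bs) of
      Nothing -> Nothing                   -- no further 0xFF in the input
      Just i
        | next >= B.length bs -> Nothing   -- lone 0xFF at the very end
        | b == 0x00   -> go (next + 1)     -- stuffed 0xFF data byte
        | isRestart b -> go (next + 1)     -- restart marker, keep scanning
        | b == 0xFF   -> go next           -- fill byte, look at the next 0xFF
        | otherwise   -> Just ff           -- start of the next marker
        where
          ff   = off + i                   -- absolute position of this 0xFF
          next = ff + 1                    -- position of the byte after it
          b    = B.index bs next           -- only forced after the bounds check
```

For example, for the bytes `12 FF 00 34 FF D0 56 FF D9` this returns `Just 7`, the position of the FF that starts the final `FF D9` marker.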