unit-testinghaskellquickcheck

How to write a Quickcheck property for ANSI escaped coded string parser?


Please consider the following piece of code:

-- Represents a parsing result of an ANSI coded string.
data Slice = Slice
  { text :: String,
    color :: Color
  }

newtype Color = Color
  { string :: String
  }

-- A function that receives a string with ANSI esacpe codes and returns a list of slices.
categorize:: String -> [Slice]
categorize codedString = ...

Now, I wish to write a quickcheck property for the categorize function. I have something like this in mind:

-- A quickcheck generator for ANSI coded strings.
ansiEscapeStrings :: Gen String
ansiEscapeStrings = ...

main =
  verboseCheck $
    forAll
      ansiEscapeStrings
      (\codedString -> categorize codedString == WHAT_GOES_HERE)

My question is what goes instead of WHAT_GOES_HERE?

Thanks in advance.

UPDATE:

I already wrote properties for trivial things like length and empty list.


Solution

  • With QuickCheck, you should identify some rules you think should hold for all possible input/output pairs. It's rather difficult to do this if your concept of input is just "an arbitrary string to pass to the function" and the output is "the slices the function returns". At this level of abstraction, the only rule you can really write is "the function should produce the slices represented by the input string" - of course, you can't test that directly, because if you had a known-good conversion from input to output you'd just use that instead.

    Instead, try some thinking at a more granular level. What are some ways you think this function should behave, given certain properties of the input? Here are some I can think of.

    1. An empty input should produce an empty output.
    2. The sum of the lengths of the text strings in an output should be no larger than the size of the input.
    3. All the characters present in one of the text strings in an output should be present somewhere in the input.
    4. If the input includes n color-change sequences, there should be n slices in the output. Or is it n-1 slices? How are you planning to represent "text preceding any color-change sequences"? And what about if there are two color-change sequences in a row, with no text between them? Already, thinking in terms of properties to test has us finding important edge cases in the design.

    These properties need varying levels of precision to test them. For (1), you don't even need QuickCheck: you can just unit-test the single, empty input. For (2) and (3), you could just pass an arbitrary string as input and compare the input and output data. But QuickCheck's arbitrary string generator likely won't produce many strings with meaningful ANSI escape sequences in them. So you will probably want to define stringWithEscapeSequences :: Gen String or something like that, to ensure your test inputs are interesting.

    But (4) is more complicated. You could take an arbitrary string as input, scan it for escape sequences, and then compare that to the number of slices produced by categorize. But that requires including a lot of your function's implementation into the test, and a bug in your function could easily be mirrored by a bug in your test. A less fragile approach would be to write a data type for test cases, with a generator for that type, so that you know how many slices to expect from each test case ahead of time. Something like

    data SliceCountTestCase = SliceCountTestCase 
      { numSlices :: Int, input :: String }
      deriving (Show, Eq)
    
    instance Arbitrary SliceCountTestCase where
      arbitrary = do
        slices <- listOf slice
        pure $ SliceCountTestCase (length slices) (serialize =<< slices)
        where slice = (,) <$> color <*> listOf nonEscapeCharacter
              serialize (c, body) = escapeSequence c ++ body
    
    prop_sliceCountMatches :: SliceCountTestCase -> Bool
    prop_sliceCountMatches (SliceCountTestCase n s) = length (categorize s) == n
    

    There are numerous spots in that example for you to fill in based on your domain knowledge: color and nonEscapeCharacter are all generators, while escapeSequence is an ordinary function of type Color -> String.

    You could design another, similar property using many of the same combinators: given a list of slices as input, you should be able to encode it as a string, and categorize on that string should give you back the same result you started with


    Here are some ideas for other properties you could test, without details on how to test them. Try to think small to come up with properties: you need something you can easily describe and verify.

    1. A non-empty input should generate a non-empty list of slices. (Or should it? What if the only input is an escape sequence?)
    2. The number of non-escape characters in the input should be the same as the sum of lenghts of the output slices
    3. Suppose that a string s decodes to p. Choose an index n in [0..length s], and insert an arbitrary non-escape character x at position n in s yielding s'. Decoding s' should yield a list of slices much like p, except with a single extra x at the nth position.
    4. Do something like (3), but insert an escape sequence instead and expect a new slice (Or maybe not, depending on how you're handling adjacent escape sequences).

    In general, a fruitful avenue to explore is the theme in (3): Take an input and its corresponding output, perturb the input in some well-defined way, and observe that this changes the output in the expected way.