unit-testing haskell quickcheck hunit

Using mocks in unit testing in Haskell?


Consider the following piece of code:

data Slice = Slice
  { text :: String,
    color :: Color
  }

newtype Color = Color
  { string :: String
  }

mainList :: [FilePath] -> [FilePath] -> [String] -> [[Slice]]
mainList somethingA somethingB codedLines = ...

The Slice record represents the result of decoding a string with ANSI escape codes. Inside the mainList function I use a function, let's call it categorize, that parses the codedLines parameter (a list of strings containing ANSI escape codes) into [[Slice]].

First question:

How would I write unit tests for the mainList function using the QuickCheck unit testing framework? I am fairly new to Haskell. If this had been some OOP language, I would know what to do: have the class that has the categorize method passed as a parameter to the constructor of the class that has the mainList method, and then mock it to my heart's content using some mocking library, or even write the mock manually. But what is one to do in Haskell, where there is no notion of classes?

Second question:

Maybe I can change the mainList function as follows:

mainList :: [FilePath] -> [FilePath] -> ([String] -> [[Slice]]) -> [[Slice]]

But then I would have to pass it a mock as the third parameter. Since Haskell is not an OOP language, is there even a notion of a mock? Is this idiomatic Haskell? Or am I mistakenly projecting OOP principles onto a functional language?

Thanks in advance.


Solution

  • How you could mock

    You have correctly noticed that if you want to use different implementations of functions that are called by a function that you're testing, you can simply turn them into parameters instead of calling them directly:

    mainList :: TypeOfCategorize -> [FilePath] -> [FilePath] -> [String] -> [[Slice]]
    mainList categorize ... =
       ... categorize ...
    
    mainListForTesting = mainList categorizeMock
    
    mainListForProd = mainList categorizeReal
    

    This is in fact the exact same idea of which "dependency injection frameworks" are a massively more complicated version. If you would like it to be more complicated in Haskell too you can apply any of the standard techniques used for managing additional pieces of information you need to pass around and don't want to manage so explicitly; the fact that you're doing this for testing purposes isn't hugely different from many other contexts in which this concern arises. A few thoughts off the top of my head:

    1. You could use a single record-of-functions argument containing all of the functions you want to mock for testing (instead of a separate extra parameter for every function).
    2. You could use ReaderT to make such a record-of-functions accessible without needing to be explicitly passed all the time. (This will only save you much if you have a lot of functions calling each other with the same set of implementations, rather than every function needing a different set of functions that could be mocks or real)
    3. You could use a type tag to indicate what "purpose" the function is being run for (production or an extensible set of testing purposes, if you need more than one mock), and use type classes to provide the correct implementations for each purpose rather than explicitly passing them in.
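    As a rough illustration of the first option, here is a minimal sketch of a record of functions. The Deps record, prodDeps, mockDeps and both categorize implementations are hypothetical names invented for this example, and the body of mainList is a placeholder, since the question does not show the real one.

    ```haskell
    -- Types from the question, with Eq/Show derived so results can be compared.
    newtype Color = Color { string :: String } deriving (Eq, Show)

    data Slice = Slice { text :: String, color :: Color } deriving (Eq, Show)

    -- Hypothetical record collecting every function we might want to swap out.
    data Deps = Deps
      { categorize :: [String] -> [[Slice]]
      }

    -- Placeholder for the real mainList (its body is not shown in the question):
    -- it only drops empty rows from whatever categorize returns.
    mainList :: Deps -> [FilePath] -> [FilePath] -> [String] -> [[Slice]]
    mainList deps _somethingA _somethingB codedLines =
      filter (not . null) (categorize deps codedLines)

    -- Placeholder "real" implementation: one slice per non-empty line.
    prodDeps :: Deps
    prodDeps = Deps { categorize = map toSlices }
      where
        toSlices ""   = []
        toSlices line = [Slice line (Color "default")]

    -- Test double that ignores its input and returns a canned answer.
    mockDeps :: Deps
    mockDeps = Deps { categorize = const [[Slice "canned" (Color "red")], []] }

    main :: IO ()
    main = do
      print (mainList prodDeps [] [] ["hello", "", "world"])
      print (mainList mockDeps [] [] ["ignored"])
    ```

    Adding another swappable function then means adding a field to Deps rather than another parameter to every caller. The second option would wrap the same record in ReaderT so that it no longer has to be passed by hand.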

    But fundamentally it all boils down to: if you want to use different values on some invocations than on others, then those values are some form of parameter. And in Haskell, an implementation is a function is a value. So you use the same tools for getting different implementations into your mainList calls for testing purposes that you would for getting any other value into different mainList calls for any other purpose.

    Should you mock?

    You also ask whether "mocking" is idiomatic Haskell, and I would say the answer is "it depends, but largely no" (full disclaimer: I would also say that mocking is massively over-used in OOP testing too, so maybe my preferred ways of thinking about testing are just not to your taste).

    mainList is pure (and is constrained to be so by its type, as long as we ignore unsafe* functions). That means that everything it calls is also pure. Almost all of the time, that means that there's no need to mock anything.

    mainList will call categorize with some particular arguments, perhaps multiple times. To test that mainList does the correct thing, you need return values for those calls. Since those calls are pure, each of them has a single correct answer that is completely independent of anything else going on in your program; there's no state to set up, no side effects you might not want, no external environment required. If mainList uses anything other than the single correct return value for each call to categorize, then you're not testing mainList's real behaviour, you're testing hypothetical behaviour when categorize returns something other than what it should. It's not going to do that in production, so testing what mainList does under those conditions isn't very useful. One way to get mainList to use the correct values is to very carefully consider the test case you're running, work out what the correct answer(s) should be, and pass in a mock categorize that returns them for this test case. But a far easier way is to just pass in the real categorize and let mainList get the correct answer from the function you wrote to produce the correct answers.

    This does mean you might get a test failure from mainList that only happens because categorize returned the wrong result. But the tests for mainList aren't what guards against that; the tests for categorize do that job! If you get test failures for mainList and categorize at the same time, just start debugging categorize first (which is a good idea anyway, in case your changes there require flow-on changes in the things that call it, including mainList). You're far more likely to get a misleading result from a mainList test because of a bad mock than because of the real categorize. After all, categorize is the thing you are specifically engineering to return the correct result for categorize calls; no mock is ever going to be as likely to succeed at that job. And if the requirements for categorize change what the correct result is (either from external change or because something was misunderstood originally), you're much better off having the test for mainList pick up the changed categorize and detect whether that causes mainList to fail. If categorize is mocked, the tests for mainList will continue to pass against the old definition of what categorize is supposed to return, and mainList will then fail in production. The danger of mocks getting out of sync with real implementations that change is much worse than the danger of getting two test failures caused by one defect, in my opinion.

    You might need to mock if:

    1. The function you're testing calls things that depend on an external environment that isn't present in your testing environment (like a database, interaction with a user, etc, etc)
    2. The function that you're testing calls things that have side effects that you don't want to happen in a testing environment (you should probably be able to test your logic for detecting when to automatically fire the missiles without actually firing the missiles)
    3. The function you're testing calls things that take an impractically long time to really compute

    In Haskell that's generally a pretty small minority of functions we want to test. Purity and the strong type system mean we're encouraged to have most of our code in pure functions, that can be tested independently of any external environment. Only a relatively small fraction of the code base directly deals with the external environment (rather than being parameterised on things that have been computed from it). And functions that take a long time are usually scaling with the size of some inputs, so they can be tested with smaller inputs that take less time.

    So in Haskell mocking is not generally considered a core testing technique that should be applied to everything.

    The mocking that we do do often isn't called or thought of as mocking. Using things like monad transformers or effect systems, we often write code that does depend on an environment or have side effects in a way that is polymorphic in the specific implementation of the effects. The motivation for this is usually talked about more in terms of getting better type safety and composability, but as a side effect (pun intended) it also means we can swap different implementations for testing. But this isn't used to mock out arbitrary pure functions called by pure functions under test.
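    To make that concrete, here is a minimal sketch of effect-polymorphic code using only a type class from base rather than a full effect system. The MonadLines class, the FakeFS test monad and the countLines function are all hypothetical names made up for this illustration; real code would more likely reach for mtl or an effect library.

    ```haskell
    {-# LANGUAGE GeneralizedNewtypeDeriving #-}

    -- Hypothetical effect: "something that can fetch the lines of a file".
    class Monad m => MonadLines m where
      fetchLines :: FilePath -> m [String]

    -- Production interpretation: really touch the file system.
    instance MonadLines IO where
      fetchLines path = lines <$> readFile path

    -- Test interpretation: a pure lookup in an in-memory table of fake files.
    newtype FakeFS a = FakeFS { runFakeFS :: [(FilePath, String)] -> a }
      deriving (Functor, Applicative, Monad)

    instance MonadLines FakeFS where
      -- A missing file is treated as empty here; that is a choice of this sketch.
      fetchLines path = FakeFS (\files -> maybe [] lines (lookup path files))

    -- Code under test, written once against the class and not against IO.
    countLines :: MonadLines m => [FilePath] -> m Int
    countLines paths = sum . map length <$> mapM fetchLines paths

    main :: IO ()
    main = print (runFakeFS (countLines ["a.txt", "b.txt"]) fakeFiles)
      where
        fakeFiles = [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]
    ```

    The same countLines runs against the real file system when used at type IO, and against the in-memory table in tests, without the function itself knowing which one it got.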

    Property-based testing, like QuickCheck

    You mention "the QuickCheck unit testing framework". QuickCheck can indeed be regarded as a unit testing framework, but it usually isn't because it's a library for property-testing, which is a very specific kind of unit test (or isn't unit testing at all, depending on your definitions).

    For property tests, you don't just write one specific test case and check that you get the expected output. Instead you write a property which is parameterised on some inputs and tests if the property holds; this needs to be a general test of the property regardless of the specific value of the input (so very different from a typical unit test where you check features of the output for specific known inputs). QuickCheck will then randomly generate lots of values trying to find one that falsifies your property; the test passes if it can't find one.

    The standard trivial example (from the QuickCheck docs) is:

    import Test.QuickCheck
    
    -- Reversing a list twice results in the original input list
    prop_reverse :: [Int] -> Bool
    prop_reverse xs = reverse (reverse xs) == xs
    

    We didn't test this by checking that reverse (reverse [1, 2, 3]) == [1, 2, 3]; we wrote a property that will be tested for any list of integers.
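    For completeness, a small sketch of actually running such a property with quickCheck, which by default tries 100 random inputs. The second property, prop_reverseIsIdentity, is a deliberately false one added here only to show that QuickCheck reports a counterexample when it finds one.

    ```haskell
    import Test.QuickCheck

    -- Reversing a list twice results in the original input list.
    prop_reverse :: [Int] -> Bool
    prop_reverse xs = reverse (reverse xs) == xs

    -- Deliberately false: reversing once is not the identity in general.
    prop_reverseIsIdentity :: [Int] -> Bool
    prop_reverseIsIdentity xs = reverse xs == xs

    main :: IO ()
    main = do
      quickCheck prop_reverse           -- expected to pass
      quickCheck prop_reverseIsIdentity -- expected to fail with a counterexample
    ```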

    It's not my intent to talk a lot about property based testing, but I'm hoping that you might see why I bring it up in the context of your questions about mocks and testing with QuickCheck.

    If you were using QuickCheck to test mainList, you would be writing general properties that should work for any random input that QuickCheck generates. This does not go very well with the idea of mocking the functions that mainList calls. We don't have a specific test case with known inputs and therefore known calls that should be made to categorize, so we can't just make a mock categorize that ignores its inputs and returns the values we expect mainList to get. In an ideal world, the generated inputs for mainList will do a decent job of representing every possible input and code path that mainList could face in production, so we'd need a mock categorize that can return the correct result for any possible call. We already have a function with that specification: categorize. Writing categorize a second time just to test mainList is a bad idea.
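    A sketch of what such a property might look like when mainList simply uses the real categorize. Both function bodies below are hypothetical placeholders (the question does not show the real ones), and the property itself is only an assumed example of something mainList might be expected to satisfy.

    ```haskell
    import Test.QuickCheck

    newtype Color = Color { string :: String } deriving (Eq, Show)

    data Slice = Slice { text :: String, color :: Color } deriving (Eq, Show)

    -- Hypothetical placeholder for the real decoder: no escape-code handling,
    -- every line becomes a single slice in a default colour.
    categorize :: [String] -> [[Slice]]
    categorize = map (\line -> [Slice line (Color "default")])

    -- Hypothetical placeholder for mainList; it calls the real categorize directly.
    mainList :: [FilePath] -> [FilePath] -> [String] -> [[Slice]]
    mainList _somethingA _somethingB codedLines = categorize codedLines

    -- Assumed property: mainList yields exactly one row of slices per input line.
    prop_oneRowPerLine :: [String] -> Bool
    prop_oneRowPerLine codedLines =
      length (mainList [] [] codedLines) == length codedLines

    main :: IO ()
    main = quickCheck prop_oneRowPerLine
    ```

    Nothing here is mocked: the inputs are random, so the only categorize that can answer every generated call correctly is the real one.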

    (If categorize has a tricky/fragile implementation for performance reasons, writing a more straightforward and obviously-correct implementation would be an excellent way to test categorize itself; just write a property test that the obviously-correct-but-slow implementation always returns the same thing as the fragile-but-fast implementation, for all inputs! But there's still no point using the second implementation in property tests of things that call categorize)
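    A minimal sketch of that reference-implementation technique, using list reversal as a stand-in because the real categorize is not shown. The names reverseSlow and reverseFast are hypothetical; in practice they would be the obviously-correct and the optimised categorize.

    ```haskell
    import Test.QuickCheck

    -- Obviously correct but quadratic reference implementation.
    reverseSlow :: [a] -> [a]
    reverseSlow []       = []
    reverseSlow (x : xs) = reverseSlow xs ++ [x]

    -- Linear-time version using an accumulator; the one we would ship.
    reverseFast :: [a] -> [a]
    reverseFast = go []
      where
        go acc []       = acc
        go acc (x : xs) = go (x : acc) xs

    -- Property: both implementations agree on every generated input.
    prop_fastMatchesSlow :: [Int] -> Bool
    prop_fastMatchesSlow xs = reverseFast xs == reverseSlow xs

    main :: IO ()
    main = quickCheck prop_fastMatchesSlow
    ```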

    The popularity of property-based testing in Haskell is another reason why mocking is not such a common technique. If you have properties that actually do a reasonable job characterising what correct behaviour of a function looks like, then you generally can't test those properties without real implementations