Tags: python, testing, fuzzing, python-hypothesis

In the Hypothesis testing library, what is the real difference between assume and filter?


Within the Hypothesis testing library for Python, there is the "assume" function, which "marks the example as bad, rather than failing the test". If too many "bad" examples are generated in a row, Hypothesis will, by default, fail the test with a health-check error.

Hypothesis also has a method that can be chained onto Strategies, named "filter", which, well, filters out unwanted generated data. If too many data items in a row are rejected by the filter, Hypothesis will, by default, fail with the same kind of health-check error as assume.

An example of assume from the docs:

from hypothesis import assume, given
from hypothesis.strategies import integers, lists

@given(lists(integers()))
def test_sum_is_positive(xs):
    assume(len(xs) > 10)
    assume(all(x > 0 for x in xs))
    print(xs)
    assert sum(xs) > 0

Reimagined with filter:

from hypothesis import given
from hypothesis.strategies import integers, lists

@given(
    lists(
        integers().filter(lambda x: x > 0)
    ).filter(lambda x: len(x) > 10)
)
def test_sum_is_positive_filter(xs):
    print(xs)
    assert sum(xs) > 0

Are these essentially the same thing? I understand that the examples may be a bit contrived for educational purposes; they have been shrunk, as it were. If that's the case, I just need to have my imagination stretched. (Note: running these two snippets on macOS Ventura, the first fails a health check and the second runs.)

What is the real, practical difference between these two functions?

When should I use assume instead of filter?

Is there a performance difference between the two?

I searched through the Hypothesis repo to see if assume and filter were actually the same thing under the hood, and they seemed not to be.


Solution

  • Yes, they're essentially the same thing - rejection sampling.

    However, there's an important difference in practice: s.filter() allows Hypothesis to reject part of an example and try again (within limits!), whereas assume() has to throw away the whole test case and start over.

    If we ignore all the heuristics, runtime feedback, splicing, etc. for the sake of illustration: if we generate length-five lists from s = lists(booleans()), a given list has (let's say) a 0.5^5 = ~3% chance of satisfying assume(all(xs)). However, each element drawn from booleans().filter(bool) has (let's say) a 9/10 chance of succeeding thanks to internal retries, so applying the filter to the elements makes it much more likely (0.9^5 = ~60%) that we'll create a list of elements we're happy with.
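The back-of-envelope arithmetic above can be checked directly. Note the 9/10 per-element success rate is this answer's illustrative assumption about filter's internal retries, not a figure from the Hypothesis docs:

```python
p_true = 0.5  # chance a raw boolean drawn by booleans() is True
n = 5         # list length in the example

# assume(all(xs)) rejects the whole list unless every element is True
p_assume = p_true ** n
print(f"assume: {p_assume:.1%}")  # 3.1%

# booleans().filter(bool) retries each element; assume ~9/10
# per-element success after retries (illustrative)
p_element = 0.9
p_filter = p_element ** n
print(f"filter: {p_filter:.1%}")  # 59.0%
```

The whole-list rejection rate compounds across every element, which is why the element-level filter comes out roughly twenty times more likely to produce an acceptable list.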