In the Hypothesis library for Python, why does the text() strategy cause custom strategies to retry?


I have a custom strategy, built using composite, that draws from the text() strategy internally.

While debugging another error (FailedHealthCheck.data_too_large), I realized that drawing from the text() strategy can cause my composite strategy to be invoked roughly twice as often as expected.

I was able to reproduce this with the following minimal example:

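import collections
import hypothesis.strategies
from hypothesis import given

# Assumed stand-in for trace(), which the question does not show:
# it simply counts how many times each label is reached.
calls = collections.Counter()

def trace(label):
    calls[label] += 1


@hypothesis.strategies.composite
def my_custom_strategy(draw, n):
    """Strategy to generate lists of N strings"""

    trace("a")  # reached every time the strategy starts
    value = [draw(hypothesis.strategies.text(max_size=256)) for _ in range(n)]
    trace("b")  # reached only when every draw succeeded
    return value


@given(my_custom_strategy(100))
def test_my_custom_strategy(value):
    assert len(value) == 100
    assert all(isinstance(v, str) for v in value)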

In this scenario, trace("a") was invoked 206 times, whereas trace("b") was only invoked 100 times. These numbers are consistent across runs.

More problematically, the gap grows super-linearly with the number of text() draws. With n=200, trace("a") is called 305 times; with n=400, 984 times. With n=500 or greater, the test reliably pauses and then finishes after only 11 iterations instead of 100!

What's happening here?


Solution

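  • I suspect it's because you're running into the maximum amount of entropy (about 8K) that Hypothesis uses to generate a single example, which can happen if some of the strings you generate are quite long. An example that exhausts this budget partway through is abandoned and generation starts over, which would explain trace("a") firing without a matching trace("b"). Setting a smaller max_size in the text strategy should help, if I'm right.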

    As a more general tip, shrinking can be more efficient if you use the lists() strategy (or another collections strategy) rather than picking an integer and then drawing that many elements. The slowdown is not subtle, though: if you haven't already noticed it, you don't need to do anything!
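
As a rough sketch of both suggestions together, the composite strategy could be replaced by lists() wrapped around a text() strategy with a tighter size cap. The names N, MAX_CHARS, and fixed_length_strings are made up for illustration, and the cap of 32 characters is an arbitrary assumption rather than a recommended value. The settings() call is only there to keep the run short.

```python
from hypothesis import given, settings, strategies as st

N = 100         # list length used in the question
MAX_CHARS = 32  # illustrative cap only; pick whatever your test actually needs

# lists() fixes the length through min_size/max_size, so no @composite
# loop is needed and Hypothesis can shrink the collection directly.
fixed_length_strings = st.lists(
    st.text(max_size=MAX_CHARS), min_size=N, max_size=N
)


@settings(max_examples=25, deadline=None)
@given(fixed_length_strings)
def test_fixed_length_strings(value):
    assert len(value) == N
    assert all(isinstance(v, str) and len(v) <= MAX_CHARS for v in value)
```

Calling test_fixed_length_strings() directly, or running it under pytest, exercises the strategy. If the entropy limit is indeed the cause, shorter strings should leave each example well inside the budget.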