python conditional-statements python-hypothesis

Selecting conditional code paths in Hypothesis


Most conditional strategies seem to be data-driven. But what if I want to select a code path independently of any generated data?

For example, let's convert the grammar rule_a = rule_optional? rule_b to a strategy:

from hypothesis import strategies as strategy

@strategy.composite
def rule_a(draw):
    elem_b = draw(rule_b())
    # False comes first, so Hypothesis shrinks towards the plain rule_b path.
    if draw(strategy.sampled_from([False, True])):
        elem_opt = draw(rule_optional())
        elem_b = combine(elem_opt, elem_b)
    return elem_b

Say I'm generating a data structure from a grammar, and to test a particular behavior I want to modify the data structure by replacing a single randomly-selected node from a subset of specific locations in the syntax tree. Rewriting the grammar to generate this modified data is too complicated, so I modify the generated structure instead:

@strategy.composite
def generate_maybe_modified(draw):
    data = draw(generate())
    # Collect every location in the tree that matches a node we may replace.
    locations = []
    for location, node in iterate(data):
        if match_specific(data, location, node):
            locations.append(location)
    # False comes first, so Hypothesis shrinks towards returning the data
    # unmodified.  Skip the draw entirely if there is nothing to replace,
    # since sampled_from() requires a non-empty sequence.
    if not locations or not draw(strategy.sampled_from([False, True])):
        return data
    replacement = draw(modified())
    location = draw(strategy.sampled_from(locations))
    return replace(location, data, replacement)

The common idea is to select a code path by drawing from [False, True], with False listed first so that Hypothesis shrinks towards the shortest code path. Is this the most efficient solution?
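
(As a quick sanity check of that shrink order: sampled_from shrinks towards earlier elements, and hypothesis.find returns the minimal example satisfying a condition, so the trivially-satisfied condition below yields False.)

from hypothesis import find
from hypothesis import strategies as strategy

# find() returns the minimal satisfying example; sampled_from() shrinks
# towards earlier elements, so this yields False -- the short code path.
assert find(strategy.sampled_from([False, True]), lambda _: True) is False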

Is there any difference between draw(strategy.sampled_from([False, True])) and random.choice([False, True]) where random is managed by Hypothesis, i.e. an instance of a HypothesisRandom subclass?

Can I manage probabilities outside of Hypothesis? That is, can I increase the chance of the simpler code path being drawn by instead using:

random.choices([False, True], [90, 10])[0]

Solution

  • As discussed in this answer, it's best to specify only what inputs are valid, and let Hypothesis work out the probabilities for you.

    Is there any difference between draw(strategy.sampled_from([False, True])) and random.choice([False, True]) where random is managed by Hypothesis, i.e. an instance of a HypothesisRandom subclass?

    These are both more-or-less equivalent to draw(st.booleans()), which would be my preferred idiom.

    Most conditional strategies seem to be data-driven. But what if I want to select a code path independently of any generated data? ...

    Say I'm generating a data structure from a grammar, and to test a particular behavior I want to modify the data structure by replacing a single randomly-selected node from a subset of specific locations in the syntax tree. Rewriting the grammar to generate this modified data is too complicated, so I modify the generated structure instead ... Is this the most efficient solution?

    A location in the syntax tree is generated data, so from Hypothesis' perspective there's nothing unusual going on when it generates the data.

    However, this can be inefficient when we try to modify previously-generated examples (during heuristic search, or when shrinking a failing input). Ideally changes to Hypothesis-controlled decisions will have local effects - e.g. replacing nodes within each subtree rather than the whole example means that fewer mutations will invalidate replay of a later decision - but it's unlikely to be a big problem if this is impractical.
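
    For example, one way to give those decisions local effects is to draw the "replace this node?" choice while walking each subtree, rather than picking a single global location after the fact. The following is only a sketch: match_node, children_of, and rebuild stand in for whatever your grammar code provides (alongside the question's modified and generate helpers), and it relaxes the original "exactly one replacement" constraint by letting each matching node be replaced independently.

    from hypothesis import strategies as st

    @st.composite
    def maybe_modified_subtree(draw, subtree):
        # Decide per matching node whether to substitute it.  Each draw only
        # affects its own subtree, so mutating or shrinking one decision does
        # not invalidate the draws made while generating other subtrees.
        if match_node(subtree) and draw(st.booleans()):
            return draw(modified())
        children = [draw(maybe_modified_subtree(child)) for child in children_of(subtree)]
        return rebuild(subtree, children)

    @st.composite
    def generate_maybe_modified(draw):
        return draw(maybe_modified_subtree(draw(generate())))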