Tuesday, April 24, 2018

Unit-testing a probability distribution with conditionals

I have a function choose(elems) -> elem that calls rand(), which makes it non-deterministic.

To make this easier to test, I figured I could split the function in two:

generate_choices(elems, ...) -> distribution
choose(distribution) -> elem

where choose() is a thin wrapper around rand() and generate_choices() generates a distribution from which to draw an element. I could then deterministically test that this probability distribution is as expected.
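A minimal sketch of the thin wrapper in Python, assuming for now that the distribution is a plain list. The `rng` parameter is an assumption added for illustration (it is not part of the signatures above); it lets a test pass in a seeded generator so the draw is repeatable.

```python
import random

def choose(distribution, rng=random):
    """Thin wrapper around the random source: draw one element uniformly."""
    return rng.choice(distribution)

# Two generators seeded identically produce the same draw,
# so the wrapper itself can be exercised in a repeatable way.
dist = ["a", "b", "c", "d"]
first = choose(dist, random.Random(0))
second = choose(dist, random.Random(0))
```

With this shape, the only thing left to test in choose() is that it returns a member of the distribution; everything interesting lives in generate_choices().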

The distribution is uniform but with two conditionals:

  1. If there are not enough elems, add one fallback element, picked uniformly at random from the fallbacks.
  2. If there are still not enough elems, add one default element, picked uniformly at random from the defaults.

Some examples:

generate_choices([a, b, c, d], [], []) -> [a, b, c, d]
generate_choices([a, b, c], [fallback1], []) -> [a, b, c, fallback1]
generate_choices([a, b, c], [fb1, fb2], []) -> [a, b, c, (fb1 | fb2)]
generate_choices([a, b], [fb1, fb2], [default1]) -> [a, b, (fb1 | fb2), default1]
generate_choices([a, b], [fb1, fb2], [d1, d2]) -> [a, b, (fb1 | fb2), (d1 | d2)]
generate_choices([a], [fb1, fb2], [d1, d2]) -> [a, (fb1 | fb2), (d1 | d2)]
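The examples above could be produced by something like the following sketch. It makes two assumptions that the question does not state: "enough" means a hypothetical threshold `wanted` (4 is merely consistent with the examples), and the random source is injected as `rng` so that tests can seed it.

```python
import random

def generate_choices(elems, fallbacks, defaults, wanted=4, rng=random):
    """Build a flat list of candidates.

    `wanted` is a hypothetical threshold for "enough elems";
    `rng` is an injected random source (also an assumption).
    """
    choices = list(elems)
    # Conditional 1: too few elems, so add one fallback picked at random.
    if len(choices) < wanted and fallbacks:
        choices.append(rng.choice(fallbacks))
    # Conditional 2: still too few, so add one default picked at random.
    if len(choices) < wanted and defaults:
        choices.append(rng.choice(defaults))
    return choices
```

Only the cases with at most one fallback and one default have a single possible result; the others can be checked for membership but not for an exact value unless the generator is seeded.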

My question is then: How should I model the distribution?

  • If I choose a simple list and call rand() from within generate_choices() to fill the fallback and default, then I can only test some deterministic parts of generate_choices().
  • If I choose three lists, (elems, fallback, default), then generate_choices() is fully deterministic, but then choose() becomes less trivial and must be tested more thoroughly anyway.
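To make the second option concrete, here is one possible sketch. Instead of three separate lists it groups the alternatives into "slots" (a tuple of candidates per position); that grouping, the `wanted` threshold, the `probabilities()` helper and the `rng` parameter are all assumptions made for illustration. The point is that the exact probability of each element can then be asserted without any random call.

```python
from fractions import Fraction
import random

def generate_choices(elems, fallbacks, defaults, wanted=4):
    """Return a list of slots; each slot is a tuple of alternatives.

    Fully deterministic: no random call happens here.
    `wanted` is a hypothetical threshold for "enough elems".
    """
    slots = [(e,) for e in elems]
    if len(slots) < wanted and fallbacks:
        slots.append(tuple(fallbacks))
    if len(slots) < wanted and defaults:
        slots.append(tuple(defaults))
    return slots

def probabilities(slots):
    """Exact probability per element: uniform over slots, then within a slot."""
    probs = {}
    for slot in slots:
        weight = Fraction(1, len(slots) * len(slot))
        for item in slot:
            probs[item] = probs.get(item, Fraction(0)) + weight
    return probs

def choose(slots, rng=random):
    """Pick a slot uniformly, then one alternative within it uniformly."""
    return rng.choice(rng.choice(slots))
```

Here choose() is two nested draws rather than one, which is the extra complexity the second option trades for a deterministic generate_choices().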
