jeudi 3 mars 2016

Algorithm to generate random testcases for histogram distance

I've written my own implementation of EMD histogram distance in Java.

Please see this example of input and output:

 h1 = mapOf(5 to 0.5, 7 to 0.5)
 h2 = mapOf(2 to 0.3, 8 to 0.7)

 emd(h1, h2) evaluates to 2.0

Now, I'd like to generate random tests for it to check its correctness.

Is there a good approach to generate two histograms of most general form (arbitrary number of items, non-trivial items) with known EMD between them?

It would also be acceptable to generate a pair of histograms h1, h2, calculate emd(h1, h2), then modify h1 to g1, h2 to g2 and check how emd(g1, g2) differs from emd(h1, h2), but again, I need a way to modify h1 and h2 so that the EMD changes in a predictable way.

I tried to find the most similar keys k1 and k2 in h1 and h2 respectively and add C to both h1[k1] and h2[k2], expecting that emd(h1, h2) would change by C, but it didn't always seem to hold. (*)

My implementation can use histograms with any types of keys provided a metric for them, but ints with Math.abs(x, y) metric will be enough for testing.

UPD: The approach marked by (*) seems to be correct, it didn't work due to a bug in the tests generation. Still waiting for the other good solutions.

Aucun commentaire:

Enregistrer un commentaire