I've written my own implementation of EMD histogram distance in Java.
Please see this example of input and output:
h1 = mapOf(5 to 0.5, 7 to 0.5)
h2 = mapOf(2 to 0.3, 8 to 0.7)
emd(h1, h2) evaluates to 2.0
Now, I'd like to generate random tests for it to check its correctness.
Is there a good approach to generate two histograms of most general form (arbitrary number of items, non-trivial items) with known EMD between them?
It would also be acceptable to generate a pair of histograms h1
, h2
, calculate emd(h1, h2)
, then modify h1
to g1
, h2
to g2
and check how emd(g1, g2)
differs from emd(h1, h2)
, but again, I need a way to modify h1
and h2
so that the EMD changes in a predictable way.
I tried to find the most similar keys k1
and k2
in h1
and h2
respectively and add C
to both h1[k1]
and h2[k2]
, expecting that emd(h1, h2)
would change by C
, but it didn't always seem to hold. (*)
My implementation can use histograms with any types of keys provided a metric for them, but int
s with Math.abs(x, y)
metric will be enough for testing.
UPD: The approach marked by (*) seems to be correct, it didn't work due to a bug in the tests generation. Still waiting for the other good solutions.
Aucun commentaire:
Enregistrer un commentaire