I'm writing a class that manages the histogram for a text corpus. Vose's Alias Method is being used to sample a random word from the histogram, which is weighted by its frequency in the corpus. I'd like to test that my implementation of Vose's Alias Method is correct. Given that the corpus contains n words, each with a probability of being selected equal to (number of times word occurs)/n, I have an expected distribution.
Running the sampling function of Vose's Alias Method enough times, an actual distribution is created, which represents the words that were selected from the corpus and the probability of those words being selected by the sampling function. I'm using Jensen-Shannon Divergence to decide if the actual distribution is similar enough to the expected distribution for me to decide that my sampling function is working as expected (and please tell me if there's a better way of doing this).
Given some level of certainty and n words, how many times should I run my sampling function before deciding that it's working correctly?
Aucun commentaire:
Enregistrer un commentaire