I want to run a Kolmogorov-Smirnov test to check whether my sample comes from a discrete uniform distribution. More specifically, I am using the KS test in the context of Benford's Law, under which the third or fourth digits of numbers should (approximately) follow a discrete uniform distribution.
Basically, my sample looks like this:
x = c(0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,8,9,9)
I came across the disc_ks_test function of the KSgeneral package, which performs a KS test for discrete distributions. I've also noticed that the more common ks.test function of the dgof package can now test discrete distributions as well (based on the paper by Arnold and Emerson). My problem is that the two tests give different test statistics (and p-values), and I am not sure which one is correct. In this example:
ks.test(x,ecdf(0:9))
D = 0.1381, p-Value = 0.8181
and
disc_ks_test(x,ecdf(0:9))
D = 0.038095, p-Value = 0.9996
So I calculated the test by hand in Excel and found where the two functions differ.
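The hand calculation can be reproduced in R. The spreadsheet itself is not shown, so this sketch assumes it had one row per digit k with columns F0, Fn and Fn at the previous digit, and that its "last two columns" were |F0 - Fn| and |F0 - Fn(previous)|. The column and variable names (tab, Fn_prev, D_cont, D_disc) are illustrative, not taken from either package.

```r
# Sample from the question (n = 21 observations of the digits 0..9)
x <- c(0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,8,9,9)
support <- 0:9

F0 <- ecdf(support)  # hypothesized discrete uniform CDF: F0(k) = (k + 1) / 10
Fn <- ecdf(x)        # empirical CDF of the sample

# Spreadsheet-style table, one row per support point k (assumed layout)
tab <- data.frame(
  k       = support,
  F0      = F0(support),
  Fn      = Fn(support),
  Fn_prev = Fn(support - 1)  # Fn just below k, i.e. the left limit Fn(k-)
)
tab$abs_F0_Fn      <- abs(tab$F0 - tab$Fn)
tab$abs_F0_Fn_prev <- abs(tab$F0 - tab$Fn_prev)

# Maximum over both of the last two columns (continuous-case style)
D_cont <- max(tab$abs_F0_Fn, tab$abs_F0_Fn_prev)
# Maximum of |F0 - Fn| at the support points only (as described for disc_ks_test)
D_disc <- max(tab$abs_F0_Fn)

print(tab, digits = 4)
c(D_cont = D_cont, D_disc = D_disc)
```

With this sample, D_cont comes out at 2.9/21 ≈ 0.1381 (from F0(8) - Fn(7) = 0.9 - 16/21) and D_disc at 0.8/21 ≈ 0.038095 (from |F0(7) - Fn(7)|), matching the two statistics reported above. The whole gap comes from the Fn_prev column, which compares F0 at k with Fn just below k.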
I am fairly sure that for continuous distributions the test statistic D is the supremum (or maximum) of the last two columns of the Excel spreadsheet (that's what ks.test is doing). disc_ks_test just takes the maximum of Abs(F0-Fn) as the test statistic, but for real data its results are much more consistent with those of other tests (chi-square). Now I wonder which R function is correct, or whether there is a theoretical explanation for why the KS test statistic is calculated differently when testing against a discrete distribution.
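Written out in formulas (our own notation, not copied from either package's documentation), with a sorted sample x_(1) <= ... <= x_(n), the definition of the statistic, the usual computing shortcut for a continuous F_0, and the form the supremum takes when both CDFs are step functions on 0, ..., 9 are:

```latex
% Definition: supremum over all real x
D_n = \sup_{x} \lvert F_n(x) - F_0(x) \rvert

% Computing shortcut valid for continuous F_0
D_n = \max_{1 \le i \le n} \max\left\{ \frac{i}{n} - F_0\big(x_{(i)}\big),\;
      F_0\big(x_{(i)}\big) - \frac{i-1}{n} \right\}

% If F_0 and F_n both jump only at k = 0, 1, \dots, 9
D_n = \max_{k \in \{0, \dots, 9\}} \lvert F_n(k) - F_0(k) \rvert
```

In the shortcut, (i-1)/n is the left limit of F_n at x_(i), and it is paired with F_0(x_(i)), which equals the left limit of F_0 only when F_0 is continuous. When both CDFs are constant between consecutive support points (and beyond 9), the supremum can only be reached at a support point, which gives the last line; the shortcut's second term instead compares F_0(k) with F_n(k-1), values taken on opposite sides of the jump at k. This concerns only the statistic; the null distribution of D_n, and hence the p-value, is also different when F_0 is discrete.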