Thursday, 31 October 2019

What's the best equation for a p-value calculation when testing item matching?

I've inherited a document management system where there seem to be some documents wrongly associated with a client. We need to know the scale of the problem and how many clients we'd need to test to be fairly certain of the scale of the issue.

I remember from university doing a bit on p-values in statistical testing to determine the likelihood of a mistake in your estimates, but not enough to figure out how I would approach this.

Say I have 1000 clients, each with 30 documents on average, and want to manually check each client's documents to see whether they should have been allocated to that client. If I check 10, 50, 100 or 500 clients, how would I estimate the p-value for each sample size, and would that p-value give me the confidence to say something like '1 in 100 clients has a bad document, with an error estimate of 0.01'?
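To make the kind of statement I'm after concrete: an "estimate plus error margin" is usually expressed as a confidence interval for a proportion rather than as a p-value. Below is a minimal sketch using the Wilson score interval, assuming clients are sampled at random and each one is marked simply as "has a bad document" or not. The function name `wilson_interval`, the 95% level (`z = 1.96`) and the "zero bad clients found" scenario are illustrative assumptions, not figures from my system. It also ignores the fact that the population is only 1000 clients, which should make the intervals somewhat wider than necessary for the larger samples.

```python
import math


def wilson_interval(bad, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= bad <= n:
        raise ValueError("need n > 0 and 0 <= bad <= n")
    p_hat = bad / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical scenario: no bad clients turn up in the sample at all.
for n in (10, 50, 100, 500):
    low, high = wilson_interval(0, n)
    print(f"checked {n:3d} clients, 0 bad: plausible rate {low:.4f} to {high:.4f}")
```

The point of the sketch is that the upper end of the interval shrinks as the sample grows, so a clean sample of 10 says far less about the true rate than a clean sample of 500.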

The ultimate question is one of risk: if the error estimate is too high, we'll have to manually check every document for all 1000 clients, but if it's very low, we can perhaps accept the risk that, of the 30,000 documents, say 30 are likely to be wrong.
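One way to frame that risk decision, as a rough sketch: suppose some number of the 1000 clients really are bad, and ask how many randomly chosen clients must be checked before a completely clean sample becomes unlikely. The code below uses the hypergeometric probability of drawing no bad clients when sampling without replacement. The function names, the 10 bad clients (1 in 100) and the 5% threshold are assumptions chosen for illustration only.

```python
from math import comb


def prob_no_bad_found(population, bad, sample):
    """P(a random sample drawn without replacement contains no bad clients)."""
    if sample > population - bad:
        return 0.0  # the sample is too large to avoid every bad client
    return comb(population - bad, sample) / comb(population, sample)


def min_sample_size(population, bad, alpha=0.05):
    """Smallest sample for which a fully clean result has probability <= alpha."""
    for n in range(population + 1):
        if prob_no_bad_found(population, bad, n) <= alpha:
            return n
    return population  # only reached when bad == 0


# Hypothetical: 10 of the 1000 clients have a wrongly allocated document.
n_needed = min_sample_size(1000, 10)
print(f"check {n_needed} clients; chance of missing all 10 bad ones:",
      f"{prob_no_bad_found(1000, 10, n_needed):.3f}")
```

If a sample of that size comes back clean, a rate as high as 1 in 100 clients would be hard to reconcile with the result at the chosen threshold; a smaller clean sample would not support that conclusion.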

I found a Wikipedia article, but I'm just not able to understand how I'd apply the numbers I've got.
