I'm writing unit tests around an HTML-to-PDF process. I have a set of sample input HTML files and a set of PDFs representing the expected results, and I'd like to compare the generated output against these to determine whether the process has produced the correct output.
Obviously, PDF files contain some non-deterministic components, so I can't do a straight binary compare. I don't particularly want to delve into parsing the PDF output, so I thought it might be neat to just check how much the files differ (and have the test pass if they differ by, say, less than 1%).
I can't simply count the differing bytes at the same array positions, because the output can vary slightly in size, so content will be offset slightly differently in each file.
So the question is: is there a tried and tested algorithm for determining how much the general content of two large byte arrays differs?
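To make the problem concrete, here is a minimal Python sketch of the two measures described above. It is only an illustration, not a recommended solution: the function names `positional_mismatch` and `fraction_different` are hypothetical, and using the standard library's `difflib.SequenceMatcher` as the offset-tolerant measure is an assumption on my part, one possible candidate among several.

```python
from difflib import SequenceMatcher


def positional_mismatch(a: bytes, b: bytes) -> float:
    """Fraction of positions that differ when bytes are compared index by index.

    Bytes beyond the end of the shorter array count as mismatches.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b))
    mismatches += abs(len(a) - len(b))
    return mismatches / longest


def fraction_different(a: bytes, b: bytes) -> float:
    """Offset-tolerant difference in [0, 1], based on matching blocks.

    SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    matched elements and T is the combined length of both sequences.
    autojunk is disabled so that frequently occurring byte values are
    not discarded by the matcher's popularity heuristic.
    """
    if not a and not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return 1.0 - matcher.ratio()


if __name__ == "__main__":
    expected = bytes(range(256)) * 4   # stand-in for the expected file
    actual = b"XY" + expected          # same content, shifted by two bytes

    print(positional_mismatch(expected, actual))
    print(fraction_different(expected, actual) < 0.01)
```

In this example a two-byte insertion at the front makes every position mismatch, while the block-matching measure still reports the arrays as nearly identical. Note that `SequenceMatcher` can take quadratic time in the worst case, so it may be too slow for genuinely large files without chunking or some other reduction first.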
Thanks,
Steve.