Thursday, October 17, 2019

How to implement a functional regression test for applications processing extremely large datasets?

A very large number of applications can be summarized as taking an initial dataset A and transforming it to some target dataset B.

There's a very effective model of testing for these applications: store a serialized representation of a known dataset A along with its desired transformed product B. Then, in your test, deserialize A, run the processing application on it, and compare the result to B.
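As a concrete illustration, here is a minimal sketch of that model in Python. The `transform` function, the fixture paths, and the JSON serialization are placeholders for whatever the real application uses.

```python
# Minimal sketch of a "golden file" functional regression test.
# transform() and the fixture paths are placeholders, not a real application.
import json

def transform(dataset):
    """The application code under test: dataset A in, dataset B out."""
    raise NotImplementedError  # stand-in for the real transformation

def test_transform_matches_golden_output():
    # Deserialize the known input dataset A.
    with open("fixtures/dataset_a.json") as f:
        dataset_a = json.load(f)

    # Deserialize the expected output dataset B, produced and reviewed earlier.
    with open("fixtures/dataset_b_expected.json") as f:
        expected_b = json.load(f)

    # Run the application and compare its result to the stored expectation.
    assert transform(dataset_a) == expected_b
```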

I'm not sure if this model of testing has a formal name; I call it a functional regression test.

This model works great when datasets A and B are small. However, once they grow very large, it becomes impractical: there may not be enough storage in the system for two extra copies of a huge dataset, and restoring them from serialized form and doing a full comparison at the end is often prohibitively time-consuming as well.
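For the comparison step specifically, one partial mitigation that comes to mind is to store a digest of dataset B rather than B itself, and compare digests computed in a streaming fashion. This is only a sketch: it assumes records arrive in a deterministic order with a stable serialized form, it gives up detailed diffs on failure, and it does nothing about storing the input dataset A. The `transform` and `stream_dataset_a` helpers below are hypothetical.

```python
# Sketch of a digest-based comparison, assuming deterministic record order
# and a stable per-record serialization. Not a full answer to the question.
import hashlib
import json

def dataset_digest(records):
    """Hash a dataset record by record so it never has to fit in memory."""
    h = hashlib.sha256()
    for record in records:
        # sort_keys gives a stable byte form for dict-like records.
        h.update(json.dumps(record, sort_keys=True).encode("utf-8"))
    return h.hexdigest()

# Recorded once from a reviewed reference run of the application.
EXPECTED_DIGEST = "recorded-from-a-reviewed-run"

def test_transform_digest_matches():
    result_records = transform(stream_dataset_a())  # hypothetical helpers
    assert dataset_digest(result_records) == EXPECTED_DIGEST
```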

Is there an approach to make this model practical again for big data applications?
