We have a set of classifiers written in Python which mostly classify image data. We would like to ensure that new versions of a classifier, of the test data, or of external libraries do not lower the quality of our classifiers. As quality metrics we use RAM usage, CPU time and, of course, classification accuracy. Testing and analysing the models usually takes several hours, and the resulting models are up to a gigabyte in size.
We also have to test different variants, e.g. running on CPU vs. GPU and on Linux vs. Windows. Afterwards we want some kind of report so we can inspect and evaluate the metrics.
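To make the question more concrete, here is a minimal sketch of how we currently imagine capturing the three metrics for a single run and writing them to a machine-readable file. The per-image `classify` callable, the dummy data, and the `metrics.json` file name are placeholders, not part of any standard we know of:

```python
import json
import time
import tracemalloc


def benchmark(classify, images, labels):
    """Run one classifier over the test set and collect the three metrics."""
    tracemalloc.start()              # tracks Python-level allocations only;
    cpu_start = time.process_time()  # psutil would be needed for whole-process RSS / GPU memory

    predictions = [classify(img) for img in images]

    cpu_time = time.process_time() - cpu_start
    _, peak_ram = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
    return {"accuracy": accuracy, "cpu_time_s": cpu_time, "peak_ram_bytes": peak_ram}


if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs; in reality this would be one of our
    # classifiers and the loader for the 10 GB test set.
    images = [[0.0] * 64 for _ in range(1000)]
    labels = [0] * 1000
    metrics = benchmark(lambda img: 0, images, labels)

    with open("metrics.json", "w") as f:  # machine-readable result file
        json.dump(metrics, f, indent=2)
```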
I am pretty new to this topic and don't know where to start. Is there common tooling or an established approach to:
- Distribute the test task, together with the large amount of test data (~10 GB) and the classifiers, to different nodes
- Execute the tests
- Store the test results (i.e. the metrics) in a (standardized?) file format
- Gather the test results from the nodes, then generate and publish a report
- And also plot the results over time (see the sketch after this list)
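For the "plot over time" point, this is roughly what we have in mind, assuming each run's metrics file has been archived under a `results/` directory in the JSON format sketched above; the directory layout, key names and use of matplotlib are our own assumptions, not an established convention:

```python
import json
from pathlib import Path

import matplotlib.pyplot as plt


def load_history(results_dir="results"):
    """Read all archived per-run metric files, sorted by file name."""
    runs = []
    for path in sorted(Path(results_dir).glob("*.json")):
        with open(path) as f:
            runs.append((path.stem, json.load(f)))
    return runs


def plot_metric(runs, key, output_file):
    """Plot one metric across all archived runs and save it as an image."""
    x = range(len(runs))
    y = [metrics[key] for _, metrics in runs]
    plt.figure()
    plt.plot(x, y, marker="o")
    plt.xticks(x, [name for name, _ in runs], rotation=45, ha="right")
    plt.ylabel(key)
    plt.tight_layout()
    plt.savefig(output_file)


if __name__ == "__main__":
    runs = load_history()
    plot_metric(runs, "accuracy", "accuracy_over_time.png")
    plot_metric(runs, "cpu_time_s", "cpu_time_over_time.png")
```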
We are using GitLab CI for continuous integration and would like to integrate this into our release pipeline.
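One way we could imagine wiring this into a GitLab CI job is a small gate script that runs after the benchmark, compares the fresh metrics against a committed baseline, and exits non-zero so the pipeline fails on a regression; the file names and tolerance values below are assumptions:

```python
import json
import sys

TOLERANCES = {
    # metric            worse when   allowed relative change
    "accuracy":        ("lower",     0.01),  # at most 1 % relative drop
    "cpu_time_s":      ("higher",    0.10),  # at most 10 % slower
    "peak_ram_bytes":  ("higher",    0.10),  # at most 10 % more RAM
}


def check(current, baseline):
    """Return a list of human-readable regression messages (empty if OK)."""
    failures = []
    for key, (worse_when, tol) in TOLERANCES.items():
        cur, base = current[key], baseline[key]
        if worse_when == "lower" and cur < base * (1 - tol):
            failures.append(f"{key}: {cur} is more than {tol:.0%} below baseline {base}")
        if worse_when == "higher" and cur > base * (1 + tol):
            failures.append(f"{key}: {cur} is more than {tol:.0%} above baseline {base}")
    return failures


if __name__ == "__main__":
    with open("metrics.json") as f:
        current = json.load(f)
    with open("baseline.json") as f:
        baseline = json.load(f)

    failures = check(current, baseline)
    for msg in failures:
        print("REGRESSION:", msg)
    sys.exit(1 if failures else 0)
```

The idea would be that the CI job runs this script after the benchmark step and archives `metrics.json` as an artifact, but we don't know whether this is the usual way to do it.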
Of course I tried to google this, but maybe I am lacking the correct wording, because I couldn't find any suitable answer.