Saturday, September 26, 2020

Validating/testing outputs in Jupyter notebook

Often, when I'm working on data science projects in Jupyter notebooks, I find myself in the following scenario:

I do some analysis and get some results. I check some of these results against an external source of information to make sure they are correct. Then I might continue working on my code, do some additional analysis, and so on. Sometimes I later accidentally change something that alters these verified, correct results, which is not desirable.

Now I was thinking that there ought to be a way to store the correct, checked results in the notebook and then flag that cell for testing, so that if I ever change the code in a way that alters the cell's output, I get some kind of error message. Basically, I want something resembling unit testing, but entirely inside the notebook and with minimal or no additional coding.
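To make the idea concrete, here is roughly the kind of check I have in mind, written as an ordinary helper cell. This is only a sketch: the file name `verified_results.json` and the function name `check_result` are placeholders I made up, and it only handles JSON-serializable values.

```python
import json
from pathlib import Path

# Hypothetical helper: the first time a result is registered it is saved to a
# JSON file next to the notebook as the "verified" reference; on every later
# run the freshly computed value is compared against that stored reference.
_REFERENCE_FILE = Path("verified_results.json")  # placeholder file name


def _load_references() -> dict:
    if _REFERENCE_FILE.exists():
        return json.loads(_REFERENCE_FILE.read_text())
    return {}


def check_result(name: str, value) -> None:
    """Compare `value` to the stored reference named `name`.

    If no reference exists yet, the current value is saved as the verified
    reference. If one exists and differs, raise an error so the notebook
    run fails loudly.
    """
    references = _load_references()
    if name not in references:
        references[name] = value
        _REFERENCE_FILE.write_text(json.dumps(references, indent=2))
        print(f"Stored '{name}' as the verified reference.")
    elif references[name] != value:
        raise AssertionError(
            f"'{name}' changed: expected {references[name]!r}, got {value!r}"
        )
    else:
        print(f"'{name}' still matches the verified reference.")


# Example: the first run stores 0.123 as the reference for "mean_age";
# any later run where the value differs raises an AssertionError.
check_result("mean_age", 0.123)
```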

Does this exist, or would it be possible to implement in a Jupyter notebook?

I looked into nbval, which is similar to what I had in mind, except that it only compares cell outputs to the values stored in the notebook from the previous execution, not to user-flagged reference results as I would like. It also appears to always validate entire notebooks, which is problematic because some of my notebooks contain individual cells that take a very long time to execute, and I don't want to re-run them every time I want to verify the notebook. I could perhaps make it sort of work by always running nbval before re-executing any previously executed cells, but doing that manually every time I re-execute a cell feels like an extra hassle, and it still doesn't give me control over which cells get tested.
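(For concreteness: nbval runs as a pytest plugin, roughly `pytest --nbval my_notebook.ipynb`, and as I read its documentation it does offer per-cell comment markers, so a slow cell could at least be excluded from a run, as in the sketch below. Even so, the rest of the notebook is still re-executed, and the comparison target is still the previously saved output rather than a result I have explicitly verified.)

```python
# NBVAL_SKIP
# Sketch of an nbval comment marker: as I understand the docs, a cell whose
# source contains this marker is skipped entirely during validation, so a
# very slow computation would not be re-run. The code below just stands in
# for such a long-running step.
import time

time.sleep(2)  # placeholder for a step that normally takes far too long
slow_result = sum(range(10_000_000))
print(slow_result)
```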
