testing: How to properly test a scientific data file processing software in Python?

Thursday, 13 June 2019


I'm working in a molecular biology lab where we have implemented some degree of lab automation using robotic systems. In particular, we have a measurement robot that produces plain-text files containing biological data. So that lab members can further process and analyse the data, I have written a Python application that converts the plain-text files into a more useful tabular format and performs basic statistical analyses. I am about to leave the lab, and I want to clean up my code so that it can be (more or less) easily maintained and extended by future scientists working with it. It is somewhat of a mess, since it grew along with my coding knowledge and proficiency over several years. While I refactor and rewrite parts of the software, I want to make sure that everything still works properly. I have found that automated unit testing is the most solid strategy here (rather than manually checking the results every time an analysis runs).

I found this easy for the statistical functions, as I can simply come up with mock data and know what to expect and how to handle it. My question now is: how do I mock several hundred files in a highly specific format and check that they are processed correctly?
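One common way to avoid mocking hundreds of real files is to test the parser against a few small synthetic files, each exercising one feature of the format. The sketch below assumes a hypothetical format (instrument header lines starting with "#", then tab-separated well/value pairs) and hypothetical helper names parse_measurement_file and write_mock_file; the real robot output will certainly differ.

```python
import tempfile
from pathlib import Path


def parse_measurement_file(path):
    """Hypothetical parser for an ASSUMED format: '#' header lines, then 'well<TAB>value'."""
    results = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and instrument header lines
            well, value = line.split("\t")
            results[well] = float(value)
    return results


def write_mock_file(directory, rows, header="# instrument: mock\n"):
    """Write a small synthetic input file in the assumed format and return its path."""
    path = Path(directory) / "mock_run.txt"
    body = "".join(f"{well}\t{value}\n" for well, value in rows)
    path.write_text(header + body, encoding="utf-8")
    return path


def test_parser_reads_mock_file():
    # Known input rows, so the expected parse result is known in advance.
    rows = [("A1", 0.25), ("A2", 1.5), ("B1", 3.0)]
    with tempfile.TemporaryDirectory() as tmp:
        path = write_mock_file(tmp, rows)
        assert parse_measurement_file(path) == dict(rows)


if __name__ == "__main__":
    test_parser_reads_mock_file()
    print("ok")
```

Each further edge case (a missing value, an extra header line, Windows line endings) can become one more small test function of the same shape. With pytest, its built-in tmp_path fixture could replace the manual TemporaryDirectory.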

What I have done so far is to take input data from previous runs whose results I know are correct and run it through my current build. I then compare each output file with the corresponding output file from the previous build. As a self-taught coder, I feel like I am missing something here. How would you recommend I test my program? Or is this strategy actually acceptable?
