I often need to write a command-line script that reads from a database, performs some analytics, and writes the results back to the database. To decouple things and create a separate data layer, I generally write scripts load.py, write.py, and do_analytics.py, where load and write handle the database interaction and the do_analytics.py file looks something like this:
import pickle

import load
import write


def batch_classify(model_filepath='my_model.pkl'):
    with open(model_filepath, 'rb') as infile:
        model = pickle.load(infile)

    data_loader = load.DataLoader()
    data_loader.load_data()
    data_loader.clean_data()
    data = data_loader.data
    # Maybe do some more manipulations here...
    output = model.transform(data)

    data_writer = write.DataWriter()
    data_writer.write_data(output)


if __name__ == "__main__":
    # maybe would have some command line options here to pass to batch_classify
    batch_classify()
I would now like to test against a fixed dataset and make sure the classification output is what I expect. I don't need to test the actual database connection right now, so based on some research I think I want to use mocking as in this post, but I'm not sure what level of object should be mocked, how to refactor so I can actually test once I have the mocked object, or whether this is even the best approach to begin with. When this has come up before, I've hacked together solutions that work via a small fixed test table in the actual database, but the result is never elegant or clean code.
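For context, here is a sketch of the kind of test I have in mind, using unittest.mock from the standard library. It assumes a refactor where batch_classify accepts its collaborators as arguments (the injectable signature is my assumption, not the current code; the doubled-values "model" and the [1, 2, 3] dataset are placeholders):

```python
from unittest import mock


def batch_classify(model, data_loader, data_writer):
    """Same pipeline as the original, but with dependencies injected."""
    data_loader.load_data()
    data_loader.clean_data()
    output = model.transform(data_loader.data)
    data_writer.write_data(output)
    return output


# Mock objects stand in for the unpickled model and the database layer.
model = mock.Mock()
model.transform.side_effect = lambda data: [x * 2 for x in data]

loader = mock.Mock()
loader.data = [1, 2, 3]  # fixed test dataset, no database needed

writer = mock.Mock()

result = batch_classify(model, loader, writer)

assert result == [2, 4, 6]             # classification output as expected
loader.load_data.assert_called_once()  # pipeline steps were exercised
writer.write_data.assert_called_once_with([2, 4, 6])  # nothing hit a real DB
```

The alternative I've seen is to keep batch_classify as-is and patch load.DataLoader / write.DataWriter at the module level with mock.patch, but I'm not sure whether injection or patching is the better level to mock at here.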