Thursday, June 1, 2017

Testing database read/write plus analytics script in Python

I often need to write a command-line script that reads from a database, performs some analytics, and writes the results back to the database. To decouple things and keep a separate data layer, I generally split the work into three scripts: load.py and write.py handle the database interaction, and do_analytics.py looks something like this:

import pickle

import load
import write

def batch_classify(model_filepath='my_model.pkl'):
    with open(model_filepath, 'rb') as infile:
        model = pickle.load(infile)

    data_loader = load.DataLoader()
    data_loader.load_data()
    data_loader.clean_data()
    data = data_loader.data
    # Maybe do some more manipulations here...
    output = model.transform(data)

    data_writer = write.DataWriter()
    data_writer.write_data(output)

if __name__ == "__main__":
    # maybe would have some command line options here to pass to batch_classify
    batch_classify()

I would now like to run a fixed dataset through this and make sure the classification results (output) are what I would expect. I don't need to test the actual database connection right now, so based on some research I think I want to use mocking, as in this post. However, I'm not sure what level of object should be mocked, how to refactor the code so it can actually be tested once I have the mocked object, or whether this is even the best approach to begin with. When this has come up before, I have hacked together working solutions built on a small fixed test table in the actual database, but the result is never elegant or clean code.
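For context, here is one direction I have been considering, sketched with `unittest.mock` from the standard library. The refactor and names below are illustrative, not my actual code: if `batch_classify` receives the loader, writer, and model as arguments instead of constructing them itself, a test can substitute `Mock` objects for the database-backed pieces and assert on what would have been written.

```python
# Illustrative sketch: dependency injection makes the data layer mockable.
# batch_classify receives its collaborators instead of constructing them,
# so a test can pass unittest.mock.Mock objects in place of the real
# load.DataLoader / write.DataWriter.
from unittest import mock

def batch_classify(model, loader, writer):
    loader.load_data()
    loader.clean_data()
    output = model.transform(loader.data)
    writer.write_data(output)
    return output

# In a test, no database is touched:
loader = mock.Mock()
loader.data = [1.0, 2.0, 3.0]  # fixed in-memory "dataset"
model = mock.Mock()
model.transform.side_effect = lambda data: [x * 2 for x in data]
writer = mock.Mock()

result = batch_classify(model, loader, writer)

loader.load_data.assert_called_once()  # data layer was exercised
writer.write_data.assert_called_once_with([2.0, 4.0, 6.0])
```

Production code would then call something like `batch_classify(model, load.DataLoader(), write.DataWriter())`, while the test above never opens a database connection. But I'm unsure whether to mock at this level or patch the `load`/`write` modules themselves.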
