Friday, 24 April 2020

Correct way to write unit tests for a class method with pytest

I want to write unit tests for a class with several methods for data transformation.

High level:

class my_class:

    def __init__(self, file):
        # read data out of .yml config file
        config = read_data_from_yml_config(file)
        self.value1 = config["value1"]
        self.value2 = config["value2"]

    def get_and_transform(self):
        data_dict = self.get_data()
        transformed_data = self.transform_data(data_dict)

        return transformed_data

    def get_data(self):
        data_dict = request_based_on_value1(self.value1)
        return data_dict

    def transform_data(self, data_dict):
        trnsf = transform1(data_dict, self.value2)

        return trnsf

Here, I have several questions. The main thing to test here is my_class.transform_data(). It takes a dict as input, loads it into a pandas DataFrame and applies some transformations.

As I understand it, I need several fixtures d1, d2, d3, ... (different values for data_dict) that represent the different test inputs for my_class.transform_data(). Since I want to make sure the output is as expected, I would also define the expected outputs:

o1 # expected output for transform_data(d1)
o2, o3, ... # respectively
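A minimal sketch of the input/expected pairs kept inline in test_my_class.py and fed to one test through pytest.mark.parametrize. Because transform_data only forwards to transform1(data_dict, self.value2), this sketch parametrizes over value2 as well and tests the module-level function directly, so no instance is needed. The body of transform1 below is a hypothetical stand-in (the real transformation is not shown in the question), as are the sample values. Since the outputs are DataFrames, the comparison uses pandas.testing.assert_frame_equal rather than a bare `==`.

```python
import pandas as pd
import pandas.testing as pdt
import pytest


def transform1(data_dict, value2):
    # Hypothetical stand-in for the question's transform1():
    # load the dict into a DataFrame and scale column "x" by value2.
    df = pd.DataFrame(data_dict)
    df["x"] = df["x"] * value2
    return df


# Minimal inputs (d1, d2) and expected outputs (o1, o2), defined inline
# so each case is readable right next to the test that uses it.
d1 = {"x": [1, 2, 3]}
o1 = pd.DataFrame({"x": [2, 4, 6]})

d2 = {"x": [0, -1]}
o2 = pd.DataFrame({"x": [0, -10]})


@pytest.mark.parametrize(
    "data_dict, value2, expected",
    [(d1, 2, o1), (d2, 10, o2)],
)
def test_transform1(data_dict, value2, expected):
    result = transform1(data_dict, value2)
    # `result == expected` builds an element-wise DataFrame, which cannot be
    # used in a plain assert; pandas' helper compares values, dtypes and index.
    pdt.assert_frame_equal(result, expected)
```

If the cases grow too large to read inline, the same parametrize list can be built from files (for example the d1_sample.pkl idea, loaded with pd.read_pickle); the test body stays unchanged.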

Several questions about this:

  1. Is this approach correct?
  2. How and where would I specify d1, d2, ... and o1, o2, ...? I could either define them in the test_my_class.py file or store d1_sample.pkl, ... in the tests/ folder. In either case, I would choose a minimal example for both d and o.
  3. As the transformation in transform_data also depends on the attribute self.value2, how would I pass in different values for value2 without creating an instance of my_class?
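One common way to handle question 3 is to allocate the object with `my_class.__new__(my_class)`, which skips `__init__` (and therefore the .yml file), and then set only the attribute transform_data reads. The sketch below assumes that; `make_my_class` is a hypothetical helper, and `transform1` and `read_data_from_yml_config` are stand-ins for the question's real functions.

```python
def read_data_from_yml_config(file):
    # Hypothetical placeholder: the real reader is not part of this sketch,
    # so constructing my_class normally fails without a config file.
    raise FileNotFoundError(file)


def transform1(data_dict, value2):
    # Hypothetical stand-in for the real transformation.
    return {key: value * value2 for key, value in data_dict.items()}


class my_class:
    def __init__(self, file):
        config = read_data_from_yml_config(file)
        self.value1 = config["value1"]
        self.value2 = config["value2"]

    def transform_data(self, data_dict):
        return transform1(data_dict, self.value2)


def make_my_class(value2):
    """Hypothetical test helper: build an instance without running __init__."""
    obj = my_class.__new__(my_class)  # allocates the object, skips __init__
    obj.value2 = value2               # only the attribute transform_data uses
    return obj


def test_transform_data_with_different_value2():
    data_dict = {"a": 1, "b": 2}
    assert make_my_class(2).transform_data(data_dict) == {"a": 2, "b": 4}
    assert make_my_class(0).transform_data(data_dict) == {"a": 0, "b": 0}
```

The trade-off is that the test has to know which attributes transform_data reads. If that becomes awkward, a frequently suggested alternative is to let `__init__` take value1 and value2 directly and move the YAML reading into a separate function, so a test can simply call the constructor.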

In general, it is also not fully clear to me whether I should test at the "object" level or at the "method" level. Above, I described a "method" approach (because I am mainly interested in the results of transform_data). The alternative would be to provide different .yml files and thus create different test instances of my_class:

def yml1():
    config = read_in_yml1()
    return config

# and so on for different configurations.

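Then, for the test:

import pytest

@pytest.mark.parametrize("test_input, expected", [(yml1, ???), (yml2, ???)])
def test_my_class(test_input, expected):
    test_class = my_class(test_input)

    # call the method; comparing the bound method itself is always False
    assert test_class.get_and_transform() == expected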

However, since the input to my_class.transform_data() does not depend (directly) on the content of yml1 but on the response of my_class.get_data(), this approach seems to make little sense. How would I test for different input values of data_dict?
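One way to control data_dict while still exercising the whole object is to replace get_data for the duration of a test, for example with unittest.mock.patch.object (pytest's `monkeypatch.setattr(my_class, "get_data", ...)` does the same job). In this sketch, `read_data_from_yml_config`, `request_based_on_value1` and `transform1` are hypothetical stand-ins, and the config values are made up.

```python
from unittest import mock


def read_data_from_yml_config(file):
    # Hypothetical stand-in: pretend every .yml file holds this config.
    return {"value1": "endpoint-a", "value2": 10}


def request_based_on_value1(value1):
    # Hypothetical stand-in for the real request; a unit test must not reach it.
    raise ConnectionError(f"unexpected network call for {value1!r}")


def transform1(data_dict, value2):
    # Hypothetical stand-in for the real transformation.
    return {key: value + value2 for key, value in data_dict.items()}


class my_class:
    def __init__(self, file):
        config = read_data_from_yml_config(file)
        self.value1 = config["value1"]
        self.value2 = config["value2"]

    def get_and_transform(self):
        data_dict = self.get_data()
        return self.transform_data(data_dict)

    def get_data(self):
        return request_based_on_value1(self.value1)

    def transform_data(self, data_dict):
        return transform1(data_dict, self.value2)


def test_get_and_transform_with_controlled_data():
    obj = my_class("config.yml")
    fake_data = {"a": 1, "b": 2}
    # Inside the with-block, get_data is a mock returning fake_data,
    # so get_and_transform never performs the real request.
    with mock.patch.object(my_class, "get_data", return_value=fake_data) as fake_get:
        result = obj.get_and_transform()
    fake_get.assert_called_once_with()
    assert result == {"a": 11, "b": 12}
```

Each different data_dict then becomes just another `fake_data` value, which combines naturally with pytest.mark.parametrize.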

What is the correct way to write unit tests in this scenario?
