I want to write unit tests for a class with several methods for data transformation.
At a high level:
class my_class:
    def __init__(self, file):
        # read data out of .yml config file
        config = read_data_from_yml_config(file)
        self.value1 = config["value1"]
        self.value2 = config["value2"]

    def get_and_transform(self):
        data_dict = self.get_data()
        transformed_data = self.transform_data(data_dict)
        return transformed_data

    def get_data(self):
        data_dict = request_based_on_value1(self.value1)
        return data_dict

    def transform_data(self, data_dict):
        trnsf = transform1(data_dict, self.value2)
        return trnsf
I have several questions here. The main thing to test is my_class.transform_data(). It takes a dict as input, reads it into a pandas data frame and applies some transformations.
As I understand it, I need several fixtures d1, d2, d3, ... (different values for data_dict) that represent the test case inputs for my_class.transform_data(). As I want to make sure that the output is as expected, I would also define my expected outputs:
o1 # expected output for transform_data(d1)
o2, o3, ... # respectively
Several questions about this:
- Is this approach correct?
- How and where would I specify d1, d2, ... and o1, o2, ...? I could either do that in the test_my_class.py file or store d1_sample.pkl, ... in the tests/ folder. In both cases I would choose a minimal example for each d and o.
- As the transformation in transform_data also depends on the attribute self.value2, how would I pass in different values for value2 without creating an instance of my_class?
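To make the second and third questions concrete, here is a minimal sketch of what I have in mind, not a definitive solution. It keeps the (value2, d, o) triples in the test file itself and builds instances with `my_class.__new__`, which skips `__init__` so no .yml file is read. The body of `transform1` (scaling every value) and the helper `make_instance` are hypothetical stand-ins I made up for illustration, and I use plain dicts and the standard library `unittest` instead of pandas to keep the example self-contained.

```python
import unittest


def transform1(data_dict, value2):
    # Hypothetical stand-in for the real transformation: scale every value.
    return {key: val * value2 for key, val in data_dict.items()}


class my_class:
    def __init__(self, file):
        # The real class reads a .yml file here; the tests below never call this.
        raise RuntimeError("unit tests should not read config files")

    def transform_data(self, data_dict):
        return transform1(data_dict, self.value2)


def make_instance(value2):
    # Hypothetical helper: create the object without running __init__,
    # then set only the attribute that transform_data actually needs.
    obj = my_class.__new__(my_class)
    obj.value2 = value2
    return obj


# Each row is one test case: (value2, input d, expected output o).
CASES = [
    (2, {"a": 1, "b": 2}, {"a": 2, "b": 4}),
    (0, {"a": 1}, {"a": 0}),
    (3, {}, {}),
]


class TransformDataTest(unittest.TestCase):
    def test_transform_data(self):
        for value2, d, o in CASES:
            # subTest reports each failing row separately.
            with self.subTest(value2=value2, d=d):
                self.assertEqual(make_instance(value2).transform_data(d), o)
```

With pytest, the same table could presumably be passed to `pytest.mark.parametrize("value2, d, o", CASES)`, and for real data frames `pandas.testing.assert_frame_equal` would replace the plain equality check. Storing pickled samples in tests/ only seems worthwhile to me once the inputs become too large to read inline.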
In general, it is also not fully clear to me whether I should test on an "object" level or on a "method" level. Above, I described a "method" approach (because I am mainly interested in the results of transform_data). The alternative would be to provide different .yml files and thus create different test instances of my_class.
def yml1():
    config = read_in_yml1()
    return config

# and so on for different configurations.
Then for the test:
@pytest.mark.parametrize("test_input, expected", [(yml1, ???), (yml2, ???)])
def test_my_class(test_input, expected):
    test_class = my_class(test_input)
    assert test_class.transform_data == expected
However, as the input to my_class.transform_data() does not depend (directly) on the content of yml1, but rather on the response of my_class.get_data(), that seems to make little sense. How would I test different input values of data_dict?
What is the correct way to write unit tests in this scenario?
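For the "object" level, one option I can imagine is to replace get_data() with a mock so that each test controls the data_dict that reaches transform_data(). The sketch below uses `unittest.mock.patch.object` from the standard library. As before, the body of `transform1` and the concrete values ("some-id", 10, {"x": 1}) are hypothetical placeholders, and `request_based_on_value1` is stubbed to fail loudly so that an accidental real request would be noticed.

```python
from unittest import mock


def request_based_on_value1(value1):
    # Stand-in for the real request; it must never run inside a unit test.
    raise RuntimeError("no real requests in unit tests")


def transform1(data_dict, value2):
    # Hypothetical stand-in for the real transformation: scale every value.
    return {key: val * value2 for key, val in data_dict.items()}


class my_class:
    def get_and_transform(self):
        data_dict = self.get_data()
        return self.transform_data(data_dict)

    def get_data(self):
        return request_based_on_value1(self.value1)

    def transform_data(self, data_dict):
        return transform1(data_dict, self.value2)


def test_get_and_transform_uses_get_data_response():
    # Build the object without a .yml file and set the attributes by hand.
    obj = my_class.__new__(my_class)
    obj.value1, obj.value2 = "some-id", 10

    # Replace get_data on the class for the duration of the with-block.
    with mock.patch.object(my_class, "get_data", return_value={"x": 1}) as fake_get:
        assert obj.get_and_transform() == {"x": 10}

    # The wrapper should have fetched the data exactly once, without arguments.
    fake_get.assert_called_once_with()
```

This way the .yml content only decides value1 and value2, while the mocked response decides data_dict, so the two sources of variation can be tested independently.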