I'm trying to unit-test a function that processes CSV files, using pytest. While the function works, I feel like there's a lot of code repetition when creating "sample" CSV files in my project directory to test it. The actual CSV file that holds the real data has millions of records.
These are not the only CSV files my module has to handle, so it would be immensely helpful to know the best way to test functions that work with different file structures.
Right now I'm creating a very short CSV file that mimics the actual file's schema, with a single line of data, plus the expected DataFrame output after the file is processed by the function.
Perhaps mocking is the way to go? But I feel like you shouldn't need to mock for this kind of testing.
Test Function
import csv
import os

import pandas as pd
import pytest
from pandas import testing


@pytest.mark.parametrize('test_file, expected', [
    (r'Path\To\Project\Output\Folder\mock_sales1.csv',
     pd.DataFrame([['A0A0A0', 1, 4000]], columns=['Postal_Code', 'Store_Num', 'Sales'])),
    (r'Path\To\Project\Output\Folder\mock_sales2.csv',
     pd.DataFrame([['A0A0A0', 1, 4000]], columns=['Postal_Code', 'Store_Num', 'Sales']))
])
def test_sales_dataframe(test_file, expected):
    # This part is repetitive; I have to write something similar for every test function that uses a file.
    mock_sales1 = [['Data0', 'A0A0A0', 1, 'Data3', 'Data4', 'Data5', 4000]]
    with open(r'Path\To\Project\Output\Folder\mock_sales1.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(mock_sales1)
    mock_sales2 = [['Data0', 'A0A0A0', 1, 'Data3', 'Data4', 'Data5', 'Data6', 4000]]
    with open(r'Path\To\Project\Output\Folder\mock_sales2.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(mock_sales2)

    sales_df = sales_dataframe(test_file)
    testing.assert_frame_equal(expected, sales_df)

    os.remove(r'Path\To\Project\Output\Folder\mock_sales1.csv')
    os.remove(r'Path\To\Project\Output\Folder\mock_sales2.csv')
Main Function
import csv

import pandas as pd


def sales_dataframe(file):
    try:
        with open(file, 'r') as f:
            reader = csv.reader(f)
            num_cols = len(next(reader))
        # The number of columns is variable, so num_cols is used to specify which
        # columns should be read. This is the part I'm really testing!
        columns = [1, 2, (num_cols - 1)]
        sales_df = pd.read_csv(file, usecols=columns, names=['Postal_Code', 'Store_Num', 'Sales'])
        return sales_df
    except FileNotFoundError:
        raise FileNotFoundError(file)
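(To spell out the part under test: the sales figure always sits in the last column, so for the 7-column mock_sales1 row num_cols is 7 and usecols works out to [1, 2, 6], while for the 8-column mock_sales2 row it becomes [1, 2, 7]; in both cases the selected fields map to Postal_Code, Store_Num and Sales.)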
The test passes as intended. However, if I need to test other CSV files I have to create a sample file for each one and then, when the test is finished, delete each one. As you can imagine, that's a lot of repetitive code when there are many different files to test.
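One direction I've been considering instead of mocking is letting pytest's built-in tmp_path fixture own the file creation and cleanup, and feeding each sample file's rows in through indirect parametrization. Here is a rough, untested sketch of what I mean (csv_file is just a name I made up for the fixture, and the import of sales_dataframe stands in for wherever the function actually lives):

import csv

import pandas as pd
import pytest
from pandas import testing

from my_module import sales_dataframe  # placeholder for the real module


@pytest.fixture
def csv_file(tmp_path, request):
    # request.param carries the sample rows for this test case; the file lives in a
    # pytest-managed temporary directory, so no os.remove() cleanup is needed.
    path = tmp_path / 'mock_sales.csv'
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(request.param)
    return path


@pytest.mark.parametrize('csv_file, expected', [
    ([['Data0', 'A0A0A0', 1, 'Data3', 'Data4', 'Data5', 4000]],
     pd.DataFrame([['A0A0A0', 1, 4000]], columns=['Postal_Code', 'Store_Num', 'Sales'])),
    ([['Data0', 'A0A0A0', 1, 'Data3', 'Data4', 'Data5', 'Data6', 4000]],
     pd.DataFrame([['A0A0A0', 1, 4000]], columns=['Postal_Code', 'Store_Num', 'Sales'])),
], indirect=['csv_file'])
def test_sales_dataframe(csv_file, expected):
    sales_df = sales_dataframe(csv_file)
    testing.assert_frame_equal(expected, sales_df)

The appeal is that each parametrize entry carries its own file structure and pytest handles the temporary files, but I'm not sure whether indirect parametrization is the idiomatic choice here or whether there's a better pattern for functions that read files with different structures.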