I am writing Python code that parses a formatted file into a Python object. The file format can vary, but for now I'm working from a subset of the possible variants and hoping that tests will help me extend the parser to cover all of them.
The file itself consists of a header containing metadata, followed by several data blocks.
[general header, describes length of header 1 & header 2]
[header describing data block 1]
[header describing data block 2]
[data block 1]
[data block 2]
Currently my code is structured as follows:
with open(filename, 'r') as datafile:
    # Each stage reads from where the previous one stopped.
    gen_header_obj = parse_gen_header(datafile)
    header1_obj = parse_header1(datafile, gen_header_obj.header1_len)
    header2_obj = parse_header2(datafile, gen_header_obj.header2_len)
    data1_obj = parse_data1(datafile, header1_obj.datalen)
    data2_obj = parse_data2(datafile, header2_obj.datalen)
Each parse*(file) function calls file.readline() several times, depending on the specified data length.
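To illustrate why the order of the calls matters, here is a minimal sketch of a readline()-based reader. The helper read_lines and the sample content are hypothetical and not part of my actual parser; io.StringIO stands in for the real file.

```python
import io


def read_lines(fileobj, count):
    """Read exactly `count` lines starting at the current file position.

    Hypothetical helper, shown only to illustrate the reading pattern.
    """
    lines = []
    for _ in range(count):
        line = fileobj.readline()
        if not line:  # readline() returns '' at end of file
            raise EOFError(f"expected {count} lines, got {len(lines)}")
        lines.append(line.rstrip("\n"))
    return lines


# io.StringIO stands in for the real file; the content is made up.
datafile = io.StringIO("h1a\nh1b\nh2a\n")
header1_lines = read_lines(datafile, 2)  # consumes the first two lines
header2_lines = read_lines(datafile, 1)  # continues where the first call stopped
```

Because every call starts at the current position, the second call only sees the right lines if the first call has already consumed its share.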
Ideally I would have at least five separate tests, each of which provides a fake portion of the file and checks whether the parser extracts the information correctly. In this case, however, the portions of data are quite large (megabytes).
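For the small pieces (the headers), a fake portion could be supplied through io.StringIO, since a parser that only calls readline() should accept any file-like object. The sketch below is an assumption-laden example: the Header tuple, the "key=value" line format, and this body of parse_header1 are invented stand-ins, not my real format.

```python
import io
import unittest
from collections import namedtuple

# Hypothetical stand-in for the real header object; the "key=value"
# line format below is invented for this example.
Header = namedtuple("Header", ["name", "datalen"])


def parse_header1(fileobj, header_len):
    """Read `header_len` lines of key=value pairs (placeholder parser)."""
    fields = {}
    for _ in range(header_len):
        key, _sep, value = fileobj.readline().strip().partition("=")
        fields[key] = value
    return Header(name=fields["name"], datalen=int(fields["datalen"]))


class TestHeader1(unittest.TestCase):
    def test_parses_small_fake_header(self):
        # The fake portion lives in memory; no file on disk is needed.
        fake = io.StringIO("name=block1\ndatalen=3\n")
        result = parse_header1(fake, 2)
        self.assertEqual(result, Header(name="block1", datalen=3))
```

This can be run with python -m unittest. It does not solve the problem for the megabyte-sized data blocks, which are impractical to write out by hand.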
Would it be possible to write tests that resemble the following?
import unittest


class TestParser(unittest.TestCase):
    filename = 'locally_stored_file.txt'

    def setUp(self):
        self.file = open(self.filename, 'r')

    def tearDown(self):
        self.file.close()

    def test_gen_header_parse(self):
        result = parse_gen_header(self.file)
        self.header1_len = result.header1_len
        self.header2_len = result.header2_len
        expected = ...
        self.assertEqual(result, expected)

    def test_header1_parse(self):
        # file position is left over from test_gen_header_parse
        result = parse_header1(self.file, self.header1_len)
        self.data1_len = result.datalen
        expected = ...
        self.assertEqual(result, expected)

    def test_header2_parse(self):
        # file position is left over from test_header1_parse
        result = parse_header2(self.file, self.header2_len)
        self.data2_len = result.datalen
        expected = ...
        self.assertEqual(result, expected)

    def test_data1_parse(self):
        # file position is left over from test_header2_parse
        result = parse_data1(self.file, self.data1_len)
        expected = ...
        self.assertEqual(result, expected)

    def test_data2_parse(self):
        # file position is left over from test_data1_parse
        result = parse_data2(self.file, self.data2_len)
        expected = ...
        self.assertEqual(result, expected)

    # Some code to force the tests to run sequentially as laid out above
As you can see, I'm trying to write five separate tests that will hopefully fail individually if something breaks in the future. However, I can't test parse_header2 without running parse_gen_header and parse_header1 beforehand.
I'm not sure whether there is a better way to approach this.
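One possible direction, sketched under assumptions: unittest creates a new TestCase instance for each test method and by default runs methods in alphabetical order, so attributes set on self in one test are not visible in the next. A setUpClass hook can instead run the whole chain once, in file order, and store each stage's result on the class, so that each test asserts on a single stage. The sample content and the two parser functions below are simplified placeholders, not my real format.

```python
import io
import unittest

# Made-up miniature file: a general header holding two lengths, followed
# by two blocks. The real format and parsers will differ.
SAMPLE = "2 1\na\nb\nc\n"


def parse_gen_header(fileobj):
    """Placeholder: read the two block lengths from the first line."""
    first, second = fileobj.readline().split()
    return int(first), int(second)


def parse_block(fileobj, length):
    """Placeholder: read `length` lines from the current position."""
    return [fileobj.readline().strip() for _ in range(length)]


class TestParserStages(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Run every stage once, in file order, and keep each result.
        # For a real file this would be open(filename, 'r') instead.
        with io.StringIO(SAMPLE) as datafile:
            cls.gen_header = parse_gen_header(datafile)
            cls.block1 = parse_block(datafile, cls.gen_header[0])
            cls.block2 = parse_block(datafile, cls.gen_header[1])

    def test_gen_header(self):
        self.assertEqual(self.gen_header, (2, 1))

    def test_block1(self):
        self.assertEqual(self.block1, ["a", "b"])

    def test_block2(self):
        self.assertEqual(self.block2, ["c"])
```

With this layout a wrong value in one stage fails only that stage's test, and the order in which the test methods run no longer matters. The trade-off is that an exception raised inside setUpClass is reported as a single class-level error rather than as individual test failures.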