Tuesday, 5 January 2016

Writing tests for file parser functions in a modular way

I am writing a piece of Python code that parses a formatted file into a Python object. The file format can vary, but for now I'm working from a subset of the possible variants and hoping that tests will help me extend the parser to cover all of them.

The file itself consists of a header containing metadata, followed by several data blocks.

[general header, describes length of header 1 & header 2]
[header describing data block 1]
[header describing data block 2]
[data block 1]
[data block 2]

Currently my code is structured as follows:

with open(filename, 'r') as datafile:
    gen_header_obj = parse_gen_header(datafile)
    header1_obj = parse_header1(datafile, gen_header_obj.header1_len)
    header2_obj = parse_header2(datafile, gen_header_obj.header2_len)
    data1_obj = parse_data1(datafile, header1_obj.datalen)
    data2_obj = parse_data2(datafile, header2_obj.datalen)

Each parse_* function calls readline() on the file several times; the number of calls depends on the length passed to it.
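To illustrate that behaviour, here is a minimal sketch using an in-memory stream. The function parse_block and the sample contents are hypothetical stand-ins, not my real parsers or file format; the point is only that each call starts reading wherever the previous call stopped.

```python
import io


def parse_block(stream, n_lines):
    """Hypothetical stand-in for the parse_* functions: read exactly n_lines."""
    return [stream.readline().rstrip("\n") for _ in range(n_lines)]


# A tiny in-memory "file": two header lines followed by three data lines.
stream = io.StringIO("h1\nh2\nd1\nd2\nd3\n")

header = parse_block(stream, 2)  # consumes the first two lines
data = parse_block(stream, 3)    # starts where the previous call stopped
```

Because the read position lives in the stream object, the second call only sees the right lines if the first call has already run.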

Ideally I would have at least five separate tests, each of which feeds a fake portion of the file to one parser and checks that the information is extracted correctly. In this case, however, the portions of data are quite large (megabytes).

Would it be possible to write tests that resemble the following?

import unittest

class TestParser(unittest.TestCase):
    filename = 'locally_stored_file.txt'

    def setUp(self):
        self.file = open(self.filename, 'r')

    def tearDown(self):
        self.file.close()

    def test_gen_header_parse(self):
        result = parse_gen_header(self.file)
        self.header1_len = result.header1_len
        self.header2_len = result.header2_len
        expected = ...
        self.assertEqual(result, expected)

    def test_header1_parse(self):
        # assumes the file position carries over from test_gen_header_parse
        result = parse_header1(self.file, self.header1_len)
        self.data1_len = result.datalen
        expected = ...
        self.assertEqual(result, expected)

    def test_header2_parse(self):
        # assumes the file position carries over from test_header1_parse
        result = parse_header2(self.file, self.header2_len)
        self.data2_len = result.datalen
        expected = ...
        self.assertEqual(result, expected)

    def test_data1_parse(self):
        # assumes the file position carries over from test_header2_parse
        result = parse_data1(self.file, self.data1_len)
        expected = ...
        self.assertEqual(result, expected)

    def test_data2_parse(self):
        # assumes the file position carries over from test_data1_parse
        result = parse_data2(self.file, self.data2_len)
        expected = ...
        self.assertEqual(result, expected)

    # Some code to force the tests to run sequentially as laid out above

As you can see, I'm trying to write five separate tests, which will hopefully fail individually if something breaks in the future. However, I'm not able to test parse_header2 without running parse_gen_header and parse_header1 beforehand.
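For comparison, this is roughly what an isolated test might look like if the parser accepted any file-like object and was given only its own portion of the file. It is a sketch under assumptions: the parse_header2 body and the key=value layout below are invented for illustration and are not my actual format, and the length is hard-coded instead of coming from parse_gen_header.

```python
import io
import unittest


def parse_header2(stream, n_lines):
    """Hypothetical stand-in: parse n_lines of 'key=value' text into a dict."""
    fields = {}
    for _ in range(n_lines):
        key, _sep, value = stream.readline().strip().partition("=")
        fields[key] = value
    return fields


class TestHeader2(unittest.TestCase):
    def test_header2_parse(self):
        # Only the header 2 portion is supplied, so no earlier parser
        # has to run first to advance the file position.
        fake = io.StringIO("name=block2\ndatalen=4\n")
        result = parse_header2(fake, 2)
        self.assertEqual(result, {"name": "block2", "datalen": "4"})
```

A test like this can be run on its own, for example with python -m unittest, but it still leaves open how to produce realistic fake portions for the megabyte-sized data blocks.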

I'm not sure whether there is a better way to approach this.
