Sunday, October 4, 2015

testing: compare numpy arrays while allowing a certain mismatch

I have two NumPy arrays containing integers, which I'm comparing with numpy.testing.assert_array_equal. The arrays are "equal enough": a few elements differ, but given the size of my arrays, that's OK in this specific case. Of course, the test still fails:

AssertionError:
Arrays are not equal

(mismatch 0.0010541406645359075%)
 x: array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],...
 y: array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],...

----------------------------------------------------------------------
Ran 1 test in 0.658s

FAILED (failures=1)

One might argue that the clean long-term solution would be to adapt the reference data, but what I'd prefer is to simply allow some mismatch without the test failing. I had hoped assert_array_equal would have an option for this, but it does not.

I've written a function that does exactly what I want, so the problem might be considered solved, but I'm wondering whether there is a better, more elegant way to do this. In particular, parsing the error string feels pretty hacky, but I haven't found a better way to get the mismatch percentage.

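import re

import numpy as np


def assert_array_equal_tolerant(arr1, arr2, threshold):
    """Compare two arrays for equality while allowing a certain mismatch.

    Arguments:
     - arr1, arr2: Arrays to compare.
     - threshold: Mismatch (in percent) above which the test fails.
    """
    try:
        np.testing.assert_array_equal(arr1, arr2)
    except AssertionError as e:
        # Scan the error message for the "(mismatch X%)" line. This relies on
        # the message format shown above, which is not a stable interface.
        for line in e.args[0].split("\n"):
            # Also accept scientific notation such as "1e-05".
            match = re.search(r'mismatch ([0-9.eE+-]+)%', line)
            if match:
                mismatch = float(match.group(1))
                break
        else:
            # No percentage found (e.g. a shape mismatch): re-raise as-is.
            raise
        if mismatch > threshold:
            raise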

Just to be clear: I'm not talking about assert_array_almost_equal, and using it is not feasible either. The errors are not small. They might be huge for a single element, but they are confined to a very small number of elements.
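
One possible alternative to parsing the error string is to compute the mismatch percentage directly from an element-wise comparison. The following is a minimal sketch, not a NumPy API: the name assert_array_mismatch_below is hypothetical, and it assumes both arrays have the same shape and contain no NaNs (with `!=`, a NaN counts as different from another NaN, whereas assert_array_equal treats NaNs in the same position as equal).

```python
import numpy as np


def assert_array_mismatch_below(arr1, arr2, threshold):
    """Fail if more than `threshold` percent of the elements differ.

    Hypothetical helper for illustration; not part of numpy.testing.
    """
    arr1 = np.asarray(arr1)
    arr2 = np.asarray(arr2)
    if arr1.shape != arr2.shape:
        raise AssertionError(
            "Shape mismatch: {} vs {}".format(arr1.shape, arr2.shape)
        )
    # Count the positions where the arrays differ, regardless of how
    # large the individual differences are.
    n_diff = np.count_nonzero(arr1 != arr2)
    # Guard against division by zero for empty arrays.
    mismatch = 100.0 * n_diff / arr1.size if arr1.size else 0.0
    if mismatch > threshold:
        raise AssertionError(
            "Arrays differ in {:.6g}% of elements (allowed: {}%)".format(
                mismatch, threshold
            )
        )
```

Because only the number of differing elements is counted, a single element with a huge error contributes exactly as much as one with a tiny error, which matches the situation described above.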
