My question arises from a database test. We have to reduce the amount of data to arrive at a practically manageable selection; at the same time, the selected data should exercise as many branches of the software as possible.
As an example, consider the following case where the code contains 5 transformations, each of which produces a different output depending on whether some criterion is satisfied. The data might look like this:
real_data   crit1   crit2   crit3   crit4   crit5
--------------------------------------------------
<......>      0       1       0       1       0
<......>      1       1       0       0       1
<......>      1       0       0       1       1
<......>      0       1       1       0       0
<......>      1       0       1       1       1
There are many columns in front holding the "real" data, the specific form of which is not relevant here. Appended to the table are flags indicating whether the criterion of the corresponding transformation step is satisfied.
The workflow is then like this:
           ________          ________                   ________
data ___\ | step 1 | ___\ | step 2 | ___\ ... ___\ | step 5 | ___\ result
        / |  0/1?  |    / |  0/1?  |    /        / |  0/1?  |    /
          |________|      |________|               |________|
In theory, the transformations in the boxes might be correlated, so in the worst case one would have to test all 2^5 = 32 combinations. But clearly, this is not feasible in practice.
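Just to make the count concrete, a throwaway Python snippet (purely illustrative) enumerating the worst case:

    from itertools import product

    # Each of the 5 criteria is a binary flag, so in the worst case every
    # combination of flags is a distinct path through the code.
    paths = list(product((0, 1), repeat=5))
    print(len(paths))  # 32 == 2**5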
Therefore, one often assumes that those transformations are uncorrelated -- and is thus satisfied as long as each branch is taken at least once. In the best case exactly two samples suffice for that, namely,
real_data   crit1   crit2   crit3   crit4   crit5
--------------------------------------------------
<......>      0       0       0       0       0
<......>      1       1       1       1       1
But such a constellation doesn't exist in the data set shown above. (In general, any two rows whose flags are bitwise complementary would do, since every column then sees both values.)
So the question is: how do I find a minimal subset of the data such that each criterion column contains both a 0 and a 1?
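To state the goal precisely: the targets to cover are the ten (column, value) pairs (crit_j, 0) and (crit_j, 1), and each row covers exactly one pair per column, so this is a small set-cover-style problem. Here is a greedy sketch in Python of what I mean (the tuples are just the flag columns from the table above; the function name is made up, and the result is small but not necessarily minimal):

    def greedy_cover(rows):
        """Greedily pick rows until every criterion column has been seen
        with both a 0 and a 1. Returns indices of the chosen rows; the
        result is small but not guaranteed to be minimal."""
        n = len(rows[0])
        # Targets: every (column, value) pair must occur in some chosen row.
        uncovered = {(j, v) for j in range(n) for v in (0, 1)}
        chosen = []
        while uncovered:
            # The (column, value) pairs each row would newly cover:
            gains = [uncovered & {(j, row[j]) for j in range(n)} for row in rows]
            best = max(range(len(rows)), key=lambda i: len(gains[i]))
            if not gains[best]:
                raise ValueError("some criterion column is constant in the data")
            chosen.append(best)
            uncovered -= gains[best]
        return chosen

    # Flag columns from the table above:
    rows = [(0, 1, 0, 1, 0),
            (1, 1, 0, 0, 1),
            (1, 0, 0, 1, 1),
            (0, 1, 1, 0, 0),
            (1, 0, 1, 1, 1)]
    print(greedy_cover(rows))  # -> [0, 4, 1], i.e. three rows

Note that on this very data the greedy result is not minimal: rows 3 and 4 of the table (indices 2 and 3 above) are bitwise complementary, so two rows would already suffice -- which is exactly why I am asking for a minimal subset.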
(Though I believe this is more of an algorithmic question, I've also tagged SQL, since the solution is to be implemented in DB2 LUW. Moreover, just to note, I have the feeling that dynamic programming might be appropriate here.)
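For what it's worth, the dynamic-programming hunch looks workable because only the coverage state matters: with 5 criteria there are just 2^10 = 1024 possible coverage states, independent of the number of rows. A breadth-first sketch over these bitmask states (again illustrative Python; exponential in the number of criteria, but fine for small n):

    from collections import deque

    def minimal_cover(rows):
        """Exact minimum subset via breadth-first search over coverage states.
        Bit 2*j of a state means column j was seen with value 0,
        bit 2*j + 1 means it was seen with value 1."""
        n = len(rows[0])
        full = (1 << (2 * n)) - 1
        # Coverage mask contributed by each row:
        masks = [sum(1 << (2 * j + row[j]) for j in range(n)) for row in rows]
        parent = {0: (None, None)}   # state -> (previous state, row index)
        queue = deque([0])
        while queue:
            state = queue.popleft()
            if state == full:        # first hit = fewest rows (BFS level)
                chosen = []
                while parent[state][0] is not None:
                    state, i = parent[state]
                    chosen.append(i)
                return chosen[::-1]
            for i, m in enumerate(masks):
                nxt = state | m
                if nxt not in parent:
                    parent[nxt] = (state, i)
                    queue.append(nxt)
        return None                  # a constant column: no cover exists

    rows = [(0, 1, 0, 1, 0), (1, 1, 0, 0, 1), (1, 0, 0, 1, 1),
            (0, 1, 1, 0, 0), (1, 0, 1, 1, 1)]
    print(minimal_cover(rows))  # -> [2, 3]: the two complementary rows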