I would like to ask for some ideas, how to test results of ETL process. We have one database (Sybase, table T) and by ETL process (in Infromatica) we copy data to (Oracle, inputTable IT) and transform them (as required) to another database (Oracle again, finalTable FT).
We would like to write tests for data quality transmission. My idea is followed:
- Compare num of rows in T, IT and FT - for this I would need some external skript (written in Java or something).
- If I will believe Informatica will copy everything (this is only copy, only single transformation there is Sybase bit to Oracle char(1)) - so I can campare only data in IT and FT and then I can have only procedure in Oracle, which is ideal.
Question is (in point 2.) how much data to test because of time (ETL is running every our and make copy whole database, so I don't have time for test every row). My idea is to test 10-20% of dataset randomly.
Does anyone any other idea how to test data quality (values mainly, constraints are minimal)?
Thanks very much
J.
Aucun commentaire:
Enregistrer un commentaire