I have written a method to filter out duplicates from an RDD and decided to write a unit test for the method. Here is my method:
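def filterDupes(salesWithDupes: RDD[((String, String), SalesData)]): RDD[((String, String), SalesData)] = {
  salesWithDupes
    // Re-key each record by (saleType, saleDate), keeping the original pair as the value
    .map(keyedSale => ((keyedSale._2.saleType, keyedSale._2.saleDate), keyedSale))
    // Keep a single record per (saleType, saleDate)
    .reduceByKey((a, _) => a)
    // Drop the temporary key and return the original pairs
    .map(_._2)
}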
Since this is my first time writing a test in Scala, I've run into several difficulties.
Am I passing the elements from the list to the filtering method correctly? I'm also stuck on how to validate the result returned by the method.
The only approach I have come up with so far is collecting the RDD's data into a list and then checking its size. Is that the right way?
Here is how I see the logic of the test:
"Sales" should "be filtered" in {
  Given("Sales RDD")
  val rddWithDupes = sc.parallelize(Seq(
    (("metric1", "metric2"), createSale("1", saleType = "Type1", saleDate = "2014-10-12")),
    (("metric1", "metric2"), createSale("2", saleType = "Type1", saleDate = "2014-10-12")),
    (("metric1", "metric2"), createSale("3", saleType = "Type3", saleDate = "2010-11-01"))
  ))

  When("Sales RDD is filtered")
  val filteredResult = SalesProcessor.filterDupes(rddWithDupes).collect.toList

  Then("Sales are filtered")
  filteredResult.size should be(2)
  ????
}
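
Once the RDD has been collected, the result is an ordinary Scala List, so the checks beyond the size can be sketched without Spark. The block below is only an illustration under stated assumptions: the SalesData case class with id, saleType and saleDate fields is a hypothetical stand-in (the real class is not shown here), filterDupesLocal is a local mirror of filterDupes on a plain Seq, and dedupKeys is a helper invented for the example. It checks the size, that no (saleType, saleDate) pair repeats, that the expected pairs are present, and that every surviving record came from the input.

```scala
// Hypothetical stand-in for the real SalesData class, which is not shown in the post.
final case class SalesData(id: String, saleType: String, saleDate: String)

object FilterDupesChecks {
  type Keyed = ((String, String), SalesData)

  // Local mirror of filterDupes on a plain Seq (assumption: same dedup key,
  // one record kept per (saleType, saleDate)).
  def filterDupesLocal(sales: Seq[Keyed]): List[Keyed] =
    sales
      .groupBy { case (_, sale) => (sale.saleType, sale.saleDate) }
      .values
      .map(_.head)
      .toList

  // The (saleType, saleDate) pairs that the filter deduplicates on.
  def dedupKeys(result: Seq[Keyed]): Seq[(String, String)] =
    result.map { case (_, sale) => (sale.saleType, sale.saleDate) }

  def main(args: Array[String]): Unit = {
    val input: Seq[Keyed] = Seq(
      (("metric1", "metric2"), SalesData("1", "Type1", "2014-10-12")),
      (("metric1", "metric2"), SalesData("2", "Type1", "2014-10-12")),
      (("metric1", "metric2"), SalesData("3", "Type3", "2010-11-01"))
    )

    // In the real test this would be filterDupes(rdd).collect.toList
    val filteredResult = filterDupesLocal(input)
    val keys = dedupKeys(filteredResult)

    // Two distinct (saleType, saleDate) pairs exist in the input
    assert(filteredResult.size == 2)
    // No pair appears more than once after filtering
    assert(keys.distinct.size == keys.size)
    // Exactly the expected pairs survive
    assert(keys.toSet == Set(("Type1", "2014-10-12"), ("Type3", "2010-11-01")))
    // Every surviving record is one of the input records
    assert(filteredResult.forall(input.contains))
  }
}
```

The sketch deliberately does not assert which of the two Type1 records survives: the function passed to reduceByKey, (a, _) => a, is not commutative, so the kept duplicate can depend on how the data is partitioned. In the ScalaTest spec itself, the same checks can be written with matchers, for example `keys should contain theSameElementsAs Seq(("Type1", "2014-10-12"), ("Type3", "2010-11-01"))`.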