Sunday, July 8, 2018

Spark Testing and Optimization

I need help with the following two questions:
1. In a Spark DataFrame, is repartitionBy or a window function the better way to distribute a large dataset and perform complex aggregations?
2. How do we unit test Spark? That is, I want a comparator program that verifies the result of a SQL query (not Spark SQL) against the result from Spark DataFrames, along with the time taken by each.
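
For the second question, one possible shape for such a comparator is sketched below. It is only a rough illustration under stated assumptions: the names `timed`, `compare_results`, `sql_side` and `spark_side`, and the `sales` table, are all made up for this example. An in-memory SQLite database stands in for the real SQL engine, and a hard-coded list stands in for the rows that `df.collect()` would return from Spark. Rows are compared as multisets, so ordering differences between the two engines are ignored while duplicate rows still count.

```python
import sqlite3
import time
from collections import Counter


def timed(fetch_rows):
    """Run fetch_rows() and return (rows as tuples, elapsed seconds)."""
    start = time.perf_counter()
    rows = [tuple(r) for r in fetch_rows()]
    return rows, time.perf_counter() - start


def compare_results(fetch_expected, fetch_actual):
    """Compare two row sets, ignoring order but respecting duplicates."""
    expected, expected_secs = timed(fetch_expected)
    actual, actual_secs = timed(fetch_actual)
    expected_counts, actual_counts = Counter(expected), Counter(actual)
    return {
        "match": expected_counts == actual_counts,
        # Rows the SQL side produced that the other side did not.
        "missing": list((expected_counts - actual_counts).elements()),
        # Rows the other side produced that the SQL side did not.
        "unexpected": list((actual_counts - expected_counts).elements()),
        "expected_secs": expected_secs,
        "actual_secs": actual_secs,
    }


# Reference side: a plain SQL query (SQLite is an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 5), ("west", 7)],
)


def sql_side():
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return conn.execute(query).fetchall()


def spark_side():
    # Stand-in for the Spark result. With PySpark this would be roughly:
    #   df.groupBy("region").sum("amount").collect()
    return [("west", 7), ("east", 15)]


report = compare_results(sql_side, spark_side)
print(report["match"], report["expected_secs"], report["actual_secs"])
```

Passing callables rather than finished results lets the same function time both sides. The timings are only indicative: a real comparison would need warm-up runs, and on the Spark side the clock has to include an action such as `collect()`, since transformations alone are lazy. Floating-point aggregates may also need a tolerance rather than exact equality.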
