I am writing some self-contained integration tests around Apache Spark Streaming. I want to test that my code can ingest all kinds of edge cases in my simulated test data. When I was doing this with regular RDDs (not streaming), I could take my inline data and call `parallelize` on it to turn it into a Spark RDD. However, I can find no such method for creating DStreams. Ideally, I would like to call some "push" function once in a while and have the tuple magically appear in my DStream.

At the moment I'm doing this with Apache Kafka: I create a temporary queue and write to it. But this seems like overkill. I'd much rather create the test DStream directly from my test data without having to use Kafka as a mediator.