I'm trying to create small Apache Beam streaming programs to test out ideas and I think the simplest thing for me to do to get data in would be to use framework constructs like Create.of to create fake data. That way, I wouldn't have to set up more than I need to, like setting up a GCP Pub/Sub topic as a source and publishing to it.
The problem is I want to try out things that are time based, like windowing and working with state and timers. I was able to put this together:
public class TestPipeline {
public static void main(String[] args) {
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
p.apply(Create.of(1, 2, 3))
.apply(ParDo.of(new DoFn<Integer, String>() {
@ProcessElement
public void processElement(ProcessContext c) {
c.output(c.element().toString());
}
}))
.apply(TextIO.write().to("myfile.txt"));
p.run().waitUntilFinish();
}
}
This accomplishes my goal of sending off three pieces of data at the beginning of my pipeline, but it sends them all at once. I'd prefer if I could set it to send each piece of data every 10 seconds, etc.
I followed this tutorial from Apache Flink (https://ci.apache.org/projects/flink/flink-docs-release-1.10/getting-started/walkthroughs/datastream_api.html) which shows an example of what I'm trying to accomplish. I dug into the code in that tutorial but I wasn't able to find out exactly which part of the Flink framework made that happen.
Aucun commentaire:
Enregistrer un commentaire