Apache Flink Training - DataStream API - ProcessFunction

ProcessFunction combines events, timers, and state into a powerful building block for stream applications.

  1. Apache Flink® Training (Flink v1.3, 14.9.2017): DataStream API, ProcessFunction
  2. ProcessFunction: combining timers with stateful event processing
  3. Common Pattern
     On each incoming element:
       • update some state
       • register a callback for a moment in the future
     When that moment comes:
       • check a condition and perform a certain action, e.g. emit an element
  4. Flink 1.2 added ProcessFunction
     Gives access to all basic building blocks:
       • Events
       • Fault-tolerant, consistent state
       • Timers (event-time and processing-time)
  5. ProcessFunction
     Simple yet powerful API:

          /**
           * Process one element from the input stream.
           */
          void processElement(I value, Context ctx, Collector<O> out) throws Exception;

          /**
           * Called when a timer set using {@link TimerService} fires.
           */
          void onTimer(long timestamp, OnTimerContext ctx, Collector<O> out) throws Exception;
  6. ProcessFunction, simple yet powerful API (the same two methods as on slide 5), annotated:
     Collector<O> out, passed to both processElement and onTimer, is a collector to emit result values.
  7. ProcessFunction, simple yet powerful API (the same two methods as on slide 5), annotated:
     Context ctx (in processElement):
       1. Get the timestamp of the element
       2. Interact with the TimerService to:
          • query the current time
          • register timers
     OnTimerContext ctx (in onTimer):
       1. Do the above
       2. Query whether the timer fired in event time or processing time
  8. ProcessFunction: example
     Requirements:
       • maintain counts per incoming key, and
       • emit the key/count pair if no element came for the key in the last 100 ms (in event time)
  9. ProcessFunction: example
     Implementation sketch:
       • Store the count, key and last-modification timestamp in a ValueState (scoped by key)
       • For each record:
          • update the counter and the last-modification timestamp
          • register a timer 100 ms from "now" (in event time)
       • When the timer fires:
          • check the callback's timestamp against the last-modification time for the key, and
          • emit the key/count pair if the callback's timestamp equals the last-modification time plus 100 ms, i.e. no newer element has arrived for the key
  10. ProcessFunction: example

          // the data type stored in the state
          public class CountWithTimestamp {
              public String key;
              public long count;
              public long lastModified;
          }

          // apply the process function onto a keyed stream
          DataStream<Tuple2<String, Long>> result = stream
              .keyBy(0)
              .process(new CountWithTimeoutFunction());
  11. ProcessFunction: example

          public class CountWithTimeoutFunction extends
                  RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

              @Override
              public void open(Configuration parameters) throws Exception {
                  // register our state with the state backend
              }

              // the input type must match the first type parameter above: Tuple2<String, String>
              @Override
              public void processElement(Tuple2<String, String> value, Context ctx,
                      Collector<Tuple2<String, Long>> out) throws Exception {
                  // update our state and register a timer
              }

              @Override
              public void onTimer(long timestamp, OnTimerContext ctx,
                      Collector<Tuple2<String, Long>> out) throws Exception {
                  // check the state for the key and emit a result if needed
              }
          }
  12. ProcessFunction: example

          public class CountWithTimeoutFunction extends
                  RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

              // keyed state: one CountWithTimestamp per key
              private ValueState<CountWithTimestamp> state;

              @Override
              public void open(Configuration parameters) throws Exception {
                  state = getRuntimeContext().getState(
                      new ValueStateDescriptor<>("myState", CountWithTimestamp.class));
              }
          }
  13. ProcessFunction: example

          public class CountWithTimeoutFunction extends
                  RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

              // the input type must match the first type parameter above: Tuple2<String, String>
              @Override
              public void processElement(Tuple2<String, String> value, Context ctx,
                      Collector<Tuple2<String, Long>> out) throws Exception {

                  // first element for this key: create the state object
                  CountWithTimestamp current = state.value();
                  if (current == null) {
                      current = new CountWithTimestamp();
                      current.key = value.f0;
                  }

                  // update the count and remember the element's event-time timestamp
                  current.count++;
                  current.lastModified = ctx.timestamp();
                  state.update(current);

                  // schedule a callback 100 ms (event time) after this element
                  ctx.timerService().registerEventTimeTimer(current.lastModified + 100);
              }
          }
  14. ProcessFunction: example

          public class CountWithTimeoutFunction extends
                  RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

              @Override
              public void onTimer(long timestamp, OnTimerContext ctx,
                      Collector<Tuple2<String, Long>> out) throws Exception {

                  CountWithTimestamp result = state.value();

                  // the state may already have been cleared by an earlier timer for this key,
                  // so guard against null; emit only if no newer element moved lastModified
                  if (result != null && timestamp == result.lastModified + 100) {
                      out.collect(new Tuple2<String, Long>(result.key, result.count));
                      state.clear();
                  }
              }
          }
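
The timer logic of the example on slides 8 to 14 can be traced without a Flink cluster. What follows is a minimal plain-Java sketch, not Flink code: a HashMap stands in for the keyed ValueState, a TreeMap stands in for the TimerService, and the hypothetical method advanceWatermark plays the role of a watermark arriving and firing all due event-time timers. Apart from CountWithTimestamp, every class and method name here is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java simulation of the count-with-timeout pattern. This is NOT Flink code;
// the map-based "state" and "timers" are hypothetical stand-ins for what Flink provides.
class CountWithTimeoutSimulation {

    // Mirrors the CountWithTimestamp state object from the slides.
    static final class CountWithTimestamp {
        String key;
        long count;
        long lastModified;
    }

    private final long timeoutMs;

    // Stand-in for keyed ValueState: one entry per key.
    private final Map<String, CountWithTimestamp> state = new HashMap<>();

    // Stand-in for the timer service: timer timestamp -> keys with a timer at that time.
    private final TreeMap<Long, Set<String>> timers = new TreeMap<>();

    CountWithTimeoutSimulation(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Counterpart of processElement: update the state, then register an event-time timer.
    void processElement(String key, long eventTimestamp) {
        CountWithTimestamp current = state.get(key);
        if (current == null) {
            current = new CountWithTimestamp();
            current.key = key;
            state.put(key, current);
        }
        current.count++;
        current.lastModified = eventTimestamp;
        timers.computeIfAbsent(eventTimestamp + timeoutMs, t -> new TreeSet<>()).add(key);
    }

    // Hypothetical counterpart of a watermark: fires every timer with timestamp <= watermark,
    // in timestamp order, and returns the emitted "key=count" results.
    List<String> advanceWatermark(long watermark) {
        List<String> emitted = new ArrayList<>();
        while (!timers.isEmpty() && timers.firstKey() <= watermark) {
            Map.Entry<Long, Set<String>> due = timers.pollFirstEntry();
            for (String key : due.getValue()) {
                onTimer(due.getKey(), key, emitted);
            }
        }
        return emitted;
    }

    // Counterpart of onTimer: emit only if no newer element has moved lastModified.
    private void onTimer(long timestamp, String key, List<String> out) {
        CountWithTimestamp result = state.get(key);
        if (result != null && timestamp == result.lastModified + timeoutMs) {
            out.add(result.key + "=" + result.count);
            state.remove(key);
        }
    }

    public static void main(String[] args) {
        CountWithTimeoutSimulation sim = new CountWithTimeoutSimulation(100);
        sim.processElement("a", 0);   // registers a timer at 100
        sim.processElement("a", 50);  // registers a timer at 150, lastModified is now 50
        sim.processElement("b", 60);  // registers a timer at 160
        System.out.println(sim.advanceWatermark(120)); // prints []
        System.out.println(sim.advanceWatermark(200)); // prints [a=2, b=1]
    }
}
```

With a 100 ms timeout, the two elements for key a register timers at 100 and 150. The timer at 100 is stale because lastModified has already moved to 50, so it emits nothing; only the timer at 150 matches and emits the count. This is the comparison that slide 14 performs in onTimer.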
