Slides from the Stream Processing Meetup (7/19/2018): https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/251481797/
This presentation introduces the newly developed Samza Runner for Apache Beam. It covers the capabilities of the Samza Runner, how it supports key Beam features, a few use cases, and our future roadmap.
3. Apache Beam Overview
Apache Beam is an advanced unified programming model designed to provide
efficient and portable data processing pipelines.
● Unified - Single programming model for both batch and streaming
● Advanced - Strong consistency via event time, e.g. windowing, triggering, late-arrival handling, accumulation, etc.
● Portable - Execute pipelines of multiple programming language SDKs,
including Java, Python and Go
● Efficient - Write and share SDKs, IO connectors, and transformation libraries
https://beam.apache.org/
4. Beam Model
● A Pipeline encapsulates your entire data
processing task, from start to finish
● IOs are the endpoints for data input and output
● A PCollection represents an
immutable distributed data set that
your Beam pipeline operates on
● A PTransform represents a data
processing operation, or a step, in
your pipeline
[Diagram: a Pipeline running from IO.read through PCollections and PTransforms to IO.write]
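The relationships between these concepts can be sketched outside of Beam as plain function composition. This is a hypothetical mini-model for illustration only, not Beam's API: a pipeline chains transforms, each turning one immutable collection into a new one.

```python
# Hypothetical mini-model of the Beam concepts (NOT the Beam API):
# a "pipeline" is an ordered list of transforms; each transform maps
# one immutable collection (tuple) to a new one.

def run_pipeline(read, transforms, write):
    """IO.read -> PTransform* -> IO.write over immutable PCollections."""
    pcollection = tuple(read())      # IO.read produces the first PCollection
    for transform in transforms:     # each PTransform yields a new PCollection
        pcollection = tuple(transform(pcollection))
    return write(pcollection)        # IO.write consumes the final PCollection

# Usage: read words, map them to their lengths, write out a dict.
source = lambda: ["beam", "samza"]
to_lengths = lambda pc: ((w, len(w)) for w in pc)
sink = dict

result = run_pipeline(source, [to_lengths], sink)
```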
6. Beam Event Time
[Diagram: eight events arriving between processing time 12:00 and 12:03, grouped into 1-min fixed windows using processing time; event time on the vertical axis]
7. Beam Event Time
[Diagram: events 1-2 assigned to 1-min fixed windows using event time; the watermark trails processing time]
- Watermark: a timestamp before which all events are assumed to have arrived.
- Data whose event timestamp is already behind the watermark when it arrives is considered late data.
- Example using a simple watermark at event timestamp 12:01
8. Beam Event Time
[Diagram: events 3-6 arrive; the 12:01 event-time window closes as the watermark passes 12:01]
9. Beam Event Time
[Diagram: events 7-8 arrive after the watermark has already passed their event-time windows, so they are marked late]
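The on-time/late distinction in the diagrams above can be sketched in a few lines of plain Python (illustrative only, not Beam code): an event is late if its event timestamp is already behind the watermark when it arrives.

```python
# Sketch (not Beam code): classify arriving events against a watermark.
# Each event is (event_time_minute, value); an event whose timestamp is
# behind the watermark is late.

def classify(events, watermark):
    """Split events into (on_time, late) relative to the watermark."""
    on_time = [e for e in events if e[0] >= watermark]
    late = [e for e in events if e[0] < watermark]
    return on_time, late

# The watermark has advanced to event-time minute 1 (i.e. 12:01):
events = [(0, "a"), (1, "b"), (2, "c")]
on_time, late = classify(events, watermark=1)
```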
10. Beam Windowing
Windowing divides data into event-time-based finite chunks.
Often required when doing aggregations over unbounded data.
[Diagram: Fixed, Sliding, and Sessions windows over keyed events along the time axis]
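Window assignment for the two non-merging cases above can be sketched as pure functions of the event timestamp (an illustrative sketch, not Beam's WindowFn API): a fixed window tiles the timeline, while sliding windows overlap, so one event lands in size/period windows.

```python
# Sketch of event-time window assignment (illustrative, not Beam's
# WindowFn API). Timestamps and sizes are in arbitrary time units.

def fixed_window(ts, size):
    """A fixed window tiles the timeline: one window per event."""
    start = ts - ts % size
    return [(start, start + size)]

def sliding_windows(ts, size, period):
    """Sliding windows overlap: every window of length `size` whose
    start is a multiple of `period` and which contains `ts`."""
    windows = []
    start = ts - ts % period
    while start > ts - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)

w_fixed = fixed_window(125, size=60)                 # one window
w_sliding = sliding_windows(125, size=60, period=30) # two overlapping windows
```

Session windows differ in that they merge: each event opens a gap-sized window, and overlapping windows for the same key are combined.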
11. Beam Stateful Processing
[Diagram: page-view events (news, msg, jobs, network) arriving across 1-min event-time windows between 12:00 and 12:03]
● Beam provides several state abstractions, e.g. ValueState, BagState, MapState, CombiningState
● State is kept on a per-key-and-window basis

State for counting per PageKey:

PageKey  | 12:00-12:01 | 12:01-12:02 | 12:02-12:03
news     | 1           | 0           | 1
msg      | 0           | 3           | 0
network  | 0           | 0           | 1
jobs     | 0           | 1           | 0
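The "per-key-and-window" scoping above can be sketched by addressing each state cell with a (key, window) pair; the event data below reproduces the counting table on this slide (an illustrative sketch, not Beam's State API).

```python
# Sketch of per-key-and-window state (illustrative, not Beam's State
# API): each state cell is addressed by (key, window).
from collections import defaultdict

state = defaultdict(int)  # (key, window) -> count

def window_of(minute):
    """Assign a 1-min fixed window, as in the slide's example."""
    return (minute, minute + 1)

# (page_key, event_time_minute), matching the slide's table
events = [("news", 0), ("msg", 1), ("msg", 1), ("jobs", 1),
          ("msg", 1), ("network", 2), ("news", 2)]
for key, minute in events:
    state[(key, window_of(minute))] += 1

# "msg" was seen 3 times in the 12:01-12:02 window
msg_count = state[("msg", (1, 2))]
```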
14. The Goal of Samza Runner
Bring the easy-to-use but powerful model of Beam to Samza users for state-of-the-art stream and batch data processing, with portability across a variety of programming languages.
15. Samza Overview
● The runner combines the large-scale stream processing capabilities of Samza with the advanced programming model of Beam
● First class support for local state (with RocksDB store)
● Fault-tolerance with support for incremental checkpointing of state
instead of full snapshots
● A fully asynchronous processing engine that makes remote calls
efficient
● Flexible deployment models, e.g. Yarn and standalone with
Zookeeper
17. How the Samza Runner Works
● A Beam runner translates the Beam API into its native API plus runtime logic, and executes it in a distributed data processing system.
● The Samza Runner translates the Beam API into the Samza high-level API and executes the logic in a distributed manner, e.g. on Yarn or standalone.
● The Samza Runner contains the logic to support Beam features:
- Beam IO
- Event time/Watermark
- GroupByKey
- Keyed State
- Triggering Timers
- Side Input
18. Unbounded/Bounded IO
● UnboundedSourceSystem adapts any unbounded IO.Read into a Samza SystemConsumer. It will 1) split the sources according to the parallelism needed; 2) generate IncomingMessageEnvelopes of either events or watermarks
● BoundedSourceSystem adapts any bounded IO.Read into a Samza SystemConsumer
● Direct translation is also supported for Samza native data connectors, e.g. translating KafkaIO.Read directly into KafkaSystemConsumer
[Diagram: KafkaIO.Read (KafkaUnboundedSource) feeding events/watermarks and TextIO.Read (TextSource) feeding events/end-of-stream into a Samza StreamProcessor]
19. Watermark
● Watermarks are injected at a fixed interval from unbounded sources
● Watermarks are propagated through each downstream operator and aggregated using the following logic:

InputWatermark(op) = max(CurrentInputWatermark(op), min{OutputWatermark(op') | op' is upstream of op})
OutputWatermark(op) = max(CurrentOutputWatermark(op), InputWatermark(op))
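The two propagation rules can be written directly as functions (a minimal sketch; the operator values below are made up for illustration): an operator's input watermark can only advance to the minimum of its upstreams' output watermarks, and watermarks never regress.

```python
# The two watermark propagation rules above, as plain functions.
# Timestamps are integers for illustration.

def input_watermark(current_input_wm, upstream_output_wms):
    """An operator's input watermark is held back by its slowest
    upstream, and never moves backwards."""
    return max(current_input_wm, min(upstream_output_wms))

def output_watermark(current_output_wm, input_wm):
    """The output watermark advances with the input watermark and
    never moves backwards."""
    return max(current_output_wm, input_wm)

# An operator with two upstreams at output watermarks 5 and 8 can
# only advance its input watermark to the slower one (5):
in_wm = input_watermark(current_input_wm=3, upstream_output_wms=[5, 8])
out_wm = output_watermark(current_output_wm=4, input_wm=in_wm)
```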
21. GroupByKey
● A partitionBy is automatically inserted before the reduce
● The intermediate aggregation results are stored in Samza key-value stores (RocksDB by default)
● The output is triggered by watermarks by default
[Diagram: KafkaIO.Read → partitionBy → FlatMap → ReduceFn running against KV<key, value> state]
22. State Support
● Beam states are provided by SamzaStoreStateInternals
● The key for each state cell is (element key, window id, address)
● Samza also provides a readIterator() interface for large states that won't fit in memory
[Diagram: ValueState, BagState, SetState, MapState, CombiningState, and WatermarkState all backed by SamzaStoreStateInternals on RocksDB]
23. Timer Support
● Beam timers are provided by
SamzaTimerInternalsFactory
● Support both event-time and
processing-time timers
● Event-time timers are managed using a
sorted set ordered by timestamp
● Processing-time timers are managed by
Samza TimerRegistry via
TimerFunction API
● All timers are keyed by TimerKey
(id, namespace, element key)
[Diagram: SamzaTimerInternals keeps keyed event-time timers in a sorted set fired by watermarks after GroupByKey, and registers processing-time timers with the Samza SystemTimerScheduler]
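The event-time timer bookkeeping described above can be sketched with a sorted structure (illustrative only; Samza's actual implementation differs in detail): timers keyed by (id, namespace, element key) are kept ordered by timestamp, and a watermark advance fires every timer at or before the watermark.

```python
# Sketch of event-time timers kept in a timestamp-ordered structure
# (illustrative; not Samza's actual implementation). Timer keys are
# (id, namespace, element_key) tuples, as described on the slide.
import bisect

class EventTimeTimers:
    def __init__(self):
        self._timers = []  # sorted list of (timestamp, timer_key)

    def set_timer(self, timestamp, timer_key):
        bisect.insort(self._timers, (timestamp, timer_key))

    def advance_watermark(self, watermark):
        """Pop and return all timers with timestamp <= watermark."""
        fired = [t for t in self._timers if t[0] <= watermark]
        self._timers = self._timers[len(fired):]
        return fired

timers = EventTimeTimers()
timers.set_timer(10, ("t1", "ns", "k1"))
timers.set_timer(5, ("t2", "ns", "k2"))
fired_first = timers.advance_watermark(7)   # only the ts=5 timer fires
fired_second = timers.advance_watermark(10) # the ts=10 timer fires
```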
24. View/Side Input
● Beam views: SingletonView, IterableView, ListView, MapView, MultimapView
● Beam views are materialized into a physical stream and broadcast to all tasks using the Samza broadcast operator
● ParDo consumes the broadcast view as side input
[Diagram: a KafkaIO.Read main input partitioned across ParDo tasks, while a TextIO.Read → ParDo → Combine.GloballyAsSingletonView result is broadcast as a side-input stream to every ParDo task]
25. Deployment
Local (single JVM)
● Default mode: no config required
● LocalApplicationRunner
● PassthroughJobCoordinator
● All tasks grouped into one container
Yarn
● RemoteApplicationRunner
● YarnJobFactory
● Configure containers using job.container.count
Standalone (Zookeeper)
● LocalApplicationRunner
● ZkJobCoordinator
● Configure the zk connection with job.coordinator.zk.connect
[Diagram: a Yarn cluster with an RM scheduling JVM processes, and standalone StreamProcessors (each a Samza container plus job coordinator) coordinating through Zookeeper]
27. Use Case 1: Fixed-window Join to Track Location
Suppose you own a Star Trek fleet, and you want to track the location of your Starships. The location data are gathered through Starship on-board transmitters as well as your radar monitors. Now let's track their location in event time over a 10-min window.
[Diagram: the onboard location transmitter and radar monitor streams each go through WithKey (key by ID) and FixedWindow (10 min), then CoGroupByKey (join) → ParDo → KafkaIO.Write and DbIO.Write to a location info DB. Example: transmitter event (T1, Enterprise, SF, 1) and radar event (R1, Enterprise, SV, 9) are keyed and windowed into Window(0:9), joined as (Enterprise, (SF, 1), (SV, 9)), and resolved to (Enterprise, SV).]
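The core of this pipeline, key both streams, assign 10-min fixed windows, then co-group records sharing (key, window), can be sketched in plain Python (illustrative; not Beam's CoGroupByKey):

```python
# Sketch of a fixed-window CoGroupByKey join (plain Python, not Beam):
# records from both streams that share (ship_id, window) are grouped.
from collections import defaultdict

def window_of(minute, size=10):
    start = minute - minute % size
    return (start, start + size)

def cogroup(transmitter, radar):
    """Each input is a list of (ship_id, location, event_minute)."""
    joined = defaultdict(lambda: ([], []))
    for ship, loc, t in transmitter:
        joined[(ship, window_of(t))][0].append(loc)
    for ship, loc, t in radar:
        joined[(ship, window_of(t))][1].append(loc)
    return dict(joined)

# (T1, Enterprise, SF, 1) and (R1, Enterprise, SV, 9) fall in the
# same 10-min window and join on the ship ID:
result = cogroup([("Enterprise", "SF", 1)], [("Enterprise", "SV", 9)])
```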
28. Use Case 2: Session-window Join to Gather Activities
Suppose we are heading out to Disneyland. We would like to know the activity count for each person. Here we use a session-window join to gather the activities done per person.
[Diagram: Ticket Purchase, Membership Purchase, and Activity event streams each go through a SessionWindow (4 hours), then CoGroupByKey (join by id) and Count.perKey. Example: memberships Xinyu: G00001 and Boris: M00001; activities G00001: Space Mountain, M00001: Harry Potter, M00001: Small World; output Xinyu: 1, Boris: 2.]
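Session windows differ from fixed windows in that they merge: each event opens a window of the gap length, and overlapping windows for the same key collapse into one session. A minimal sketch (not Beam's Sessions WindowFn; hour-granularity timestamps are made up for illustration):

```python
# Sketch of session-window merging (illustrative, not Beam's Sessions
# WindowFn). Each event opens a window [ts, ts + gap); overlapping
# windows for the same key merge into one session.

def sessions(timestamps, gap):
    """Merge one key's event times into session windows."""
    windows = []
    for ts in sorted(timestamps):
        if windows and ts < windows[-1][1]:  # overlaps current session
            windows[-1] = (windows[-1][0], max(windows[-1][1], ts + gap))
        else:
            windows.append((ts, ts + gap))
    return windows

# Two activities within the 4-hour gap form one session; an activity
# 10 hours later would start a new one.
one_session = sessions([9, 11], gap=4)
two_sessions = sessions([9, 19], gap=4)
```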
29. Use Case 3: Sliding-window Aggr. for Feature Generation
Calculate the features of count, top N, and sum for a particular key of PageView events, using a 1-day sliding window with a 1-min update interval.
[Diagram: PageView events → SlidingWindow (1 day, every min) → Count.perKey, Top.largestPerKey(n), and Filter.by → Sum.globally]

Alternatively, using Beam SQL:

Schema pageViewSchema = RowSqlTypes
    .builder()
    .withVarcharField("pageKey")
    .withTimestampField("timestamp")
    .build();

PCollection<Row> pageViewsRows = pageViews
    .apply(MapElements
        .into(TypeDescriptor.of(Row.class))
        .via((PageViewEvent pv) ->
            Row.withSchema(pageViewSchema)
                .addValues(pv.pageKey.toString(),
                           new DateTime(pv.time)).build()))
    .setCoder(pageViewSchema.getRowCoder());

PCollection<KV<String, Long>> counts = pageViewsRows
    .apply(BeamSql.query(
        "SELECT COUNT(*) AS `count` FROM pageView "
        + "GROUP BY pageKey, "
        + "HOP(timestamp, INTERVAL '1' MINUTE, INTERVAL '1' DAY)"));
31. Future Work
● Python! ● Async Support ● Table API
# A sample word count
p = Pipeline(options=pipeline_options)

# Read the text file[pattern] into a PCollection.
lines = p | 'read' >> ReadFromText(known_args.input)

# Count the occurrences of each word.
counts = (lines
          | 'split' >> (ParDo(WordExtractingDoFn())
                        .with_output_types(unicode))
          | 'pair_with_one' >> Map(lambda x: (x, 1))
          | 'group' >> GroupByKey()
          | 'count' >> Map(lambda (word, ones): (word, sum(ones))))

# Format the counts into a PCollection of strings.
output = (counts
          | 'format' >> Map(lambda (word, c): '%s: %s' % (word, c)))

# Write the output using a "Write" transform that has side effects.
# pylint: disable=expression-not-assigned
output | 'write' >> WriteToText(known_args.output)

result = p.run()
result.wait_until_finish()
// Use CompletionStage for asynchronous processing
input.apply(ParDo.of(
    new DoFn<InputT, OutputT>() {
      @ProcessElement
      public void process(
          @Element CompletionStage<InputT> element, ...) {
        element.thenApply(...);
      }
    }));
// PTable is the Table abstraction
PTable<KV<String, User>> userTable =
    pipeline.apply(
        EspressoTable.readWrite()
            .withDb("dbname")
            .withTable("user"));

pageView
    .apply(TableParDo.of(
        new DoFn<KV<String, PageViewEvent>, String>() {
          @ProcessElement
          public void processElement(ProcessContext c,
              @TableContext.Inject TableContext tc) {
            String id = c.element().getKey();
            // table lookup
            Table<String, User> users = tc.getTable(userTable);
            User user = users.get(id);
            c.output(id + ":" + user.getName().toString());
          }
        })
        .withTables(userTable));

// Convenient helper class to do the same thing
PCollection<String> result = PCollectionTableJoin
    .of(pageView, userTable)
    .into(TypeDescriptors.strings())
    .via((pv, user) ->
        pv.getKey() + ":" + user.getName().toString());
32. Thank you!
And
Special Thanks to Our Early Adopters:
Yingkai Hu, Froila Dsouza, Zhongen Tao,
Nithin Reddy, Bruce Su
https://beam.apache.org/documentation/runners/samza/
Editor's Notes
Talk about the key features that are not available in current samza.
How the Samza runner works: what a runner is, and what we need to support Beam features.
When propagating watermarks across stages (connected by intermediate streams), the partitionBy operator sends the watermarks to a single downstream task, which aggregates them and then broadcasts the aggregated watermark to all the peer tasks.
As far as we know, most of the existing Beam runners don’t support bigger-than-memory state.