This slides provides description for how apex can be used by a developer. The slide also provides information about various components of Apex Operator lifecycle.
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Introduction to Apache Apex
1. Apache Apex Meetup
Introduction to Apache Apex
Real time streaming.. Really!!!
Chinmay Kolhatkar
chinmay@apache.org
February 13, 2016
2. Apache Apex Meetup
Agenda
➔ Project History
➔ What is Apache Apex?
➔ Directed Acyclic Graph (DAG)
➔ Components of DAG
➔ Windowing
➔ Operator Lifecycle
➔ Apache Apex Architecture
➔ Other features
3. Apache Apex Meetup
Project History
➔ Started development at DataTorrent in 2012
➔ Open-sourced under ASF in 2015
➔ Currently Have 50+ committers
➔ Free to use Streaming Application platform
4. Apache Apex Meetup
What is Apache Apex?
➔ Apex project is under Apache Software Foundation
➔ Apex is a Streaming Application platform
➔ YARN-native application
➔ Complete implementation is done in Java
➔ Consist of 2 primary components
◆ Apex Core - Engine which facilitates Real time processing
◆ Apex Malhar - Out-of-the-box operators that can be used with Apex
Core
5. Apache Apex Meetup
➔ Defines compute stages
➔ Defined how tuple flow over compute stages over stream
Directed Acyclic Graph (DAG)
Filtered
Stream
Output StreamTuple Tuple
FilteredStream
Enriched
Stream
Enriched
Stream
er
Operator
er
Operator
er
Operator
er
Operator
6. Apache Apex Meetup
➔ Smallest atomic data that flows over a
stream
➔ Emitted by Operators after processing
➔ Received by next Operator for
processing
➔ Java objects which are serializable
➔ Types:
◆ Data Tuple
◆ Control Tuple
Components of DAG - Tuple
7. Apache Apex Meetup
➔ Logical compute unit
➔ Java code which processes a tuple
➔ Runs inside a JVM
➔ Types
◆ Input Adapter
◆ Generic Operator
◆ Output Adapter
Components of DAG - Operator
8. Apache Apex Meetup
➔ Connect operators
➔ Channel that carries the tuples from
one operator to another
Components of DAG - Stream
9. Apache Apex Meetup
➔ Ends of a stream
➔ Part of operator
➔ Types of ports
◆ Input Port
◆ Output Port
Components of DAG - Ports
10. Apache Apex Meetup
Windowing
➔ Tuples divided into time slices
➔ Windows are given ids (type:long)
➔ Also called as Streaming Window
● Default 500ms
11. Apache Apex Meetup
➔ Input Operator inserts control tuple
➔ Control tuple marks window boundary
➔ Different operator may be processing
different windows
➔ All management activities of data
happens at the boundary of window
Windowing (contd…)
BeginWindow
Control Tuple
EndWindow
Control Tuple
Data
Tuples
Window nWindow n+1 Output
Adapter
Input
Adapter
Generic
Operator
12. Apache Apex Meetup
➔ Called by Apex Platform
➔ Simple unit test like lifecycle
➔ Governed by control tuples
➔ All operators in DAG go through
this life-cycle
Operator Lifecycle
13. Apache Apex Meetup
➔ Setup
◆ Start of operator lifecycle
◆ Do any initialization here
➔ beginWindow
◆ Marks starting of window
➔ endWindow
◆ Marks end of window
➔ teardown
◆ Do any finalization here
◆ End of operator lifecycle
Operator Lifecycle (contd...)
14. Apache Apex Meetup
➔ emitTuples
◆ Called for Input Adapters
◆ Called in an infinite while
loop by platform
➔ process
◆ Called for Generic Operators
and Output Adapters
◆ Associated to to a port
◆ Called for every incoming
tuple
Operator Lifecycle (contd...)
15. Apache Apex Meetup
➔ OutputPort::emit
◆ Special method not part of
operator lifecycle
◆ To be called by operator
code
◆ Emits the tuples to next
operator
◆ Bound by Window
Operator Lifecycle (contd...)
20. Apache Apex Meetup
➔ AT_LEAST_ONCE (default)
◆ Windows are processed at least once
➔ AT_MOST_ONCE
◆ Windows are processed at most once
➔ EXACTLY_ONCE
◆ Windows are processed exactly once
Processing Modes
21. Apache Apex Meetup
➔ Saves operator state on HDFS
➔ Each operator undergoes checkpointing
➔ Done by platform
➔ Happens every 60 streaming windows by default i.e. 30 sec.
➔ Checkpoint is named by the windowId at which it happens
➔ If all operators gets checkpointed at same window, that checkpointed state
becomes “committed” state of application
➔ Committed state is used for recovery in case of failure
Checkpointing