SlideShare uma empresa Scribd logo
1 de 21
Baixar para ler offline
Apache Apex Meetup
Introduction to Apache Apex
Real time streaming.. Really!!!
Chinmay Kolhatkar
chinmay@apache.org
February 13, 2016
Apache Apex Meetup
Agenda
➔ Project History
➔ What is Apache Apex?
➔ Directed Acyclic Graph (DAG)
➔ Components of DAG
➔ Windowing
➔ Operator Lifecycle
➔ Apache Apex Architecture
➔ Other features
Apache Apex Meetup
Project History
➔ Started development at DataTorrent in 2012
➔ Open-sourced under ASF in 2015
➔ Currently Have 50+ committers
➔ Free to use Streaming Application platform
Apache Apex Meetup
What is Apache Apex?
➔ Apex project is under Apache Software Foundation
➔ Apex is a Streaming Application platform
➔ YARN-native application
➔ Complete implementation is done in Java
➔ Consist of 2 primary components
◆ Apex Core - Engine which facilitates Real time processing
◆ Apex Malhar - Out-of-the-box operators that can be used with Apex
Core
Apache Apex Meetup
➔ Defines compute stages
➔ Defined how tuple flow over compute stages over stream
Directed Acyclic Graph (DAG)
Filtered
Stream
Output StreamTuple Tuple
FilteredStream
Enriched
Stream
Enriched
Stream
er
Operator
er
Operator
er
Operator
er
Operator
Apache Apex Meetup
➔ Smallest atomic data that flows over a
stream
➔ Emitted by Operators after processing
➔ Received by next Operator for
processing
➔ Java objects which are serializable
➔ Types:
◆ Data Tuple
◆ Control Tuple
Components of DAG - Tuple
Apache Apex Meetup
➔ Logical compute unit
➔ Java code which processes a tuple
➔ Runs inside a JVM
➔ Types
◆ Input Adapter
◆ Generic Operator
◆ Output Adapter
Components of DAG - Operator
Apache Apex Meetup
➔ Connect operators
➔ Channel that carries the tuples from
one operator to another
Components of DAG - Stream
Apache Apex Meetup
➔ Ends of a stream
➔ Part of operator
➔ Types of ports
◆ Input Port
◆ Output Port
Components of DAG - Ports
Apache Apex Meetup
Windowing
➔ Tuples divided into time slices
➔ Windows are given ids (type:long)
➔ Also called as Streaming Window
● Default 500ms
Apache Apex Meetup
➔ Input Operator inserts control tuple
➔ Control tuple marks window boundary
➔ Different operator may be processing
different windows
➔ All management activities of data
happens at the boundary of window
Windowing (contd…)
BeginWindow
Control Tuple
EndWindow
Control Tuple
Data
Tuples
Window nWindow n+1 Output
Adapter
Input
Adapter
Generic
Operator
Apache Apex Meetup
➔ Called by Apex Platform
➔ Simple unit test like lifecycle
➔ Governed by control tuples
➔ All operators in DAG go through
this life-cycle
Operator Lifecycle
Apache Apex Meetup
➔ Setup
◆ Start of operator lifecycle
◆ Do any initialization here
➔ beginWindow
◆ Marks starting of window
➔ endWindow
◆ Marks end of window
➔ teardown
◆ Do any finalization here
◆ End of operator lifecycle
Operator Lifecycle (contd...)
Apache Apex Meetup
➔ emitTuples
◆ Called for Input Adapters
◆ Called in an infinite while
loop by platform
➔ process
◆ Called for Generic Operators
and Output Adapters
◆ Associated to to a port
◆ Called for every incoming
tuple
Operator Lifecycle (contd...)
Apache Apex Meetup
➔ OutputPort::emit
◆ Special method not part of
operator lifecycle
◆ To be called by operator
code
◆ Emits the tuples to next
operator
◆ Bound by Window
Operator Lifecycle (contd...)
Apache Apex Meetup
Apache Apex Architecture
Machine nodes (Physical or Virtual)
Hadoop (YARN)
Distributed File System
(e.g. HDFS)
Apache Apex Core (Streaming Engine)
Streaming Application Streaming Application
RESTAPI
External
Data
Sources
Apache Apex Malhar
(Reusable Operators, Connectors)
Custom Operators
Apache Apex Meetup
➔ Ease of Use
➔ Locality
➔ Fault Tolerance
➔ Scalability
◆ Partitioning
◆ Auto-scaling
Other features of platform
Apache Apex Meetup
● Apache Apex Page
○ http://apex.incubator.apache.org
● Mailing Lists
○ dev@apex.incubator.apache.org
○ users@apex.incubator.apache.org
● Repository
○ https://github.com/apache/incubator-apex-core
○ https://github.com/apache/incubator-apex-malhar
● Issue Tracking
○ https://issues.apache.org/jira/browse/APEXCORE
○ https://issues.apache.org/jira/browse/APEXMALHAR
Resources
● @ApacheApex
● /groups/7020520
Apache Apex Meetup
Apex in Distributed Environment
Hadoop Edge Node
dtManage
(Web UI)
Hadoop Node
YARN Container
App Master
Hadoop Node
YARN Container
YARN Container
YARN Container
Thread1
Op2
Op1
Thread-N
Op3
Worker
Container
Hadoop Node
YARN Container
YARN Container
YARN Container
Thread1
Op2
Op1
Thread-N
Op3
Worker
Container
CLI
dtGateway
(REST API)
Part of DataTorrent RTS
dtGateway
(REST API)
dtManage
(Web UI)
Web
Browser
Apache Apex Meetup
➔ AT_LEAST_ONCE (default)
◆ Windows are processed at least once
➔ AT_MOST_ONCE
◆ Windows are processed at most once
➔ EXACTLY_ONCE
◆ Windows are processed exactly once
Processing Modes
Apache Apex Meetup
➔ Saves operator state on HDFS
➔ Each operator undergoes checkpointing
➔ Done by platform
➔ Happens every 60 streaming windows by default i.e. 30 sec.
➔ Checkpoint is named by the windowId at which it happens
➔ If all operators gets checkpointed at same window, that checkpointed state
becomes “committed” state of application
➔ Committed state is used for recovery in case of failure
Checkpointing

Mais conteúdo relacionado

Mais procurados

Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)
Apache Apex
 

Mais procurados (20)

Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
 
Apache Apex & Bigtop
Apache Apex & BigtopApache Apex & Bigtop
Apache Apex & Bigtop
 
Fault-Tolerant File Input & Output
Fault-Tolerant File Input & OutputFault-Tolerant File Input & Output
Fault-Tolerant File Input & Output
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
 
Apache Apex Meetup at Cask
Apache Apex Meetup at CaskApache Apex Meetup at Cask
Apache Apex Meetup at Cask
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
 
Apex as yarn application
Apex as yarn applicationApex as yarn application
Apex as yarn application
 
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
 
Stream Processing with Apache Apex
Stream Processing with Apache ApexStream Processing with Apache Apex
Stream Processing with Apache Apex
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big Data
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)
 
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsKafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
 
Ingesting Data from Kafka to JDBC with Transformation and Enrichment
Ingesting Data from Kafka to JDBC with Transformation and EnrichmentIngesting Data from Kafka to JDBC with Transformation and Enrichment
Ingesting Data from Kafka to JDBC with Transformation and Enrichment
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 
Deep Dive into Apache Apex App Development
Deep Dive into Apache Apex App DevelopmentDeep Dive into Apache Apex App Development
Deep Dive into Apache Apex App Development
 
Smart Partitioning with Apache Apex (Webinar)
Smart Partitioning with Apache Apex (Webinar)Smart Partitioning with Apache Apex (Webinar)
Smart Partitioning with Apache Apex (Webinar)
 

Destaque

ML3.4 Ponjavic Djuric Smiljanic
ML3.4 Ponjavic Djuric SmiljanicML3.4 Ponjavic Djuric Smiljanic
ML3.4 Ponjavic Djuric Smiljanic
Nenad Smiljanic
 
sweet_magic_final_2
sweet_magic_final_2sweet_magic_final_2
sweet_magic_final_2
sweet magic
 
Presentación de Moodle
Presentación de MoodlePresentación de Moodle
Presentación de Moodle
cruizgaray
 
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarWhy Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Cloudera, Inc.
 

Destaque (20)

Building Distributed Data Streaming System
Building Distributed Data Streaming SystemBuilding Distributed Data Streaming System
Building Distributed Data Streaming System
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
 
In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode
In-Memory Computing, Storage & Analysis: Apache Apex + Apache GeodeIn-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode
In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode
 
ML3.4 Ponjavic Djuric Smiljanic
ML3.4 Ponjavic Djuric SmiljanicML3.4 Ponjavic Djuric Smiljanic
ML3.4 Ponjavic Djuric Smiljanic
 
탄생석과 패션
탄생석과 패션탄생석과 패션
탄생석과 패션
 
sweet_magic_final_2
sweet_magic_final_2sweet_magic_final_2
sweet_magic_final_2
 
Gebeurtenis
GebeurtenisGebeurtenis
Gebeurtenis
 
Practica
PracticaPractica
Practica
 
14 higuerilla
14 higuerilla14 higuerilla
14 higuerilla
 
Apache Apex as a YARN Apllication
Apache Apex as a YARN ApllicationApache Apex as a YARN Apllication
Apache Apex as a YARN Apllication
 
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
 
Mejora continua en las pymes
Mejora continua en las pymesMejora continua en las pymes
Mejora continua en las pymes
 
Presentación de Moodle
Presentación de MoodlePresentación de Moodle
Presentación de Moodle
 
REDES NEURONALES
REDES NEURONALESREDES NEURONALES
REDES NEURONALES
 
Individual and societal risk
Individual and societal riskIndividual and societal risk
Individual and societal risk
 
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarWhy Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
 
Data flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into FlinkData flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into Flink
 
Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis
Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis  Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis
Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis
 
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
 

Semelhante a Introduction to Apache Apex

software defined network, openflow protocol and its controllers
software defined network, openflow protocol and its controllerssoftware defined network, openflow protocol and its controllers
software defined network, openflow protocol and its controllers
Isaku Yamahata
 

Semelhante a Introduction to Apache Apex (20)

Building Your First Apache Apex Application
Building Your First Apache Apex ApplicationBuilding Your First Apache Apex Application
Building Your First Apache Apex Application
 
Building your first aplication using Apache Apex
Building your first aplication using Apache ApexBuilding your first aplication using Apache Apex
Building your first aplication using Apache Apex
 
BigDataSpain 2016: Stream Processing Applications with Apache Apex
BigDataSpain 2016: Stream Processing Applications with Apache ApexBigDataSpain 2016: Stream Processing Applications with Apache Apex
BigDataSpain 2016: Stream Processing Applications with Apache Apex
 
Stream Processing use cases and applications with Apache Apex by Thomas Weise
Stream Processing use cases and applications with Apache Apex by Thomas WeiseStream Processing use cases and applications with Apache Apex by Thomas Weise
Stream Processing use cases and applications with Apache Apex by Thomas Weise
 
Apache Apex - BufferServer
Apache Apex - BufferServerApache Apex - BufferServer
Apache Apex - BufferServer
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 
Flink in action
Flink in actionFlink in action
Flink in action
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!
 
php & performance
 php & performance php & performance
php & performance
 
Flink history, roadmap and vision
Flink history, roadmap and visionFlink history, roadmap and vision
Flink history, roadmap and vision
 
GopherCon IL 2020 - Web Application Profiling 101
GopherCon IL 2020 - Web Application Profiling 101GopherCon IL 2020 - Web Application Profiling 101
GopherCon IL 2020 - Web Application Profiling 101
 
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps WayDevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
 
Copper: A high performance workflow engine
Copper: A high performance workflow engineCopper: A high performance workflow engine
Copper: A high performance workflow engine
 
Orchestration service v2
Orchestration service v2Orchestration service v2
Orchestration service v2
 
software defined network, openflow protocol and its controllers
software defined network, openflow protocol and its controllerssoftware defined network, openflow protocol and its controllers
software defined network, openflow protocol and its controllers
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's Next
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 

Último

Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
chiefasafspells
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 

Último (20)

Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - KanchanaWSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - Kanchana
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 

Introduction to Apache Apex

  • 1. Apache Apex Meetup Introduction to Apache Apex Real time streaming.. Really!!! Chinmay Kolhatkar chinmay@apache.org February 13, 2016
  • 2. Apache Apex Meetup Agenda ➔ Project History ➔ What is Apache Apex? ➔ Directed Acyclic Graph (DAG) ➔ Components of DAG ➔ Windowing ➔ Operator Lifecycle ➔ Apache Apex Architecture ➔ Other features
  • 3. Apache Apex Meetup Project History ➔ Started development at DataTorrent in 2012 ➔ Open-sourced under ASF in 2015 ➔ Currently Have 50+ committers ➔ Free to use Streaming Application platform
  • 4. Apache Apex Meetup What is Apache Apex? ➔ Apex project is under Apache Software Foundation ➔ Apex is a Streaming Application platform ➔ YARN-native application ➔ Complete implementation is done in Java ➔ Consist of 2 primary components ◆ Apex Core - Engine which facilitates Real time processing ◆ Apex Malhar - Out-of-the-box operators that can be used with Apex Core
  • 5. Apache Apex Meetup ➔ Defines compute stages ➔ Defined how tuple flow over compute stages over stream Directed Acyclic Graph (DAG) Filtered Stream Output StreamTuple Tuple FilteredStream Enriched Stream Enriched Stream er Operator er Operator er Operator er Operator
  • 6. Apache Apex Meetup ➔ Smallest atomic data that flows over a stream ➔ Emitted by Operators after processing ➔ Received by next Operator for processing ➔ Java objects which are serializable ➔ Types: ◆ Data Tuple ◆ Control Tuple Components of DAG - Tuple
  • 7. Apache Apex Meetup ➔ Logical compute unit ➔ Java code which processes a tuple ➔ Runs inside a JVM ➔ Types ◆ Input Adapter ◆ Generic Operator ◆ Output Adapter Components of DAG - Operator
  • 8. Apache Apex Meetup ➔ Connect operators ➔ Channel that carries the tuples from one operator to another Components of DAG - Stream
  • 9. Apache Apex Meetup ➔ Ends of a stream ➔ Part of operator ➔ Types of ports ◆ Input Port ◆ Output Port Components of DAG - Ports
  • 10. Apache Apex Meetup Windowing ➔ Tuples divided into time slices ➔ Windows are given ids (type:long) ➔ Also called as Streaming Window ● Default 500ms
  • 11. Apache Apex Meetup ➔ Input Operator inserts control tuple ➔ Control tuple marks window boundary ➔ Different operator may be processing different windows ➔ All management activities of data happens at the boundary of window Windowing (contd…) BeginWindow Control Tuple EndWindow Control Tuple Data Tuples Window nWindow n+1 Output Adapter Input Adapter Generic Operator
  • 12. Apache Apex Meetup ➔ Called by Apex Platform ➔ Simple unit test like lifecycle ➔ Governed by control tuples ➔ All operators in DAG go through this life-cycle Operator Lifecycle
  • 13. Apache Apex Meetup ➔ Setup ◆ Start of operator lifecycle ◆ Do any initialization here ➔ beginWindow ◆ Marks starting of window ➔ endWindow ◆ Marks end of window ➔ teardown ◆ Do any finalization here ◆ End of operator lifecycle Operator Lifecycle (contd...)
  • 14. Apache Apex Meetup ➔ emitTuples ◆ Called for Input Adapters ◆ Called in an infinite while loop by platform ➔ process ◆ Called for Generic Operators and Output Adapters ◆ Associated to to a port ◆ Called for every incoming tuple Operator Lifecycle (contd...)
  • 15. Apache Apex Meetup ➔ OutputPort::emit ◆ Special method not part of operator lifecycle ◆ To be called by operator code ◆ Emits the tuples to next operator ◆ Bound by Window Operator Lifecycle (contd...)
  • 16. Apache Apex Meetup Apache Apex Architecture Machine nodes (Physical or Virtual) Hadoop (YARN) Distributed File System (e.g. HDFS) Apache Apex Core (Streaming Engine) Streaming Application Streaming Application RESTAPI External Data Sources Apache Apex Malhar (Reusable Operators, Connectors) Custom Operators
  • 17. Apache Apex Meetup ➔ Ease of Use ➔ Locality ➔ Fault Tolerance ➔ Scalability ◆ Partitioning ◆ Auto-scaling Other features of platform
  • 18. Apache Apex Meetup ● Apache Apex Page ○ http://apex.incubator.apache.org ● Mailing Lists ○ dev@apex.incubator.apache.org ○ users@apex.incubator.apache.org ● Repository ○ https://github.com/apache/incubator-apex-core ○ https://github.com/apache/incubator-apex-malhar ● Issue Tracking ○ https://issues.apache.org/jira/browse/APEXCORE ○ https://issues.apache.org/jira/browse/APEXMALHAR Resources ● @ApacheApex ● /groups/7020520
  • 19. Apache Apex Meetup Apex in Distributed Environment Hadoop Edge Node dtManage (Web UI) Hadoop Node YARN Container App Master Hadoop Node YARN Container YARN Container YARN Container Thread1 Op2 Op1 Thread-N Op3 Worker Container Hadoop Node YARN Container YARN Container YARN Container Thread1 Op2 Op1 Thread-N Op3 Worker Container CLI dtGateway (REST API) Part of DataTorrent RTS dtGateway (REST API) dtManage (Web UI) Web Browser
  • 20. Apache Apex Meetup ➔ AT_LEAST_ONCE (default) ◆ Windows are processed at least once ➔ AT_MOST_ONCE ◆ Windows are processed at most once ➔ EXACTLY_ONCE ◆ Windows are processed exactly once Processing Modes
  • 21. Apache Apex Meetup ➔ Saves operator state on HDFS ➔ Each operator undergoes checkpointing ➔ Done by platform ➔ Happens every 60 streaming windows by default i.e. 30 sec. ➔ Checkpoint is named by the windowId at which it happens ➔ If all operators gets checkpointed at same window, that checkpointed state becomes “committed” state of application ➔ Committed state is used for recovery in case of failure Checkpointing