APACHE SPARK ARCHITECTURE
Presented By:
Jyotasana Bharti
(MT/ITY/10003/19)
CS/IT
CONTENT
• Introduction
• Features
• Understanding Apache Spark Architecture
• Working of Apache Spark Architecture
• Applications
• Conclusion
• References
INTRODUCTION
• An open-source cluster-computing framework that provides in-memory processing of large
amounts of data.
• Its performance is reported to be up to 100 times faster in memory and 10 times faster on disk
than Hadoop MapReduce.
• It offers powerful APIs that help correlate unstructured, structured and semi-structured
data, and analyse and evaluate that data to make predictions.
FEATURES
Fast
• 10x faster on disk.
• 100x faster in memory.
Ease of development
• Interactive shell.
• Less code.
• More operators.
• Write programs quickly.
Deployment flexibility
• Deployment: Mesos, YARN, Standalone.
• Storage: MapR-XD, HDFS, S3.
Unified stack
• Build applications combining different processing models.
• Batch analytics, streaming analytics and interactive analytics.
Multi-language support
• Scala
• Python
• Java
• R (SparkR)
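UNDERSTANDING APACHE SPARK ARCHITECTURE
• Apache Spark Architecture is based on two main abstractions:
1. Resilient Distributed Dataset (RDD)
2. Directed Acyclic Graph (DAG)
• Apache Spark RDDs support two types of operations:
1. Transformations: lazily define a new RDD from an existing one (for example, map or filter).
2. Actions: trigger the computation and return a result to the driver (for example, count or collect).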
Fig.: Lazy Transformation Model
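The lazy transformation model can be illustrated with a small pure-Python sketch. This is a toy model, not Spark's API: the class LazyDataset and the helper square are hypothetical names introduced only for illustration. Transformations just record a step in the lineage; nothing runs until an action is called.

```python
# Toy model of lazy evaluation. LazyDataset is a hypothetical class,
# not part of Spark; it only mimics the transformation/action split.
from typing import Any, Callable, Iterable, List, Tuple


class LazyDataset:
    """Records transformations and runs them only when an action is called."""

    def __init__(self, source: Iterable[Any], steps: Tuple = ()):
        self._source = source
        self._steps = steps  # lineage: ordered (kind, function) pairs

    # Transformations: return a new dataset and compute nothing.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Actions: walk the recorded lineage and produce a result.
    def collect(self) -> List[Any]:
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):  # a filter step rejected the item
                    keep = False
                    break
            if keep:
                out.append(item)
        return out

    def count(self) -> int:
        return len(self.collect())


calls = []  # records every time the mapped function actually runs


def square(x: int) -> int:
    calls.append(x)
    return x * x


numbers = LazyDataset(range(1, 6))
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # no work has been done yet
result = evens_squared.collect()  # the action triggers the computation
```

In this sketch, square is not invoked while the lineage is being built; it runs once per element only when collect is called.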
The Apache Spark Architecture has two main daemons along with a cluster manager. It is
basically a master/slave (master/worker) architecture. The two daemons are:
1. Master Daemon: runs the master (driver) process.
2. Worker Daemon: runs the worker (slave) process.
Role of Driver
• Drives its own application.
• Creates a JVM for the code submitted by the client.
• Stores the metadata about all the Resilient Distributed Datasets and their partitions.
Role of Cluster Manager
• Allocates resources across applications. Spark can run on many kinds of clusters.
• Schedules the Spark application.
• Allocates resources to the driver program to run the tasks.
• Supported cluster managers include Hadoop YARN, Apache Mesos and the Standalone Scheduler.
• The Standalone Scheduler is Spark's own cluster manager; it makes it possible to install Spark on an empty set of
machines.
Role of Worker
• Hosts executors, which run tasks.
• Executes the tasks that the driver sends to its executors, using resources allocated by the cluster manager.
Role of Executor
• Performs all the data processing.
• Reads data from and writes data to external sources.
• Stores computation results in memory (cache) or on disk.
• Interacts with the storage systems.
APACHE SPARK ARCHITECTURE OVERVIEW
Fig.: Apache Spark Architecture
WORKING OF APACHE SPARK ARCHITECTURE
The client submits the Spark application code. When the code is submitted, the driver
implicitly converts the user code, which contains transformations and actions, into a logical
directed acyclic graph (DAG). At this stage, it also performs optimizations such as pipelining
transformations.
After that, it converts the logical DAG into a physical execution plan with many stages. It then
creates physical execution units called tasks under each stage. The tasks are then bundled and
sent to the cluster.
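The conversion from a lineage of operations into stages and tasks can be sketched as follows. This is a simplified illustration under stated assumptions, not Spark's scheduler: the operation names are plain string labels, WIDE_OPS, Task, split_into_stages and make_tasks are hypothetical names, and every stage is assumed to have the same number of partitions. Consecutive narrow operations are pipelined into one stage, and a new stage starts at each operation that needs a shuffle.

```python
# Toy planner: lineage -> stages -> tasks. All names here are illustrative
# assumptions; Spark's real DAG scheduler is considerably more elaborate.
from dataclasses import dataclass
from typing import List, Tuple

# Assumed set of "wide" operations, i.e. ones that require a shuffle.
WIDE_OPS = {"reduceByKey", "groupByKey", "join"}


@dataclass(frozen=True)
class Task:
    stage_id: int
    partition: int
    ops: Tuple[str, ...]  # pipelined operations this task applies


def split_into_stages(lineage: List[str]) -> List[List[str]]:
    """Pipeline consecutive narrow operations; start a new stage at each wide one."""
    stages: List[List[str]] = [[]]
    for op in lineage:
        if op in WIDE_OPS and stages[-1]:
            stages.append([])  # shuffle boundary: close the current stage
        stages[-1].append(op)
    return stages


def make_tasks(stages: List[List[str]], num_partitions: int) -> List[Task]:
    """Create one task per partition for every stage."""
    return [
        Task(stage_id, partition, tuple(ops))
        for stage_id, ops in enumerate(stages)
        for partition in range(num_partitions)
    ]


lineage = ["textFile", "flatMap", "map", "reduceByKey", "filter"]
stages = split_into_stages(lineage)
tasks = make_tasks(stages, num_partitions=3)
```

With this word-count-style lineage, the three narrow operations share the first stage and the shuffle operation opens the second, giving two stages of three tasks each.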
Now the driver talks to the cluster manager and negotiates resources. The cluster manager
launches executors on worker nodes on behalf of the driver. When executors start, they register
themselves with the driver, so the driver has a complete view of the executors that are executing
its tasks. The driver then sends tasks to the executors based on data placement.
During execution, the driver program monitors the set of running executors. The driver also
schedules future tasks based on data placement.
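Scheduling "based on data placement" can be pictured with a small sketch. It is a deliberately simplified assumption of how a locality-aware assignment could work, not Spark's actual scheduler: assign_tasks, the executor IDs and the host names are all hypothetical. Each task goes to the least-loaded executor on the host that holds its partition; if no executor runs there, it falls back to the least-loaded executor overall.

```python
# Toy locality-aware assignment. Function, executor and host names are
# illustrative assumptions; Spark's real scheduler is more elaborate.
from typing import Dict


def assign_tasks(
    partition_hosts: Dict[int, str],  # partition id -> host holding its data
    executors: Dict[str, str],        # executor id -> host it runs on
) -> Dict[int, str]:
    """Map each partition's task to an executor, preferring data-local ones."""
    load = {executor_id: 0 for executor_id in executors}
    assignment: Dict[int, str] = {}
    for partition, host in sorted(partition_hosts.items()):
        local = [e for e, h in executors.items() if h == host]
        candidates = local or list(executors)  # fall back to any executor
        # Pick the least-loaded candidate; break ties by executor id.
        chosen = min(candidates, key=lambda e: (load[e], e))
        load[chosen] += 1
        assignment[partition] = chosen
    return assignment


executors = {"exec-1": "node-a", "exec-2": "node-b"}
partition_hosts = {0: "node-a", 1: "node-b", 2: "node-a", 3: "node-c"}
assignment = assign_tasks(partition_hosts, executors)
```

Here partitions 0 and 2 stay on the executor co-located with their data, while partition 3, whose host runs no executor, is placed on whichever executor has the least work.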
APPLICATIONS
• Healthcare
• Banking
• Stock Exchange
• Machine Learning
• Fog Computing
• Uber
• Pinterest
CONCLUSION
• Spark can run independently (standalone), which gives deployment flexibility.
• The architecture provides ease of use, accessibility, and the ability to handle big data
tasks.
• It has come to be preferred over Hadoop MapReduce in many industries, mainly because of its
speed. It improves on the MapReduce model by processing data with fewer shuffles.
• The speedup of up to 100x comes from in-memory data storage and near-real-time processing
of data.
• Lazy evaluation also contributes to the speed.
REFERENCES
https://www.ijrte.org/wp-content/uploads/papers/v8i6/F7820038620.pdf
https://www.youtube.com/watch?v=jffQhcweGwY
https://mapr.com/ebooks/spark/03-apache-spark-architecture-overview.html
https://www.javatpoint.com/apache-spark-architecture
THANK YOU
