SlideShare uma empresa Scribd logo
1 de 20
SUBMITTED TO: SUBMITTED BY:
Mrs. Suman singh Nikita Vijay
(HOD of CSE Dept.) B. Tech –VIII sem(CSE)
A SEMINAR PRESENTATION ON
“Introduction To Apache Spark”
● Need of new generation distributed system
● Hardware/software evolution in last decade
● Apache Spark
• Components of Apache Spark
● Why Spark?
● Who are using Spark?
Agenda
● Lot has been changed from 2000
● Both hardware and software gone through changes
● Big data has become necessity now
● Let’s look at what changed over decade
Why we need new generation?
● Disk was cheap so disk was primary source of data
● Network was costly so data locality
● RAM was very costly
● Single core machines were dominant
RAM is the king
• RAM is primary source of data and we use disk for
fallback
● Network is speedier
● Multi core machines are commonplace
State of hardware in 2000
Now
● Object orientation was the king
● Software optimized for single core
● No open frameworks for creating
○ Distributed storage
○ Distributed processing
● SQL was the only dominant way for data analysis
Now
•Functional programming is on rise
● Software needs to exploit multiple cores on single node
There are good frameworks to create distributed systems
○ HDFS for storage
● NoSQL is real alternative now
Software in 2000
● Very few companies had big data issue
● Batch processing system ruled the world
● Volume was big concern compare to velocity
● Mostly used for
○ Search
○ Log analysis
● All companies use big data
● Velocity is as much concern as volume
Needs of real time are as much important as batch
processing
Big Data processing needs in 2000
NOW
• A fast and general engine for large scale data
processing
• Created by AMPLab
• Written in Scala
•Licensed under Apache
Apache Spark
Spark streaming
graphX
MLlib
Apache sql
Benefits of a Unified Platform
• No copying of data between systems
•Combine processing types in one program
• Code reuse
• One system to learn
• One system to maintain
Mesos, a distributed system framework as class project
in UC Berkeley in 2009.
● Spark to test how mesos works
● Focused on
○ Iterative programs (ML)
○ Unifying real time and batch processing
● Open sourced in 2010
History of Apache Spark
● You can spark on top any distributed system
● It can run on
○ Yarn
○ Apache Mesos
○ It’s own cluster
Runs everywhere
● Apache Spark is highly modular
The original version contained only 1600 lines of scala
code
● Apache Spark API is extremely simple compared Java
API of M/R
● API is concise and consistent
Small and Simple
Source : http://spark-summit.org/wp-content/uploads/2013/10/Zaharia-spark-summit-2013-
• In Spark, you can cache hdfs data in main memory of
worker nodes
• Spark analysis can be executed directly on in memory
data
● Shuffling also can be done from in memory
● Fault tolerant
In-memory aka Speed
● No separate storage layer
● Integrates well with HDFS
● Can run on Hadoop 1.0 and Hadoop 2.0 YARN
● Excellent integration with ecosystem projects like
Apache Hive, HBase etc
Integration with Hadoop
● Written in Scala but API is not limited to it
● Offers API in
○ Scala
○ Java
○ Python
● You can also do SQL using SparkSQL
Multi language API
Who are using Spark
seminar presentation on apache-spark
seminar presentation on apache-spark

Mais conteúdo relacionado

Mais procurados

Distributed computing
Distributed computingDistributed computing
Distributed computing
shivli0769
 
End-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache SparkEnd-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache Spark
Databricks
 

Mais procurados (20)

Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency
Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low LatencyAggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency
Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
Aneka platform
Aneka platformAneka platform
Aneka platform
 
Apache Spark.
Apache Spark.Apache Spark.
Apache Spark.
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Migration into a Cloud
Migration into a CloudMigration into a Cloud
Migration into a Cloud
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Storage Area Network (San)
Storage Area Network (San)Storage Area Network (San)
Storage Area Network (San)
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
 
Google App Engine
Google App EngineGoogle App Engine
Google App Engine
 
Hadoop YARN
Hadoop YARNHadoop YARN
Hadoop YARN
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
End-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache SparkEnd-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache Spark
 

Semelhante a seminar presentation on apache-spark

What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 

Semelhante a seminar presentation on apache-spark (20)

Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
 
Apache Spark in Industry
Apache Spark in IndustryApache Spark in Industry
Apache Spark in Industry
 
Spark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark GroupSpark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark Group
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
Getting Started with Spark Scala
Getting Started with Spark ScalaGetting Started with Spark Scala
Getting Started with Spark Scala
 
Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and Kafka
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark example
 
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for Beginners
 
BlackRay - The open Source Data Engine
BlackRay - The open Source Data EngineBlackRay - The open Source Data Engine
BlackRay - The open Source Data Engine
 
Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Spark Workshop
Spark WorkshopSpark Workshop
Spark Workshop
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
 
The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 

Mais de Jawhar Ali

Mais de Jawhar Ali (20)

seminar report on What is ransomware
seminar report on What is ransomwareseminar report on What is ransomware
seminar report on What is ransomware
 
seminar report on Sql injection
seminar report on Sql injectionseminar report on Sql injection
seminar report on Sql injection
 
seminar report on kingapp application
seminar report on kingapp applicationseminar report on kingapp application
seminar report on kingapp application
 
seminar report on school management system
seminar report on school management systemseminar report on school management system
seminar report on school management system
 
seminar presentation on Face ricognition technology
seminar presentation on Face ricognition technologyseminar presentation on Face ricognition technology
seminar presentation on Face ricognition technology
 
seminar presentation on Digital Jwellery
seminar presentation on Digital Jwelleryseminar presentation on Digital Jwellery
seminar presentation on Digital Jwellery
 
powerpoint presentation on sixth sense Technology
powerpoint presentation  on sixth sense Technologypowerpoint presentation  on sixth sense Technology
powerpoint presentation on sixth sense Technology
 
Powerpoint presentation on 5G wireless technology
Powerpoint presentation on 5G wireless technologyPowerpoint presentation on 5G wireless technology
Powerpoint presentation on 5G wireless technology
 
powerpoint presentation on Google glass
powerpoint presentation on Google glasspowerpoint presentation on Google glass
powerpoint presentation on Google glass
 
Table Of Contents Google Glass
Table Of Contents Google GlassTable Of Contents Google Glass
Table Of Contents Google Glass
 
introduction and abstract on Google Glass Major report
introduction and abstract on  Google Glass Major reportintroduction and abstract on  Google Glass Major report
introduction and abstract on Google Glass Major report
 
Candidate declaration on Google Glass
Candidate declaration on Google GlassCandidate declaration on Google Glass
Candidate declaration on Google Glass
 
front Page on Google Glass
 front Page on Google Glass front Page on Google Glass
front Page on Google Glass
 
Table of contents on blood bank management system
Table of contents on blood bank management systemTable of contents on blood bank management system
Table of contents on blood bank management system
 
List of figures in Blood bank management system
List of figures in Blood bank management systemList of figures in Blood bank management system
List of figures in Blood bank management system
 
Full report on blood bank management system
Full report on  blood bank management systemFull report on  blood bank management system
Full report on blood bank management system
 
Cand declaration
Cand declaration Cand declaration
Cand declaration
 
Training report on web developing
Training report on web developingTraining report on web developing
Training report on web developing
 
seminar report on wireless Sensor network
seminar report on wireless Sensor networkseminar report on wireless Sensor network
seminar report on wireless Sensor network
 
Cloud computing ppt
Cloud computing pptCloud computing ppt
Cloud computing ppt
 

Último

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Último (20)

Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 

seminar presentation on apache-spark

  • 1. SUBMITTED TO: SUBMITTED BY: Mrs. Suman singh Nikita Vijay (HOD of CSE Dept.) B. Tech –VIII sem(CSE) A SEMINAR PRESENTATION ON “Introduction To Apache Spark”
  • 2. ● Need of new generation distributed system ● Hardware/software evolution in last decade ● Apache Spark • Components of Apache Spark ● Why Spark? ● Who are using Spark? Agenda
  • 3. ● Lot has been changed from 2000 ● Both hardware and software gone through changes ● Big data has become necessity now ● Let’s look at what changed over decade Why we need new generation?
  • 4. ● Disk was cheap so disk was primary source of data ● Network was costly so data locality ● RAM was very costly ● Single core machines were dominant RAM is the king • RAM is primary source of data and we use disk for fallback ● Network is speedier ● Multi core machines are commonplace State of hardware in 2000 Now
  • 5. ● Object orientation was the king ● Software optimized for single core ● No open frameworks for creating ○ Distributed storage ○ Distributed processing ● SQL was the only dominant way for data analysis Now •Functional programming is on rise ● Software needs to exploit multiple cores on single node There are good frameworks to create distributed systems ○ HDFS for storage ● NoSQL is real alternative now Software in 2000
  • 6. ● Very few companies had big data issue ● Batch processing system ruled the world ● Volume was big concern compare to velocity ● Mostly used for ○ Search ○ Log analysis ● All companies use big data ● Velocity is as much concern as volume Needs of real time are as much important as batch processing Big Data processing needs in 2000 NOW
  • 7. • A fast and general engine for large scale data processing • Created by AMPLab • Written in Scala •Licensed under Apache Apache Spark
  • 9.
  • 10. Benefits of a Unified Platform • No copying of data between systems •Combine processing types in one program • Code reuse • One system to learn • One system to maintain
  • 11. Mesos, a distributed system framework as class project in UC Berkeley in 2009. ● Spark to test how mesos works ● Focused on ○ Iterative programs (ML) ○ Unifying real time and batch processing ● Open sourced in 2010 History of Apache Spark
  • 12. ● You can spark on top any distributed system ● It can run on ○ Yarn ○ Apache Mesos ○ It’s own cluster Runs everywhere
  • 13. ● Apache Spark is highly modular The original version contained only 1600 lines of scala code ● Apache Spark API is extremely simple compared Java API of M/R ● API is concise and consistent Small and Simple
  • 15. • In Spark, you can cache hdfs data in main memory of worker nodes • Spark analysis can be executed directly on in memory data ● Shuffling also can be done from in memory ● Fault tolerant In-memory aka Speed
  • 16. ● No separate storage layer ● Integrates well with HDFS ● Can run on Hadoop 1.0 and Hadoop 2.0 YARN ● Excellent integration with ecosystem projects like Apache Hive, HBase etc Integration with Hadoop
  • 17. ● Written in Scala but API is not limited to it ● Offers API in ○ Scala ○ Java ○ Python ● You can also do SQL using SparkSQL Multi language API
  • 18. Who are using Spark