SlideShare uma empresa Scribd logo
1 de 21
Leveraging For Cluster Computing 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
Robin M. E. Swezey 
Rakuten Institute of Technology, Tokyo 
Intelligence Domain Group 
robin.swezey@mail.rakuten.com
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
2 
What is Spark?
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
3 
In short, Spark is the future of 
open-source MapReduce
Why Spark? 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
4 
Current Hadoop stack is heterogeneous 
Spark = Fully integrated analytics suite and cluster 
computing framework 
Berkeley AMP lab + Apache Software Foundation 
Apache Hadoop, Hive, Mahout, Pig and their respective logos are trademarks of The Apache Software Foundation.
Platform 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
5 
On the surface, very similar to Hadoop 
• Relies on HDFS 
• Runs on Yarn, Mesos, or standalone 
• MapReduce + General cluster computing
Platform 
Key differences with usual stack 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
6 
1. Resilient Distributed Dataset (RDD) 
Central to Spark (R dataframe-ish) 
RDD 
RDD RDD
Platform 
Key differences with usual stack 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
7 
1. Resilient Distributed Dataset (RDD) 
Central to Spark (R dataframe-ish) 
RDD<String> 
RDD<Tuple> 
RDD<Tuple>
Key differences with usual stack 
Read blocks 
from disk 
Cache aggregates 
in memory 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
8 
Platform 
2. Better resource utilization 
Disk is slow. Memory is fast. Several levels of persistence.
Each node x each core 
/ each local block 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
9 
Platform 
Key differences with usual stack 
2. Better resource utilization 
More cores > more machines. Resource locality.
Key differences with usual stack 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
10 
Platform 
(Logistic Regression) 
3. Easier development & operations 
Scala, Java, Python API 
8 Lines
Key differences with usual stack 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
11 
Platform 
3. Easier Analytics 
Interactive Shells in Scala, Python 
Easy to connect with SparkContext (e.g. iPython Notebook)
Key differences with usual stack 
Easy MapReduce 
Platform 
DBMS-like Functionality 
Machine Learning 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
12 
Streaming 
4. Integrated Solution 
Apache Hadoop, Hive, Mahout, Pig and their respective logos are trademarks of The Apache Software Foundation.
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
13 
Applications 
Thanks to RDD Distributed Operators 
map() 
reduce() 
reduceByKey() 
groupBy() 
sample() 
pipe() 
foreach() 
fold() 
histogram() 
… 
Easy MapReduce
Sped-up Analytics with DBMS-like SQL Functionality 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
14 
Applications 
Cf. 
Integrated Unified Data Access 
Hive Compatible Standard Connectivity
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
15 
Applications 
Cf. 
Streaming
Machine Learning 
Statistics 
Classification / Regression 
Collaborative Filtering 
Clustering 
Dimensionality Reduction 
Feature Extraction 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
16 
Applications 
Cf. 
Image: http://en.wikipedia.org/wiki/Machine_learning#mediaviewer/File:Linear-svm-scatterplot.svg
Graph Processing 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
17 
Applications 
Flexible 
Fast 
PageRank 
Connected components 
Label propagation 
SVD++ 
Triangle count
How does it scale? 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
18 
More 
There are deployed clusters 
of 1,000+ nodes
There’s open-source, and there’s highly supported open-source 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
19 
More 
Spark 1.1.0 
had 171 contributors!
Image: http://upload.wikimedia.org/wikipedia/commons/thumb/9/94/Gartner_Hype_Cycle.svg/2000px-Gartner_Hype_Cycle.svg.png 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
20 
In Conclusion 
Hadoop Cluster Computing Hype Cycle
http://spark.apache.org 
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. 
All Spark images and code are from http://spark.apache.org/ 
21 
Thank you!

Mais conteúdo relacionado

Mais procurados

RDF_API_Java_Stefan_Apostoaie
RDF_API_Java_Stefan_ApostoaieRDF_API_Java_Stefan_Apostoaie
RDF_API_Java_Stefan_Apostoaieiosstef
 
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...Chris Fregly
 
Apache Spark An Overview
Apache Spark An OverviewApache Spark An Overview
Apache Spark An OverviewMohit Jain
 
Spark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of DatabricksSpark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of DatabricksData Con LA
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark Summit
 
JSON support in Java EE 8
JSON support in Java EE 8JSON support in Java EE 8
JSON support in Java EE 8Lukas Jungmann
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark Aakashdata
 
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Neil Andrassy
 
Apache spark
Apache spark Apache spark
Apache spark Edureka!
 
Apache Spark Fundamentals Meetup Talk
Apache Spark Fundamentals Meetup TalkApache Spark Fundamentals Meetup Talk
Apache Spark Fundamentals Meetup TalkEren Avşaroğulları
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsTakuya UESHIN
 
Functional programming with_jdk8-s_ritter
Functional programming with_jdk8-s_ritterFunctional programming with_jdk8-s_ritter
Functional programming with_jdk8-s_ritterSimon Ritter
 
Performant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame APIPerformant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame APIRyuji Tamagawa
 
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...Chris Fregly
 
Pycon India 2017 - Big Data Engineering using Spark with Python (pyspark) - W...
Pycon India 2017 - Big Data Engineering using Spark with Python (pyspark) - W...Pycon India 2017 - Big Data Engineering using Spark with Python (pyspark) - W...
Pycon India 2017 - Big Data Engineering using Spark with Python (pyspark) - W...Durga Gadiraju
 

Mais procurados (16)

RDF_API_Java_Stefan_Apostoaie
RDF_API_Java_Stefan_ApostoaieRDF_API_Java_Stefan_Apostoaie
RDF_API_Java_Stefan_Apostoaie
 
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
 
Apache Spark An Overview
Apache Spark An OverviewApache Spark An Overview
Apache Spark An Overview
 
Spark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of DatabricksSpark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of Databricks
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
 
JSON support in Java EE 8
JSON support in Java EE 8JSON support in Java EE 8
JSON support in Java EE 8
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark
 
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
 
Apache spark
Apache spark Apache spark
Apache spark
 
Apache Spark Fundamentals Meetup Talk
Apache Spark Fundamentals Meetup TalkApache Spark Fundamentals Meetup Talk
Apache Spark Fundamentals Meetup Talk
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIs
 
Functional programming with_jdk8-s_ritter
Functional programming with_jdk8-s_ritterFunctional programming with_jdk8-s_ritter
Functional programming with_jdk8-s_ritter
 
Performant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame APIPerformant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame API
 
Spark
SparkSpark
Spark
 
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
 
Pycon India 2017 - Big Data Engineering using Spark with Python (pyspark) - W...
Pycon India 2017 - Big Data Engineering using Spark with Python (pyspark) - W...Pycon India 2017 - Big Data Engineering using Spark with Python (pyspark) - W...
Pycon India 2017 - Big Data Engineering using Spark with Python (pyspark) - W...
 

Semelhante a [Rakuten TechConf2014] [C-6] Leveraging Spark for Cluster Computing

5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!Edureka!
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...Simplilearn
 
Apache spark installation [autosaved]
Apache spark installation [autosaved]Apache spark installation [autosaved]
Apache spark installation [autosaved]Shweta Patnaik
 
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Edureka!
 
Spark introduction & Architecture.pptx
Spark introduction & Architecture.pptxSpark introduction & Architecture.pptx
Spark introduction & Architecture.pptxMUMERSHARJEELCh
 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Edureka!
 
Spark SQL | Apache Spark
Spark SQL | Apache SparkSpark SQL | Apache Spark
Spark SQL | Apache SparkEdureka!
 
Infra space talk on Apache Spark - Into to CASK
Infra space talk on Apache Spark - Into to CASKInfra space talk on Apache Spark - Into to CASK
Infra space talk on Apache Spark - Into to CASKRob Mueller
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Edureka!
 
5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!Edureka!
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirLuciano Resende
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Simplilearn
 
Running R at Scale with Apache Arrow on Spark
Running R at Scale with Apache Arrow on SparkRunning R at Scale with Apache Arrow on Spark
Running R at Scale with Apache Arrow on SparkDatabricks
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...Edureka!
 

Semelhante a [Rakuten TechConf2014] [C-6] Leveraging Spark for Cluster Computing (20)

5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Apache spark
Apache sparkApache spark
Apache spark
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Apache spark installation [autosaved]
Apache spark installation [autosaved]Apache spark installation [autosaved]
Apache spark installation [autosaved]
 
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
 
Spark introduction & Architecture.pptx
Spark introduction & Architecture.pptxSpark introduction & Architecture.pptx
Spark introduction & Architecture.pptx
 
What is Spark
What is SparkWhat is Spark
What is Spark
 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
 
Spark SQL | Apache Spark
Spark SQL | Apache SparkSpark SQL | Apache Spark
Spark SQL | Apache Spark
 
Infra space talk on Apache Spark - Into to CASK
Infra space talk on Apache Spark - Into to CASKInfra space talk on Apache Spark - Into to CASK
Infra space talk on Apache Spark - Into to CASK
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
 
Running R at Scale with Apache Arrow on Spark
Running R at Scale with Apache Arrow on SparkRunning R at Scale with Apache Arrow on Spark
Running R at Scale with Apache Arrow on Spark
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
 
Module01
 Module01 Module01
Module01
 
Spark_Part 1
Spark_Part 1Spark_Part 1
Spark_Part 1
 

Mais de Rakuten Group, Inc.

コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話Rakuten Group, Inc.
 
楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のりRakuten Group, Inc.
 
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...Rakuten Group, Inc.
 
DataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組みDataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組みRakuten Group, Inc.
 
大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開Rakuten Group, Inc.
 
楽天における大規模データベースの運用
楽天における大規模データベースの運用楽天における大規模データベースの運用
楽天における大規模データベースの運用Rakuten Group, Inc.
 
楽天サービスを支えるネットワークインフラストラクチャー
楽天サービスを支えるネットワークインフラストラクチャー楽天サービスを支えるネットワークインフラストラクチャー
楽天サービスを支えるネットワークインフラストラクチャーRakuten Group, Inc.
 
楽天の規模とクラウドプラットフォーム統括部の役割
楽天の規模とクラウドプラットフォーム統括部の役割楽天の規模とクラウドプラットフォーム統括部の役割
楽天の規模とクラウドプラットフォーム統括部の役割Rakuten Group, Inc.
 
Rakuten Services and Infrastructure Team.pdf
Rakuten Services and Infrastructure Team.pdfRakuten Services and Infrastructure Team.pdf
Rakuten Services and Infrastructure Team.pdfRakuten Group, Inc.
 
The Data Platform Administration Handling the 100 PB.pdf
The Data Platform Administration Handling the 100 PB.pdfThe Data Platform Administration Handling the 100 PB.pdf
The Data Platform Administration Handling the 100 PB.pdfRakuten Group, Inc.
 
Supporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdfSupporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdfRakuten Group, Inc.
 
Making Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfMaking Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfRakuten Group, Inc.
 
How We Defined Our Own Cloud.pdf
How We Defined Our Own Cloud.pdfHow We Defined Our Own Cloud.pdf
How We Defined Our Own Cloud.pdfRakuten Group, Inc.
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoRakuten Group, Inc.
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoRakuten Group, Inc.
 
Introduction of GORA API Group technology
Introduction of GORA API Group technologyIntroduction of GORA API Group technology
Introduction of GORA API Group technologyRakuten Group, Inc.
 
100PBを越えるデータプラットフォームの実情
100PBを越えるデータプラットフォームの実情100PBを越えるデータプラットフォームの実情
100PBを越えるデータプラットフォームの実情Rakuten Group, Inc.
 
社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャーRakuten Group, Inc.
 

Mais de Rakuten Group, Inc. (20)

コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
 
楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり
 
What Makes Software Green?
What Makes Software Green?What Makes Software Green?
What Makes Software Green?
 
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
 
DataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組みDataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組み
 
大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開
 
楽天における大規模データベースの運用
楽天における大規模データベースの運用楽天における大規模データベースの運用
楽天における大規模データベースの運用
 
楽天サービスを支えるネットワークインフラストラクチャー
楽天サービスを支えるネットワークインフラストラクチャー楽天サービスを支えるネットワークインフラストラクチャー
楽天サービスを支えるネットワークインフラストラクチャー
 
楽天の規模とクラウドプラットフォーム統括部の役割
楽天の規模とクラウドプラットフォーム統括部の役割楽天の規模とクラウドプラットフォーム統括部の役割
楽天の規模とクラウドプラットフォーム統括部の役割
 
Rakuten Services and Infrastructure Team.pdf
Rakuten Services and Infrastructure Team.pdfRakuten Services and Infrastructure Team.pdf
Rakuten Services and Infrastructure Team.pdf
 
The Data Platform Administration Handling the 100 PB.pdf
The Data Platform Administration Handling the 100 PB.pdfThe Data Platform Administration Handling the 100 PB.pdf
The Data Platform Administration Handling the 100 PB.pdf
 
Supporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdfSupporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdf
 
Making Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfMaking Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdf
 
How We Defined Our Own Cloud.pdf
How We Defined Our Own Cloud.pdfHow We Defined Our Own Cloud.pdf
How We Defined Our Own Cloud.pdf
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
OWASPTop10_Introduction
OWASPTop10_IntroductionOWASPTop10_Introduction
OWASPTop10_Introduction
 
Introduction of GORA API Group technology
Introduction of GORA API Group technologyIntroduction of GORA API Group technology
Introduction of GORA API Group technology
 
100PBを越えるデータプラットフォームの実情
100PBを越えるデータプラットフォームの実情100PBを越えるデータプラットフォームの実情
100PBを越えるデータプラットフォームの実情
 
社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー
 

Último

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Último (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

[Rakuten TechConf2014] [C-6] Leveraging Spark for Cluster Computing

  • 1. Leveraging For Cluster Computing Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ Robin M. E. Swezey Rakuten Institute of Technology, Tokyo Intelligence Domain Group robin.swezey@mail.rakuten.com
  • 2. Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 2 What is Spark?
  • 3. Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 3 In short, Spark is the future of open-source MapReduce
  • 4. Why Spark? Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 4 Current Hadoop stack is heterogeneous Spark = Fully integrated analytics suite and cluster computing framework Berkeley AMP lab + Apache Software Foundation Apache Hadoop, Hive, Mahout, Pig and their respective logos are trademarks of The Apache Software Foundation.
  • 5. Platform Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 5 On the surface, very similar to Hadoop • Relies on HDFS • Runs on Yarn, Mesos, or standalone • MapReduce + General cluster computing
  • 6. Platform Key differences with usual stack Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 6 1. Resilient Distributed Dataset (RDD) Central to Spark (R dataframe-ish) RDD RDD RDD
  • 7. Platform Key differences with usual stack Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 7 1. Resilient Distributed Dataset (RDD) Central to Spark (R dataframe-ish) RDD<String> RDD<Tuple> RDD<Tuple>
  • 8. Key differences with usual stack Read blocks from disk Cache aggregates in memory Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 8 Platform 2. Better resource utilization Disk is slow. Memory is fast. Several levels of persistence.
  • 9. Each node x each core / each local block Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 9 Platform Key differences with usual stack 2. Better resource utilization More cores > more machines. Resource locality.
  • 10. Key differences with usual stack Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 10 Platform (Logistic Regression) 3. Easier development & operations Scala, Java, Python API 8 Lines
  • 11. Key differences with usual stack Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 11 Platform 3. Easier Analytics Interactive Shells in Scala, Python Easy to connect with SparkContext (e.g. iPython Notebook)
  • 12. Key differences with usual stack Easy MapReduce Platform DBMS-like Functionality Machine Learning Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 12 Streaming 4. Integrated Solution Apache Hadoop, Hive, Mahout, Pig and their respective logos are trademarks of The Apache Software Foundation.
  • 13. Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 13 Applications Thanks to RDD Distributed Operators map() reduce() reduceByKey() groupBy() sample() pipe() foreach() fold() histogram() … Easy MapReduce
  • 14. Sped-up Analytics with DBMS-like SQL Functionality Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 14 Applications Cf. Integrated Unified Data Access Hive Compatible Standard Connectivity
  • 15. Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 15 Applications Cf. Streaming
  • 16. Machine Learning Statistics Classification / Regression Collaborative Filtering Clustering Dimensionality Reduction Feature Extraction Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 16 Applications Cf. Image: http://en.wikipedia.org/wiki/Machine_learning#mediaviewer/File:Linear-svm-scatterplot.svg
  • 17. Graph Processing Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 17 Applications Flexible Fast PageRank Connected components Label propagation SVD++ Triangle count
  • 18. How does it scale? Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 18 More There are deployed clusters of 1,000+ nodes
  • 19. There’s open-source, and there’s highly supported open-source Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 19 More Spark 1.1.0 had 171 contributors!
  • 20. Image: http://upload.wikimedia.org/wikipedia/commons/thumb/9/94/Gartner_Hype_Cycle.svg/2000px-Gartner_Hype_Cycle.svg.png Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 20 In Conclusion Hadoop Cluster Computing Hype Cycle
  • 21. http://spark.apache.org Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. All Spark images and code are from http://spark.apache.org/ 21 Thank you!