Savanna - Hadoop on OpenStack
Sergey Lukjanov, Savanna Technical Lead
Mirantis, 2013
Agenda
● Savanna Overview
● Savanna Use Cases
● Roadmap & Current Status
● Architecture & Features Overview
● Hadoop vs. Virtualization
Savanna - Elastic Hadoop on OpenStack
● Open source native OpenStack component
● Supports different Hadoop distributions
● Covers both the bare cluster provisioning use case and "analytics as a service"
● Managed through a REST API
● Web UI as part of the OpenStack Dashboard
● Flexible templates of Hadoop configurations
Savanna - Elastic Hadoop on OpenStack
● Project home - https://launchpad.net/savanna
○ bug tracking
○ blueprints
○ answers
● Code review (Gerrit) - https://review.openstack.org
● Sources - https://github.com/stackforge/savanna
● Mailing list - savanna-all@lists.launchpad.net
● CI - https://jenkins.openstack.org and http://jenkins.savanna.mirantis.com
Savanna - Participants
● Contributors:
○ large core team from Mirantis
○ teams from Red Hat and Hortonworks
○ several minor contributors
● Intel joined recently
● Several upcoming customers
Savanna Use Cases
● Administrators - centralized cluster management and monitoring
● Dev and QA teams - fast cluster provisioning
● Data Scientists/Analysts - an API to run analytic jobs, with infrastructure provisioning happening under the hood
● Making resources dedicated to the IaaS cloud available for Hadoop workloads
Administrators Use Case
● Central point of control over infrastructure
● Enables self-service capabilities, including the choice of Hadoop distribution to use
● Integration with vendor tooling:
○ Ambari for Apache/Hortonworks
○ Cloudera Management Console
○ Intel Hadoop
● Utilization of free IaaS capacity for Hadoop tasks
Dev and QA Use Cases
● Fast on-demand provisioning of environments
● Increased agility and speed of innovation
● Controlled access to data from production
Analytics Use Cases
● Simplified task execution - the complexity of provisioning and managing the cluster is hidden under the hood
○ Access to higher-level interfaces (e.g. Pig, Hive)
● Bursty workloads: ad-hoc queries that need significant resources only for a short period
● Utilization of free IaaS capacity for Hadoop tasks
Roadmap for Hadoop in Cloud
Phase 1
Basic cluster provisioning of Apache Hadoop
Phase 2
Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
Phase 3
"Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OpenStack
Phase 1 - Basic Cluster Operation
● Cluster provisioning
● Deployment Engine implementation for pre-installed images
● Templates for Hadoop cluster configuration
● REST API for cluster startup and operations
● Web UI integrated into OpenStack Dashboard
Phase 2 - Advanced Configuration
● Hadoop cluster configuration support:
○ Solutions for the HDFS data reliability issue
○ Configurable DataNode (DN) storage location
○ Configurable topology of DataNodes (DN), NameNode (NN), TaskTrackers (TT), and JobTracker (JT)
○ Add/remove nodes
○ More Hadoop parameters
● Integration with vendor deployment/management tooling
● Basic monitoring support
Phase 3 - Analytics as a Service
● API to execute MapReduce jobs without exposing details of the underlying infrastructure (similar to AWS EMR)
● User-friendly UI for ad-hoc analytics queries based on Hive or Pig
Roadmap for Hadoop in Cloud
Phase 1 [Released - April 10]
Basic cluster provisioning of Apache Hadoop
Phase 2 [In progress - July 15]
Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
Phase 3 [Planned - October 15]
"Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OpenStack
Further Roadmap
● Autoscaling
● HA for NameNode
● Deeper HDFS and Swift integration
○ Caching of Swift data on HDFS
● Integration with logging and error handling
● HBase support
Architecture Overview
[Diagram: Savanna architecture. Components shown:]
● Clients: Savanna Python Client, Horizon (with Savanna Pages)
● Savanna: REST API, Auth, Cluster Configuration Manager, DAL, Provisioning Plugin, Instance Interop Helper, Image Registry
● OpenStack services: Keystone, Nova, Glance, Swift
● Provisioned resources: Hadoop VMs
Hadoop vs. Virtualization
● HDFS Reliability
● Data Persistence
● I/O Performance
● etc.
HDFS Reliability: the issue
[Diagram, built up over three slides: two Compute hosts, each running three DN (DataNode) VMs, with replicas of a Data Block placed on the DNs.]
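One reading of the diagram, stated here as an assumption: when several DN VMs share a Compute host, HDFS may place every replica of a block on VMs of that one host, so a single host failure loses the block. A minimal Python sketch of that check follows; the VM names, host names, and block names are hypothetical.

```python
from collections import defaultdict


def hosts_per_block(replicas, vm_to_host):
    """Map each block to the set of physical hosts that hold one of its replicas."""
    hosts = defaultdict(set)
    for block, vms in replicas.items():
        for vm in vms:
            hosts[block].add(vm_to_host[vm])
    return dict(hosts)


def blocks_at_risk(replicas, vm_to_host):
    """Return blocks whose replicas all live on a single physical host."""
    per_block = hosts_per_block(replicas, vm_to_host)
    return sorted(block for block, hosts in per_block.items() if len(hosts) == 1)


# Hypothetical layout: two Compute hosts with three DN VMs each.
vm_to_host = {
    "dn1": "compute-1", "dn2": "compute-1", "dn3": "compute-1",
    "dn4": "compute-2", "dn5": "compute-2", "dn6": "compute-2",
}
replicas = {
    "block-a": ["dn1", "dn2", "dn3"],  # three replicas, one physical host
    "block-b": ["dn1", "dn4", "dn5"],  # replicas span both hosts
}
at_risk = blocks_at_risk(replicas, vm_to_host)
```

In this layout only `block-a` is reported, even though it has three replicas from the point of view of HDFS.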
HDFS Reliability: single DN per host
[Diagram: three Compute hosts running DN VMs (one labeled TT | DN) that belong to Cluster A and Cluster B.]
HDFS Reliability: HADOOP-8468
Hypervisor-awareness for the HDFS scheduler
[Diagram: three Compute hosts, each running two DN VMs of one HDFS, with replicas of a Data Block placed on the DNs.]
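The idea of host-aware placement can be illustrated with a small Python sketch. This is not the algorithm from HADOOP-8468, only a simplified stand-in under the assumption that the scheduler knows which physical host each DN VM runs on; all names are hypothetical.

```python
def place_replicas(datanodes, vm_to_host, replication=3):
    """Pick DN VMs for one block, preferring physical hosts not used yet."""
    chosen, used_hosts = [], set()
    # First pass: at most one replica per physical host.
    for dn in datanodes:
        if len(chosen) == replication:
            break
        if vm_to_host[dn] not in used_hosts:
            chosen.append(dn)
            used_hosts.add(vm_to_host[dn])
    # Second pass: if there are fewer hosts than replicas, reuse hosts.
    for dn in datanodes:
        if len(chosen) == replication:
            break
        if dn not in chosen:
            chosen.append(dn)
    return chosen


# Hypothetical layout: three Compute hosts with two DN VMs each.
vm_to_host = {
    "dn1": "compute-1", "dn2": "compute-1",
    "dn3": "compute-2", "dn4": "compute-2",
    "dn5": "compute-3", "dn6": "compute-3",
}
targets = place_replicas(list(vm_to_host), vm_to_host)
```

With three hosts available, each replica lands on a different host; with only two hosts the second pass falls back to sharing one.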
HDFS Reliability: HADOOP-8545
Enables Swift for Hadoop
[Diagram: a chain of Hadoop Job #1, Job #2, ..., Job #N. The initial input comes from Swift, HDFS is used between the jobs, and the final output goes to Swift.]
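The data flow in the diagram can be sketched as a plan of input and output locations per job. This is an illustrative Python sketch, assuming intermediate results stay in HDFS; the URIs and the `hdfs_tmp` directory are hypothetical.

```python
def plan_job_io(n_jobs, swift_input, swift_output, hdfs_tmp="hdfs:///tmp/chain"):
    """Return (job name, input URI, output URI) for a chain of n_jobs jobs."""
    if n_jobs < 1:
        raise ValueError("need at least one job")
    plan = []
    for i in range(1, n_jobs + 1):
        # Only the first job reads from Swift; later jobs read the previous output.
        src = swift_input if i == 1 else f"{hdfs_tmp}/job-{i - 1}"
        # Only the last job writes to Swift; earlier jobs write to HDFS.
        dst = swift_output if i == n_jobs else f"{hdfs_tmp}/job-{i}"
        plan.append((f"Job #{i}", src, dst))
    return plan


# Hypothetical Swift locations for the initial input and the final output.
plan = plan_job_io(3, "swift://in.example/data", "swift://out.example/result")
```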
Configurable topology of DN, NN, TT, JT
● Master node(s)
● Worker nodes
[Diagram: example layouts. Master node labels: JT | NN, JT, NN+. Worker node labels: TT, TT | DN, DN. Numbers shown: 10, 6, 8.]
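A topology like this could be described as groups of nodes, each with a list of processes and a node count. The Python sketch below is an assumption about how such a description might be validated, not Savanna's actual template format; it assumes a cluster with exactly one NN and one JT and at least one DN and TT, and the group names and counts are hypothetical.

```python
from collections import Counter


def count_processes(node_groups):
    """Total number of instances of each Hadoop process across all groups."""
    totals = Counter()
    for group in node_groups:
        for process in group["processes"]:
            totals[process] += group["count"]
    return totals


def validate_topology(node_groups):
    """Return a list of problems; an empty list means the topology is accepted."""
    totals = count_processes(node_groups)
    errors = []
    for master in ("NN", "JT"):
        if totals[master] != 1:
            errors.append(f"expected exactly one {master}, found {totals[master]}")
    for worker in ("DN", "TT"):
        if totals[worker] < 1:
            errors.append(f"expected at least one {worker}")
    return errors


# Hypothetical topologies: masters combined on one node, or split across two.
combined = [
    {"name": "master", "processes": ["JT", "NN"], "count": 1},
    {"name": "worker", "processes": ["TT", "DN"], "count": 4},
]
split = [
    {"name": "jobtracker", "processes": ["JT"], "count": 1},
    {"name": "namenode", "processes": ["NN"], "count": 1},
    {"name": "compute", "processes": ["TT"], "count": 2},
    {"name": "storage", "processes": ["DN"], "count": 3},
]
```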
HDFS Placement Options
● Ephemeral drive: /var/lib/nova/instances/instance-xxx/disk -> /mnt/ephemeral
● Block storage volume: Cinder Volume -> /mnt/volume
● Bare hard drive support: /dev/sdb -> /mnt/sdb
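Whichever option is chosen, the DataNode has to be pointed at the resulting mount point. A minimal Python sketch, assuming Hadoop 1.x (where the property is `dfs.data.dir`) and a hypothetical `hdfs/data` subdirectory under each mount point from the slide:

```python
import xml.etree.ElementTree as ET

# Mount points taken from the slide; the keys are hypothetical labels.
MOUNT_POINTS = {
    "ephemeral": "/mnt/ephemeral",
    "volume": "/mnt/volume",
    "bare": "/mnt/sdb",
}


def hdfs_site_xml(placement, subdir="hdfs/data"):
    """Build a minimal hdfs-site.xml fragment for the chosen placement option."""
    path = f"{MOUNT_POINTS[placement]}/{subdir}"
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "dfs.data.dir"
    ET.SubElement(prop, "value").text = path
    return ET.tostring(conf, encoding="unicode")


fragment = hdfs_site_xml("volume")
```

An unknown placement label raises `KeyError`, which keeps typos from silently producing a wrong path.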
Q&A
We are hiring!
Phase 1 deployment mechanism
[Diagram: Savanna and four Hadoop VMs.]
● Provision VMs with pre-installed Hadoop
● Configure Hadoop Cluster
Tool usage scenarios
[Diagram: two scenarios, each with a Tool and four VMs.]
● Scenario I: Hadoop VMs already exist; the Tool manages the Hadoop Cluster
● Scenario II: plain VMs; the Tool provisions and manages the Hadoop Cluster
Extensible Provisioning
[Diagram: Savanna with three components.]
Plugin
● get extra configs
● validate input
● launch/terminate cluster
● add/remove nodes
Instance Interop
● launch/terminate VMs
● get VM status
● ssh/scp to VM
Image registry
● register image in Savanna
● add/remove tags
● get image by tag
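The responsibilities listed on this slide can be sketched as Python interfaces. The class and method names below are hypothetical, derived from the bullet text rather than from Savanna's source; the in-memory image registry is only an illustration of tag-based lookup.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin contract mirroring the Plugin bullets."""

    @abstractmethod
    def get_extra_configs(self):
        """Return plugin-specific configuration options."""

    @abstractmethod
    def validate(self, cluster):
        """Check user input for the cluster before launching it."""

    @abstractmethod
    def launch_cluster(self, cluster):
        """Provision and start the cluster."""

    @abstractmethod
    def terminate_cluster(self, cluster):
        """Stop the cluster and release its resources."""


class ImageRegistry:
    """Hypothetical in-memory registry mirroring the Image registry bullets."""

    def __init__(self):
        self._tags = {}  # image id -> set of tags

    def register_image(self, image_id):
        self._tags.setdefault(image_id, set())

    def add_tag(self, image_id, tag):
        self._tags[image_id].add(tag)

    def remove_tag(self, image_id, tag):
        self._tags[image_id].discard(tag)

    def get_image_by_tag(self, tag):
        """Return the first registered image carrying the tag, or None."""
        for image_id, tags in self._tags.items():
            if tag in tags:
                return image_id
        return None


registry = ImageRegistry()
registry.register_image("image-123")
registry.add_tag("image-123", "hadoop")
```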
Provisioning Interaction
[Sequence diagram between User, Savanna, and Plugin. Messages shown: get extra parameters for the plugin, validate cluster parameters, launch cluster, add/remove nodes.]
Provisioning: Launching a Cluster
[Diagram: the Plugin gets an image by tag from the Image Registry, launches VMs through the Instance Interop Helper, then installs and configures Hadoop on four Hadoop VMs by passing commands via ssh and scp.]
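The launch flow above can be sketched in Python as a function that receives the three collaborators as callables. The function name, the callables, and the `configure-hadoop` command are hypothetical placeholders; real remote execution is replaced by a recording stub.

```python
def launch_cluster(node_count, tag, get_image_by_tag, launch_vm, run_remote):
    """Look up an image by tag, launch the VMs, then configure Hadoop on each."""
    image = get_image_by_tag(tag)
    if image is None:
        raise LookupError(f"no image tagged {tag!r}")
    # Launch all VMs first, then configure them.
    vms = [launch_vm(image, f"hadoop-{i}") for i in range(1, node_count + 1)]
    for vm in vms:
        run_remote(vm, "configure-hadoop")  # placeholder for ssh/scp commands
    return vms


# Stubs that record what would happen instead of touching real infrastructure.
log = []
vms = launch_cluster(
    4,
    "hadoop",
    get_image_by_tag=lambda tag: {"hadoop": "image-123"}.get(tag),
    launch_vm=lambda image, name: log.append(("launch", image, name)) or name,
    run_remote=lambda vm, command: log.append(("ssh", vm, command)),
)
```

Passing the collaborators in keeps the flow testable without OpenStack; a real plugin would supply the image registry and instance helper instead of the stubs.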
