The State of OpenStack Data Processing: Sahara, Now and in Juno
Sergey Lukjanov (Mirantis)
Matthew Farrellee (Red Hat)
John Speidel (Hortonworks)
Agenda
• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans
OpenStack Data Processing: Sahara
Mission: To provide a scalable data processing
stack and associated management interfaces.
• provision and operate Hadoop clusters
• schedule and operate Hadoop jobs
Hadoop - Big Data Platform
© http://hortonworks.com/hadoop/yarn/
Trends
http://www.google.com/trends/
Use cases
• Self-service provisioning of Hadoop clusters
• Utilization of unused compute capacity for
bursty workloads
• Dev -> Stage -> Prod lifecycle
• Run Hadoop workloads in a few clicks without
expertise in Hadoop operations
Architecture overview
[Diagram: Sahara architecture, labeled with the project's former name, Savanna. Components shown: Savanna Pages in Horizon, Python Client, REST API, Keystone Auth, Cluster Configuration Manager, Vendor Plugins, Job Manager, Job Sources, Data Sources, Data Access Layer, Resources Orchestration Manager, Trove DB, Swift, Heat, Nova, Glance, Cinder, Neutron, and Hadoop VMs.]
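The diagram places vendor plugins behind the Cluster Configuration Manager. As an illustration of that boundary, here is a minimal sketch of a plugin interface the manager could dispatch to. The names `ProvisioningPlugin`, `get_versions`, `get_node_processes`, `validate`, and `ToyVanillaPlugin` are hypothetical simplifications, not Sahara's actual plugin SPI, and the version string and process names are illustrative assumptions.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical, simplified vendor-plugin boundary (not Sahara's real SPI)."""

    name: str = "base"

    @abstractmethod
    def get_versions(self) -> list[str]:
        """Hadoop versions this plugin can deploy."""

    @abstractmethod
    def get_node_processes(self, version: str) -> dict[str, list[str]]:
        """Map of service name -> processes a node group may run."""

    def validate(self, version: str, processes: list[str]) -> None:
        """Reject versions or processes the plugin does not offer."""
        if version not in self.get_versions():
            raise ValueError(f"{self.name} does not support version {version}")
        offered = {p for procs in self.get_node_processes(version).values() for p in procs}
        unknown = sorted(set(processes) - offered)
        if unknown:
            raise ValueError(f"unsupported processes for {self.name} {version}: {unknown}")


class ToyVanillaPlugin(ProvisioningPlugin):
    """Illustrative plugin; version and process names are assumptions."""

    name = "vanilla"

    def get_versions(self) -> list[str]:
        return ["2.3.0"]

    def get_node_processes(self, version: str) -> dict[str, list[str]]:
        return {
            "HDFS": ["namenode", "datanode"],
            "YARN": ["resourcemanager", "nodemanager"],
        }


# A configuration manager would look plugins up by name.
PLUGINS = {p.name: p for p in [ToyVanillaPlugin()]}
```

Keeping validation in the base class means every vendor plugin only has to describe what it offers; the shared checks stay in one place.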
Sahara status
• Official integrated OpenStack project
• Supported Hadoop distributions:
  • Vanilla Apache Hadoop
  • Hortonworks Data Platform
  • Intel Distribution
  • Cloudera Distribution (in blueprint stage)
• Included in OpenStack distributions:
  • RDO - openstack.redhat.com
  • Mirantis OpenStack - software.mirantis.com
Contributors
Agenda
• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans
Icehouse release
• 142 bugs fixed
• 57 blueprints
• 32 contributors
• Standard OpenStack release process
• Dozens more in the client!
• Tempest helps us manage our API
• Sahara is easily deployed with DevStack
• Hadoop 2 available via all plugins (© http://hortonworks.com/hadoop/yarn/)
• HBase (and Sqoop) available via the HDP plugin
• Spark images built with diskimage-builder (full plugin in review)
• Heat used for provisioning
• i18n translation started
• Neutron namespaces with rootwrap
• Guest agent implementation started
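Icehouse moved provisioning onto Heat. To show the general idea, here is a minimal sketch that turns a list of node groups into a Heat Orchestration Template (HOT) with one `OS::Nova::Server` per instance. The input shape, the `node_groups_to_hot` and `render` helpers, and the instance naming scheme are hypothetical; the template Sahara actually generates carries more (networks, keys, volumes) and may use a different layout.

```python
import json


def node_groups_to_hot(cluster: str, node_groups: list[dict], image: str) -> dict:
    """Build a minimal HOT template: one Nova server resource per instance.

    Each node group is assumed to look like {"name": ..., "count": ..., "flavor": ...}.
    """
    resources = {}
    for ng in node_groups:
        for i in range(1, ng["count"] + 1):
            name = f"{cluster}-{ng['name']}-{i:03d}"
            resources[name] = {
                "type": "OS::Nova::Server",
                "properties": {"name": name, "flavor": ng["flavor"], "image": image},
            }
    return {"heat_template_version": "2013-05-23", "resources": resources}


def render(template: dict) -> str:
    # JSON is valid YAML, so this string can serve as a HOT template body.
    return json.dumps(template, indent=2, sort_keys=True)
```

Delegating instance creation to a single Heat stack lets the orchestration engine track and roll back the whole cluster as one unit, instead of issuing individual Nova calls.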
Elastic Data Processing (EDP) is Sahara’s take on
data processing workflow management.
Goal: let end users (those with high-value questions
to answer) get answers from their data without having
to know anything about cluster management.
“Customers launch millions of Amazon EMR clusters every year.”
http://aws.amazon.com/elasticmapreduce/
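To make the EDP goal concrete, here is a sketch of the objects an end user handles: a job template, input and output data sources, and a job execution. The class names, fields, the `submit` helper, and the example URLs are hypothetical assumptions, not Sahara's data model; the point is that nothing in the user-facing call refers to node groups, processes, or hosts.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataSource:
    name: str
    url: str  # e.g. a Swift or HDFS location (illustrative)


@dataclass(frozen=True)
class JobTemplate:
    name: str
    job_type: str     # e.g. "Pig", "Hive", "MapReduce", "Java" (assumed labels)
    main_binary: str  # name of a previously recorded job binary


@dataclass
class JobExecution:
    template: JobTemplate
    cluster: str
    input: DataSource
    output: DataSource
    params: dict[str, str] = field(default_factory=dict)
    status: str = "PENDING"


def submit(template: JobTemplate, cluster: str, input_ds: DataSource,
           output_ds: DataSource, **params: str) -> JobExecution:
    """The user names what to run and on which data; cluster internals
    (node groups, processes, hosts) never appear in this call."""
    return JobExecution(template, cluster, input_ds, output_ds, dict(params))
```

Usage: `submit(JobTemplate("wordcount", "MapReduce", "wordcount.jar"), "analytics", DataSource("in", "swift://logs.sahara/in"), DataSource("out", "swift://logs.sahara/out"), reducers="4")` returns a pending execution that a scheduler could pick up.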
Elastic Data Processing update
• Available with the Hortonworks Data Platform plugin
• Support for external HDFS
• MapReduce.Streaming and Java actions
• Job relaunch, with new data and parameters
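Job relaunch amounts to reusing a previous execution request with different data sources or parameters. A minimal sketch follows; the request shape (`cluster_id`, `input_id`, `output_id`, and `job_configs` with `configs`, `args`, `params`) reflects our understanding of the EDP execution request and should be checked against the API reference. `build_execute_request` and `relaunch` are hypothetical helpers.

```python
import copy


def build_execute_request(cluster_id, input_id, output_id,
                          configs=None, args=None, params=None) -> dict:
    """Assumed shape of an EDP job-execution request body."""
    return {
        "cluster_id": cluster_id,
        "input_id": input_id,
        "output_id": output_id,
        "job_configs": {
            "configs": dict(configs or {}),
            "args": list(args or []),
            "params": dict(params or {}),
        },
    }


def relaunch(previous: dict, input_id=None, output_id=None, params=None) -> dict:
    """Copy a prior request, swapping in new data sources and parameters.

    A deep copy keeps the original request unchanged, so it can be relaunched again.
    """
    new = copy.deepcopy(previous)
    if input_id is not None:
        new["input_id"] = input_id
    if output_id is not None:
        new["output_id"] = output_id
    if params:
        new["job_configs"]["params"].update(params)
    return new
```

For example, a monthly report could be rerun by relaunching April's request with May's input and output IDs and a new `DATE` parameter, leaving configs and arguments untouched.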
Command line interface overview
If you can do it with the Dashboard, you
can do it from the command line
Blueprint: python-savannaclient-cli
Command line interface overview
Image management
$ sahara
...
Positional arguments:
<subcommand>
image-add-tag Add a tag to an image.
image-list Print a list of available images.
image-register Register an image from the Image index.
image-remove-tag Remove a tag from an image.
image-show Show details of an image.
image-unregister Unregister an image.
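The image subcommands register Glance images with Sahara and tag them. A common use of such tags is matching images to a plugin and Hadoop version. The sketch below assumes an image qualifies when it carries both the plugin name and the version as tags; `RegisteredImage` and `images_for` are hypothetical and the tag values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class RegisteredImage:
    id: str
    username: str  # login user recorded at registration time (assumed field)
    tags: set[str] = field(default_factory=set)

    def add_tag(self, tag: str) -> None:
        self.tags.add(tag)

    def remove_tag(self, tag: str) -> None:
        # discard() is a no-op for missing tags, mirroring an idempotent remove
        self.tags.discard(tag)


def images_for(images: list[RegisteredImage], plugin: str, version: str) -> list[RegisteredImage]:
    """Return images tagged with both the plugin name and the version."""
    required = {plugin, version}
    return [img for img in images if required <= img.tags]
```

Treating tags as a set makes the match a plain subset test, so extra tags (an OS name, say) never prevent an image from qualifying.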
Command line interface overview
Node group, cluster and job templates
$ sahara
node-group-template-create Create a node group...
node-group-template-delete Delete a node group...
node-group-template-list Print a list of available...
node-group-template-show Show details of a node...
cluster-template-create Create a cluster template.
cluster-template-delete Delete a cluster template.
cluster-template-list Print a list of available...
cluster-template-show Show details of a cluster...
job-template-create Create a job template.
job-template-delete Delete a job template.
job-template-list Print a list of job...
job-template-show Show details of a job...
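Node group templates describe one kind of node (flavor plus processes), and a cluster template combines them with instance counts. A minimal sketch of that composition, assuming simplified fields; `NodeGroupTemplate`, `ClusterTemplate`, and their methods are hypothetical, and the flavor and process names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroupTemplate:
    name: str
    flavor: str
    node_processes: tuple[str, ...]


@dataclass(frozen=True)
class ClusterTemplate:
    name: str
    plugin: str
    version: str
    node_groups: tuple[tuple[NodeGroupTemplate, int], ...]  # (template, instance count)

    def instance_count(self) -> int:
        """Total number of VMs the cluster template would launch."""
        return sum(count for _, count in self.node_groups)

    def process_counts(self) -> Counter:
        """How many instances will run each process."""
        totals: Counter = Counter()
        for ngt, count in self.node_groups:
            for proc in ngt.node_processes:
                totals[proc] += count
        return totals
```

Counting processes across groups is a quick way to check a layout before launch, for instance that exactly one instance runs the NameNode.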
Command line interface overview
Data sources and job binaries
$ sahara
...
<subcommand>
data-source-create Create a data source that provides
job input or receives job output.
data-source-delete Delete a data source.
data-source-list Print a list of available data...
data-source-show Show details of a data source.
job-binary-create Record a job binary.
job-binary-delete Delete a job binary.
job-binary-list Print a list of job binaries.
job-binary-show Show details of a job binary.
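Data sources are referenced by URL, and EDP now supports HDFS outside the cluster. This sketch classifies a data-source URL and flags HDFS locations whose host is not the cluster's own NameNode. It assumes only `swift` and `hdfs` schemes and a `swift://container/object`-style layout; `classify_data_source` is a hypothetical helper, not part of the sahara client.

```python
from typing import Optional
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"swift", "hdfs"}  # assumption: the two data-source types discussed here


def classify_data_source(url: str, cluster_namenode: Optional[str] = None) -> dict:
    """Split a data-source URL into type, location and path.

    For HDFS, a host different from the cluster's NameNode is treated as external HDFS.
    """
    parsed = urlparse(url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported data source scheme: {parsed.scheme!r}")
    if not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError("URL needs both a container/host and a path")
    return {
        "type": parsed.scheme,
        "location": parsed.netloc,
        "path": parsed.path,
        "external_hdfs": parsed.scheme == "hdfs" and parsed.hostname != cluster_namenode,
    }
```

Using `hostname` rather than `netloc` ignores the port, so `hdfs://nn1:8020/...` is still recognized as the cluster's own NameNode.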
Command line interface overview
Clusters and jobs
$ sahara
...
<subcommand>
cluster-create Create a cluster.
cluster-delete Delete a cluster.
cluster-list Print a list of available clusters.
cluster-show Show details of a cluster.
job-create Create a job.
job-delete Delete a job.
job-list Print a list of jobs.
job-show Show details of a job.
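A typical script runs `cluster-create` and then waits until the cluster is ready before creating jobs. Here is a sketch of that wait loop. `get_status` stands in for whatever call reads the cluster's status (for example, parsing `cluster-show` output); it and the status labels `"Active"` and `"Error"` are assumptions. The sleep and clock functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_cluster(get_status: Callable[[], str],
                     timeout: float = 1800.0,
                     interval: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the cluster reports a terminal status or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "Active":
            return status
        if status == "Error":
            raise RuntimeError("cluster provisioning failed")
        if clock() >= deadline:
            raise TimeoutError(f"cluster still {status!r} after {timeout}s")
        sleep(interval)
```

A monotonic clock is used for the deadline so that wall-clock adjustments during a long provisioning run cannot shorten or extend the wait.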
Agenda
• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans
HDP Plugin Overview
• Full support for all Sahara functionality:
  • Nova and Neutron networking
  • Cluster scaling
    • Scale up
  • Swift integration
  • Cinder support
  • Data locality
  • EDP
• Apache Ambari REST APIs used for cluster provisioning
• Monitoring/management of clusters via Ambari
• Full support for multiple HDP stacks
• HDP pre-installed or generic VM images
HDP Plugin Stack Support
HDP 1.3.2 (Available)
● NameNode
● Secondary NameNode
● DataNode
● HDFS
● ZooKeeper
● Ambari Server/Agent
● HCatalog
● Sqoop
● Job Tracker
● Task Tracker
● MapReduce
● Hive
● MySQL
● Pig
● WebHCat Server
● Oozie
● Ganglia
● Nagios
● HBase
HDP 2.0.6 (Available)
● History Server
● MapReduce 2 / YARN
● Resource Manager
● YARN Client
HDP 2.1 (Coming Soon!)
● Storm
● Falcon
HDP 2.1+ (Roadmap)
● SOLR
● Cascading
HDP Disk Images
• diskimage-builder offers a consistent approach to image creation
• The HDP plugin provides images and scripts for CentOS and RHEL:
  • Plain
  • 1.3.2
  • 2.0.6
  • 2.1 (coming soon)
• Pre-packaged images (1.3.2, 2.0.6) have the HDP packages pre-installed, for faster provisioning and reduced network traffic
• Image build scripts allow images to be customized:
  • Security
  • Custom packages
  • O/S settings
Ambari Blueprints
• Two primary goals of Ambari Blueprints:
  • Export a complete description of a running cluster
  • Provide API-based cluster installation from a self-contained cluster description
• Blueprints contain cluster topology and configuration information
• Enables interesting use cases spanning physical and virtual environments, including OpenStack/Sahara
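An Ambari blueprint captures topology (host groups and their components) separately from the concrete hosts, which are supplied when a cluster is created from it. The sketch below builds both payloads and the two POST calls. The JSON layout (`Blueprints`, `host_groups`, `cardinality`, `fqdn`) and the `/api/v1/blueprints/...` and `/api/v1/clusters/...` paths follow our understanding of the Ambari Blueprints API and should be checked against the Ambari documentation. All helper names are hypothetical, and no HTTP request is actually sent.

```python
import json


def make_blueprint(stack_version: str, host_groups: dict) -> dict:
    """host_groups maps name -> (cardinality, [component names])."""
    return {
        "Blueprints": {"stack_name": "HDP", "stack_version": stack_version},
        "host_groups": [
            {"name": name, "cardinality": str(card),
             "components": [{"name": c} for c in comps]}
            for name, (card, comps) in host_groups.items()
        ],
    }


def make_cluster_request(blueprint_name: str, hosts_by_group: dict) -> dict:
    """Map each blueprint host group to concrete host FQDNs."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": name, "hosts": [{"fqdn": h} for h in hosts]}
            for name, hosts in hosts_by_group.items()
        ],
    }


def check_mapping(blueprint: dict, cluster_request: dict) -> None:
    """Each mapped host group must exist in the blueprint with matching cardinality."""
    cardinality = {g["name"]: int(g["cardinality"]) for g in blueprint["host_groups"]}
    for group in cluster_request["host_groups"]:
        name = group["name"]
        if name not in cardinality:
            raise ValueError(f"host group {name!r} not in blueprint")
        if len(group["hosts"]) != cardinality[name]:
            raise ValueError(f"host group {name!r} expects {cardinality[name]} hosts, "
                             f"got {len(group['hosts'])}")


def ambari_calls(blueprint_name: str, cluster_name: str,
                 blueprint: dict, cluster_request: dict) -> list[tuple[str, str, str]]:
    """(method, path, JSON body) for registering the blueprint, then creating the cluster."""
    if cluster_request["blueprint"] != blueprint_name:
        raise ValueError("cluster request refers to a different blueprint")
    check_mapping(blueprint, cluster_request)
    return [
        ("POST", f"/api/v1/blueprints/{blueprint_name}", json.dumps(blueprint)),
        ("POST", f"/api/v1/clusters/{cluster_name}", json.dumps(cluster_request)),
    ]
```

Separating topology from hosts is what makes the physical/virtual use cases possible: the same blueprint can be exported from one cluster and replayed against freshly provisioned VMs.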
Agenda
• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans
Juno roadmap
• Further integration with the OpenStack ecosystem:
  • Distributed architecture
  • Guest agents
  • EDP enhancements
  • Merge the dashboard into Horizon
To be discussed and confirmed at the Design Summit
Design Summit Sessions
7 Sessions: Thursday 1:30 - Friday 10:30
http://goo.gl/lQXtUS
Agenda
Q&A
Cluster and EDP workflows
[Diagram: cluster and EDP workflow steps, each annotated with how often it is performed: Rarely, Infrequently, Occasionally, Commonly, Occasionally, Frequently]

The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta
