Hadoop and OpenStack
Matthew Farrellee, @spinningmatt, Red Hat
Sumit Mohanty, @smohanty, Hortonworks
What is OpenStack?
OpenStack is
A cloud operating system that controls large
pools of compute, storage, and networking
resources throughout a datacenter, all
managed through a dashboard that gives
administrators control while empowering their
users to provision resources through a web
interface.
An ecosystem of projects
● Compute - Nova
● Networking - Neutron
● Object Storage - Swift
● Block Storage - Cinder
● Identity - Keystone
● Image Service - Glance
● Dashboard - Horizon
● Telemetry - Ceilometer
● Orchestration - Heat
● Data Processing - Sahara
Sahara is combining use cases
Trends
[Chart: Google Trends comparison of Hadoop, EC2, and OpenStack]
www.google.com/trends/explore#q=hadoop,ec2,openstack
EC2 beta Aug 25 2006 (http://aws.typepad.com/aws/2006/08/amazon_ec2_beta.html)
Data analysis is hard
Data analysis is hard...
● Come up w/ a relevant question
○ The question you answer won’t be the question you
set out to ask
○ Mine: Can I predict doctor specialty from what
procedures they perform?
● Find the data
○ Tons, little consistency, unknown origin, hidden in
silos, hoarded
○ Data w/o a dictionary is worse than code w/o
Data analysis is hard...
● Data usability
○ Acceptable license? (Even for Gov’t sets)
■ Mine: Metadata copyrighted by AMA!
○ Private is often highly protected, no/narrow DMZ
● Explore and clean
○ Two of the oldest people in the medical profession
working with Medicare
○ Stephen Glasser graduated in 1773
○ Cheryl Palma graduated in 1776
Data analysis is hard...
● You got some answer to a question you
approximately asked
● You must refine the question and process
● Repeat
This is hard enough without having to manage
tools and infrastructure!
Sahara’s goal
Make managing Hadoop+ infrastructure and
tools so simple that they get out of your way
Sahara provides
● Apache Hadoop cluster and workload
management
○ Cluster - construct and manage the lifecycle of a
Hadoop cluster
○ Workload - workflow for big data processing with
Hadoop (AWS EMR-like)
● Through a Python library, REST API, Web
UI, command line interface
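A minimal sketch of what a cluster request through the REST API might look like. The field names follow the Sahara v1.1 cluster-create body as best recalled and should be checked against the deployed version; the cluster name, version string, and IDs are placeholders, not values from this talk. Nothing is sent over the network here, only the JSON body is built.

```python
import json


def build_cluster_request(name, plugin_name, hadoop_version, template_id, image_id):
    """Build the JSON body for a cluster-create call (assumed field names)."""
    body = {
        "name": name,
        "plugin_name": plugin_name,          # e.g. "vanilla" or "hdp"
        "hadoop_version": hadoop_version,    # version string exposed by the plugin
        "cluster_template_id": template_id,  # placeholder ID
        "default_image_id": image_id,        # placeholder ID
    }
    return json.dumps(body, indent=2, sort_keys=True)


# All argument values below are illustrative placeholders.
payload = build_cluster_request(
    "demo-cluster", "hdp", "2.0.6", "TEMPLATE_ID", "IMAGE_ID"
)
print(payload)
```

The same body could be posted by the Python client, the Web UI, or the command line interface; only the transport differs.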
Sahara’s architecture
[Diagram: the Sahara service and the components around it]
● Sahara Service - Auth, REST API, Cluster Configuration Manager, Resources Orchestration Manager, Job Manager, Data Access Layer, Vendor Plugins
● Clients - Sahara Python Client, Horizon (Sahara Pages)
● OpenStack services - Keystone, Heat, Nova, Glance, Cinder, Neutron, Swift, Trove DB
● Inputs - Data Sources, Job Sources
● Provisioned - Hadoop VMs
Sahara’s features
● Plugin mechanism - distro choice
● Cluster scaling - elasticity
● Swift integration - data storage
● Cinder integration - persistent HDFS
● Network management with Nova and Neutron
● Anti-affinity, separate services on physical hardware
● Data locality with Swift
● Repeatable cluster creation w/ template mechanism
● http://docs.openstack.org/developer/sahara/userdoc/features.html
Storage considerations
● Swift
○ Input/output through Swift HCFS plugin
○ Intermediate data stored in HDFS on cluster
○ Locality when co-locating swift & nova-compute
● HDFS
○ Local (long lived cluster) and remote (copy in)
● HDFS backed by ephemeral disk or Cinder
○ Ephemeral - /var/lib/nova/instances on compute host
○ Cinder - persistent block devices attached to instances
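As a small illustration of the Swift input/output path, the sketch below builds the kind of URL a job would use to address an object in Swift. It assumes the `swift://<container>.sahara/<path>` form described in the Sahara documentation; the helper name and the container and object names are hypothetical.

```python
def swift_url(container, object_path):
    """Return a Swift URL in the form assumed here: swift://<container>.sahara/<path>."""
    if not container or "/" in container:
        raise ValueError("container must be a non-empty name without '/'")
    return "swift://{}.sahara/{}".format(container, object_path.lstrip("/"))


# Hypothetical container and object names, for illustration only.
job_input = swift_url("demo-data", "input/records.csv")
job_output = swift_url("demo-data", "output/run-1")
print(job_input)
print(job_output)
```

Intermediate data would still land in HDFS on the cluster; only the job's input and output locations point at Swift.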
Sahara’s plugin architecture
● This is important!
● It’s where Hadoop distribution vendors
integrate their management software
● It’s how users pick different software
versions
● Currently: Vanilla (reference impl. w/ Apache
versions), HDP (via Ambari), IDH (via Intel
Manager), and Spark (w/ minimal CDH)
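The plugin idea can be sketched as a small registry: each plugin declares the versions it supports and the node processes it can deploy, and the user picks a plugin and a version. This is a simplified, hypothetical contract written for illustration. It is not Sahara's actual plugin SPI, and the version strings and process names are placeholders.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical, simplified plugin contract (not Sahara's real SPI)."""

    name = "base"

    @abstractmethod
    def versions(self):
        """Return the distribution versions this plugin can deploy."""

    @abstractmethod
    def node_processes(self, version):
        """Return the processes available for the given version."""


class VanillaPlugin(ProvisioningPlugin):
    name = "vanilla"

    def versions(self):
        return ["apache-placeholder"]  # placeholder version label

    def node_processes(self, version):
        return ["NAMENODE", "DATANODE"]


class HdpPlugin(ProvisioningPlugin):
    name = "hdp"

    def versions(self):
        return ["1.3", "2.0"]  # stack labels from the slides, used as placeholders

    def node_processes(self, version):
        common = ["NAMENODE", "DATANODE"]
        extra = ["RESOURCEMANAGER", "NODEMANAGER"] if version == "2.0" else ["JOBTRACKER", "TASKTRACKER"]
        return common + extra


REGISTRY = {plugin.name: plugin for plugin in (VanillaPlugin(), HdpPlugin())}


def pick(plugin_name, version):
    """Look up a plugin by name and return its processes for one version."""
    plugin = REGISTRY[plugin_name]
    if version not in plugin.versions():
        raise ValueError("{} does not support version {}".format(plugin_name, version))
    return plugin.node_processes(version)


print(pick("hdp", "2.0"))
```

Keeping distribution-specific logic behind one interface is what lets a vendor plug in its own management software without changes to the rest of the service.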
HDP Plugin Overview
● Full support for all Sahara Functionality
● Nova and Neutron network
● Cluster Scaling
● Scale Up
● Swift Integration
● Cinder Support
● Data Locality
● EDP
● Apache Ambari REST APIs used for cluster
provisioning
● Monitoring/Management of clusters via Ambari
● Full support for multiple HDP stacks
● HDP pre-installed or generic VM images
HDP Plugin Stack Support
HDP 1.3 (Available)
● NameNode
● Secondary NameNode
● DataNode
● HDFS
● ZooKeeper
● Ambari Server/Agent
● HCatalog
● Sqoop
● Job Tracker
● Task Tracker
● MapReduce
● Hive
● MySQL
● Pig
● WebHCat Server
● Oozie
● Ganglia
● Nagios
● HBase
HDP 2.0 (Available)
● History Server
● MapReduce 2 / YARN
● Resource Manager
● YARN Client
HDP 2.1 (Coming Soon!)
● Storm
● Falcon
HDP 2.1+ (Roadmap)
● SOLR
● Cascading
Ambari Blueprints
● Two primary goals of Ambari Blueprints
○ Ability to export a complete description of a running
cluster
○ Provide API-based cluster installations based on a
self-contained cluster description
● Blueprints contain cluster topology and configuration
information
● Enables interesting use cases between physical and virtual,
including OpenStack/Sahara
Blueprint API
1. BLUEPRINT - POST /blueprints/my-blueprint
2. CLUSTER INSTANCE - POST /clusters/MyCluster
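The two-step flow can be sketched as two POST requests: first register the blueprint, then create a cluster instance that refers to it. The sketch only builds the requests and does not send them. The server address is hypothetical, the `/api/v1` prefix and the `X-Requested-By` header reflect how the Ambari REST API is usually called, and authentication is left out.

```python
import json
import urllib.request

AMBARI = "http://ambari.example.com:8080/api/v1"  # hypothetical server address


def make_post(path, body):
    """Build (but do not send) a JSON POST request against the Ambari API."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(AMBARI + path, data=data, method="POST")
    request.add_header("X-Requested-By", "ambari")
    request.add_header("Content-Type", "application/json")
    return request


# Step 1 registers the blueprint; step 2 instantiates a cluster from it.
# The bodies are trimmed placeholders, not complete definitions.
steps = [
    make_post("/blueprints/my-blueprint", {"host_groups": [], "Blueprints": {}}),
    make_post("/clusters/MyCluster", {"blueprint": "my-blueprint", "host_groups": []}),
]

for request in steps:
    print(request.get_method(), request.full_url)
```

Sending each request with `urllib.request.urlopen` (plus credentials) would be the next step against a real server.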
Example: Single-Node Definitions
BLUEPRINT
{
  "configurations" : [
    {
      "hdfs-site" : {
        "dfs.namenode.name.dir" : "/hadoop/nn"
      }
    }
  ],
  "host_groups" : [
    {
      "name" : "uber-host",
      "components" : [
        { "name" : "NAMENODE" },
        { "name" : "SECONDARY_NAMENODE" },
        { "name" : "DATANODE" },
        { "name" : "HDFS_CLIENT" },
        { "name" : "RESOURCEMANAGER" },
        { "name" : "NODEMANAGER" },
        { "name" : "YARN_CLIENT" },
        { "name" : "HISTORYSERVER" },
        { "name" : "MAPREDUCE2_CLIENT" }
      ],
      "cardinality" : "1"
    }
  ],
  "Blueprints" : {
    "blueprint_name" : "single-node-hdfs-yarn",
    "stack_name" : "HDP",
    "stack_version" : "2.0"
  }
}
CLUSTER INSTANCE
{
  "blueprint" : "single-node-hdfs-yarn",
  "host_groups" : [
    {
      "name" : "uber-host",
      "hosts" : [
        {
          "fqdn" : "c6401.ambari.apache.org"
        }
      ]
    }
  ]
}
Description
• Single-node cluster
• Use HDP 2.0 Stack
• HDFS + YARN + MR2
• Everything on c6401
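A quick way to see how the two documents relate is to check the cluster instance against the blueprint before posting either one. The sketch below is an illustrative consistency check, not part of Ambari. It assumes that a purely numeric cardinality means an exact host count, which is a simplification made for this example.

```python
blueprint = {
    "host_groups": [
        {
            "name": "uber-host",
            "components": [{"name": "NAMENODE"}, {"name": "DATANODE"}],
            "cardinality": "1",
        }
    ],
    "Blueprints": {
        "blueprint_name": "single-node-hdfs-yarn",
        "stack_name": "HDP",
        "stack_version": "2.0",
    },
}

cluster = {
    "blueprint": "single-node-hdfs-yarn",
    "host_groups": [
        {"name": "uber-host", "hosts": [{"fqdn": "c6401.ambari.apache.org"}]}
    ],
}


def check_cluster(blueprint, cluster):
    """Return a list of mismatches between a cluster instance and its blueprint."""
    problems = []
    if cluster["blueprint"] != blueprint["Blueprints"]["blueprint_name"]:
        problems.append("cluster refers to a different blueprint name")
    groups = {group["name"]: group for group in blueprint["host_groups"]}
    for mapping in cluster["host_groups"]:
        group = groups.get(mapping["name"])
        if group is None:
            problems.append("unknown host group: " + mapping["name"])
            continue
        wanted = group.get("cardinality", "")
        # Assumption: a numeric cardinality is treated as an exact host count.
        if wanted.isdigit() and len(mapping["hosts"]) != int(wanted):
            problems.append("host count mismatch in " + mapping["name"])
    return problems


print(check_cluster(blueprint, cluster))
```

An empty list means every host group in the cluster instance maps onto a group defined in the blueprint.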
Demo - youtu.be/vmry_kXqn4c
● http://jayunit100.github.io/bigpetstore/slides
● Bigpetstore
○ A full-stack Hadoop application
○ Uses the main players in the Hadoop ecosystem
○ To demonstrate a single domain
○ Just accepted into the Bigtop project!
● Come by the Red Hat booth - G18
Q&A
● Status - Integrated for Juno (Oct 2014)
● Distro - RDO (Fedora/RHEL/CentOS), RHEL
OSP 5, ...
● Home - https://launchpad.net/sahara
● Docs - http://docs.openstack.org/developer/sahara
● Code - https://github.com/openstack/ *sahara*
● Email - openstack-dev w/ [sahara]
● IRC - #openstack-sahara on freenode
