Big data processing with
Hadoop on OpenStack
Matthew Farrellee
(@spinningmatt)
Red Hat
Here for a talk about Savanna?
Oops, this talk is about Sahara.
Good news is they’re the same thing.
Savanna was renamed for trademark reasons to Sahara.
You have to go to page 10 of Google results to find out why:
https://www.google.com/search?q=savanna+hadoop&start=90
In brief - what is Hadoop
● Narrow - Apache Hadoop - a specific
Apache project originally from Yahoo!, based
on papers published by Google
● Broad - an ecosystem of projects, mostly
Apache, that integrate in some way with
Apache Hadoop
● Most common to use the broad definition
Hadoop from Hortonworks (+ others)
● Multiple projects
○ Workload management
○ Resource management
○ System management
○ Data ingest & storage
○ Compute frameworks
○ Domain languages
● Data storage and
processing focused
In brief - what is OpenStack
OpenStack is a cloud operating system that
controls large pools of compute, storage, and
networking resources throughout a datacenter,
all managed through a dashboard that gives
administrators control while empowering their
users to provision resources through a web
interface.
An ecosystem of projects
● Compute - Nova
● Networking - Neutron
● Object Storage - Swift
● Block Storage - Cinder
● Identity - Keystone
● Image Service - Glance
● Dashboard - Horizon
● Telemetry - Ceilometer
● Orchestration - Heat
● Data Processing - Sahara
Longer comments on big data
Choose your own adventure…
Go to the next slide and get the day over
sooner
See some shoegazing followed by a rant and
have the day last longer
Interest (via Google Trends)
Hadoop
EC2
OpenStack
www.google.com/trends/explore#q=hadoop,ec2,openstack
EC2 beta Aug 25 2006 (http://aws.typepad.com/aws/2006/08/amazon_ec2_beta.html)
Data analysis is hard
Analysis - have a question
● Even this alone is hard to come up with
● The question you answer won’t be the
question you set out to ask
● You’ll have to iterate and refine
Can I predict doctor specialty from what
procedures they perform?
Analysis - finding the data
● Publicly -
○ Tons of data repositories
○ No consistency, even within a specific repository
● Privately -
○ Data often hidden in silos
○ Even less consistency
● Avoid datasets that don’t come with a
dictionary
○ Data w/o a dictionary is like code w/o comments
Analysis - acceptable use
● Publicly -
○ Data sets often have associated licenses
○ Yes, even public (government) sets
○ You may have to find an alternative set
● Privately -
○ Often tightly controlled, considered sensitive
business data
○ If you can use it, maybe only in a specific place
○ Likely no alternatives
Analysis - explore / clean the data
● The story of Stephen Glasser and Cheryl
Palma
● Two of the oldest people in the medical
profession working with Medicare
● Stephen Glasser graduated in 1773
● Cheryl Palma graduated in 1776
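Records like these can be caught with a plausibility check on the graduation year. The sketch below is illustrative only: the `Provider` record layout, the `flag_implausible_years` helper, the year bounds, and the third record are assumptions, not details of the Medicare data set.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    graduation_year: int


def flag_implausible_years(providers, earliest=1930, latest=2014):
    """Return providers whose graduation year falls outside [earliest, latest].

    The bounds are arbitrary assumptions; pick ones that fit your data.
    """
    return [p for p in providers if not earliest <= p.graduation_year <= latest]


providers = [
    Provider("Stephen Glasser", 1773),
    Provider("Cheryl Palma", 1776),
    Provider("Hypothetical Provider", 1998),  # made-up record for contrast
]

suspects = flag_implausible_years(providers)
for p in suspects:
    print(f"{p.name}: graduation year {p.graduation_year} looks wrong")
```

A check like this only flags rows; deciding whether to drop, correct, or keep them is still a judgment call.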
Analysis - finally
● You got some answer to a question you
approximately asked
● You must refine the question and process
● Repeat
This is hard enough without having to manage
tools and infrastructure!
Sahara’s goal
Make managing Hadoop+ infrastructure and
tools so simple that doing so never gets in your
way
Sahara is
● An OpenStack project in the Data
Processing program
● Started one year ago (Summit in Portland)
● Incubated in Icehouse (6 months ago)
● Integrated for Juno (6 months from now)
Sahara’s architecture
[Architecture diagram. Labels: Horizon with Sahara Pages; Sahara Python Client; Sahara Service; REST API; Auth; Cluster Configuration Manager; Data Access Layer; Vendors Plugins; Resources Orchestration Manager; Job Manager; Data Sources; Job Sources; Hadoop VMs; Keystone; Swift; Heat; Nova; Glance; Cinder; Neutron; Trove DB]
Sahara’s plugin architecture
● This is important!
● It’s where Hadoop distribution vendors
integrate their management software
● It’s how users pick different software
versions
● Currently: Vanilla (reference impl. w/ Apache
versions), HDP (via Ambari), IDH (via Intel
Manager) and under review CDH and Spark
Sahara lets you
● Create and manage clusters
● Define and run analysis jobs
● All through a programmatic interface
● Or a web console
Sahara’s REST API
API v1 (Cluster operations)
● http://bit.ly/1hRXrVX
● Plugins
○ list - comes from configuration
○ get - provides capabilities of a plugin, e.g. services
● Images
○ register - provide basic metadata, username - going
away w/ Heat
○ tag/untag - associate image w/ a plugin
API v1 (Cluster operations) (cont)
● http://bit.ly/1hRXrVX
● Templates
○ node groups
○ clusters
● Clusters
○ Instances of templates
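The relationship between templates and clusters can be sketched as a small data model. This is a minimal illustration, not Sahara's schema: the class names, field names, process names, and image name are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeGroupTemplate:
    name: str
    node_processes: List[str]  # which services run on nodes of this group


@dataclass
class ClusterTemplate:
    name: str
    plugin: str                        # e.g. "vanilla"
    node_group_counts: Dict[str, int]  # node group template name -> node count


@dataclass
class Cluster:
    name: str
    template: ClusterTemplate
    image: str  # name of a registered image tagged for the plugin

    def node_count(self) -> int:
        return sum(self.template.node_group_counts.values())


# Illustrative process names only.
master = NodeGroupTemplate("master", ["namenode", "jobtracker"])
worker = NodeGroupTemplate("worker", ["datanode", "tasktracker"])
small = ClusterTemplate("small", "vanilla", {master.name: 1, worker.name: 3})

# Two clusters instantiated from the same template.
dev = Cluster("dev", small, image="hypothetical-hadoop-image")
qa = Cluster("qa", small, image="hypothetical-hadoop-image")
print(dev.node_count(), qa.node_count())
```

The point of the split is reuse: the template is defined once and every cluster launched from it gets the same shape.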
API v1.1 (Elastic Data Processing)
● http://bit.ly/1kXGjGj
● Data Source
○ Input and output locations (Swift/HDFS urls)
● Job Binaries
○ Often JARs or scripts stored in Swift or ...
● Jobs
○ Templates for a job with missing parameters
● Job executions
○ Instances of templates with parameters provided
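The same template/instance split applies to jobs. The sketch below models a job as a template whose inputs, outputs, and parameters are filled in at execution time. The class names, the `execute` helper, the job type string, and every URL are made up for illustration and are not Sahara's API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataSource:
    name: str
    url: str  # a Swift or HDFS URL


@dataclass
class JobBinary:
    name: str
    url: str  # where the JAR or script is stored


@dataclass
class Job:
    """A template: names the binaries, leaves inputs, outputs and parameters open."""
    name: str
    job_type: str
    binaries: List[JobBinary]

    def execute(self, cluster_name, input_source, output_source, params):
        # Filling in the missing pieces yields one concrete execution.
        return JobExecution(self, cluster_name, input_source, output_source, dict(params))


@dataclass
class JobExecution:
    job: Job
    cluster_name: str
    input: DataSource
    output: DataSource
    params: Dict[str, str]


# All URLs below are placeholders.
script = JobBinary("count.pig", "swift://hypothetical-container/count.pig")
job = Job("count-sales", "Pig", [script])
raw = DataSource("raw", "swift://hypothetical-container/raw")
out_a = DataSource("out-a", "hdfs://hypothetical-namenode/out/a")
out_b = DataSource("out-b", "hdfs://hypothetical-namenode/out/b")

run1 = job.execute("dev", raw, out_a, {"threshold": "10"})
run2 = job.execute("dev", raw, out_b, {"threshold": "50"})
print(run1.params, run2.params)
```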
API v2 (future)
Consistent, stable, and clean evolution of v1 & v1.1
○ Image handling in v1 wasn’t RESTful
○ Reduce use of internally stored binaries
○ Jobs & job executions weren’t RESTful
○ Resource naming wasn’t consistent (clusters v job-executions
& cluster-templates v jobs)
○ Prune unused operations, e.g. status-refresh
○ Align resource lifecycle, e.g. terminate = stop&delete
vs terminate = stop
Sahara’s Plugin API
● http://bit.ly/1h4MiAW
● get_versions
● get_configs(version)
● get_node_processes(version)
● get_required_image_tags(version)
● validate(cluster)
● configure_cluster(cluster)
● start_cluster(cluster)
● scale_cluster(cluster)
● ...
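The method list above can be read as an interface. The sketch below writes it as an abstract base class with a toy implementation. Only the method names and arguments come from the slide; the base class name, the dict-based cluster, the return values, and the call order in `provision` are assumptions, not Sahara's actual plugin SPI.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical base class; only the method names come from the slide."""

    @abstractmethod
    def get_versions(self): ...

    @abstractmethod
    def get_configs(self, version): ...

    @abstractmethod
    def get_node_processes(self, version): ...

    @abstractmethod
    def get_required_image_tags(self, version): ...

    @abstractmethod
    def validate(self, cluster): ...

    @abstractmethod
    def configure_cluster(self, cluster): ...

    @abstractmethod
    def start_cluster(self, cluster): ...

    @abstractmethod
    def scale_cluster(self, cluster): ...


class ToyPlugin(ProvisioningPlugin):
    """Made-up plugin that only records what it was asked to do."""

    def get_versions(self):
        return ["1.0"]

    def get_configs(self, version):
        return {"replication": 3}

    def get_node_processes(self, version):
        return {"storage": ["datanode"], "compute": ["tasktracker"]}

    def get_required_image_tags(self, version):
        return ["toy", version]

    def validate(self, cluster):
        if cluster["version"] not in self.get_versions():
            raise ValueError(f"unsupported version {cluster['version']}")

    def configure_cluster(self, cluster):
        cluster["configs"] = self.get_configs(cluster["version"])

    def start_cluster(self, cluster):
        cluster["state"] = "started"

    def scale_cluster(self, cluster):
        cluster["state"] = "scaled"


def provision(plugin, cluster):
    """Assumed call order: validate, then configure, then start."""
    plugin.validate(cluster)
    plugin.configure_cluster(cluster)
    plugin.start_cluster(cluster)
    return cluster


cluster = provision(ToyPlugin(), {"name": "demo", "version": "1.0"})
print(cluster["state"])
```

Keeping vendor logic behind one interface is what lets the service treat Vanilla, HDP, and the others uniformly.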
Roadmap
● I mentioned a couple things, but this is a
community project
● The Icehouse release is tomorrow
● Design summit, where developers & users &
business get together to define the roadmap,
is May 13-16 in Atlanta
Demo with bigpetstore
● http://jayunit100.github.io/bigpetstore/slides
● Bigpetstore (by @jayunit100)
○ A full stack Hadoop application
○ Uses the main players in the Hadoop ecosystem
○ To demonstrate a single domain
○ Just accepted into the Bigtop project!
Demo with bigpetstore...live (cont)
We’re going to perform petstore transaction
analysis -
1. Generate data from a model
2. Transform data for processing
3. Process w/ Pig or Mahout, we’ll do Pig
4. Visualize results in web app
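The four steps can be mimicked on a laptop with a toy pipeline. This is a plain-Python analogy, not BigPetStore's code: the product catalogue, the random "model", and all function names are invented, and the per-product total stands in for the grouping a Pig script would do on the cluster.

```python
import random
from collections import Counter

PRODUCTS = ["dog food", "cat litter", "fish flakes"]  # made-up catalogue


def generate(n, seed=0):
    """Step 1: draw raw transaction lines from a trivial random model."""
    rng = random.Random(seed)
    return [f"{rng.choice(PRODUCTS)},{rng.randint(1, 5)}" for _ in range(n)]


def transform(lines):
    """Step 2: parse raw lines into (product, quantity) tuples."""
    records = []
    for line in lines:
        product, quantity = line.rsplit(",", 1)
        records.append((product, int(quantity)))
    return records


def process(records):
    """Step 3: total quantity per product, a stand-in for a Pig GROUP BY."""
    totals = Counter()
    for product, quantity in records:
        totals[product] += quantity
    return totals


def visualize(totals):
    """Step 4: render a text bar chart in place of the web app."""
    return "\n".join(f"{p:12} {'#' * q}" for p, q in totals.most_common())


raw = generate(20)
totals = process(transform(raw))
print(visualize(totals))
```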
Demo video...
https://www.youtube.com/watch?v=vmry_kXqn4c
Hadoop on OpenStack - Sahara @DevNation 2014
