SlideShare uma empresa Scribd logo
1 de 23
Baixar para ler offline
Apache Whirr
          Cloud Services Made Easy




          Lars George, Cloudera
          HUGUK Meetup
          12 May 2011



Friday, May 20, 2011
What is Apache Whirr?
         ▪   An Apache Incubator project for running services in the cloud
         ▪   A community for sharing code and configuration for running
             clusters
             ▪   People: 9 committers (6 orgs), more contributors and users
             ▪   Projects: Hadoop, HBase, ZooKeeper, Cassandra
                 ▪   More coming: Voldemort, ElasticSearch, Mesos, Oozie
         ▪   Whirr is used for
             ▪   Testing
             ▪   Evaluation and proof of concept
             ▪   Production (future)


Friday, May 20, 2011
High Level Overview
         ▪   Whirr uses low-level API libraries to talk to cloud providers
         ▪   Orchestrates clusters based on roles
         ▪   Configuration files define entire cluster
             ▪   Override anything you need to change on the command line
         ▪   Spins up minimal images, only base OS and SSH
             ▪   Ships your credentials
         ▪   Services are a combination of roles
             ▪   Define bootstrapping and configuration actions
         ▪   Adding your own service is (nearly) trivial
             ▪   No dealings with cloud provider internals

Friday, May 20, 2011
Steps in writing a Whirr service
         ▪   1. Identify service roles
         ▪   2. Write a ClusterActionHandler for each role
         ▪   3. Write scripts that run on cloud nodes
         ▪   4. Package and install
         ▪   5. Run




Friday, May 20, 2011
1. Identify service roles
         ▪   Flume, a service for collecting         https://github.com/cloudera/flume
             and moving large amounts of
             data
         ▪   Flume Master
             ▪   The head node, for coordination
             ▪   Whirr role name: flumedemo-master
         ▪   Flume Node
             ▪   Runs agents (generate logs) or
                 collectors (aggregate logs)
             ▪   Whirr role name: flumedemo-node



Friday, May 20, 2011
2. Write a ClusterActionHandler for each role
          public class FlumeNodeHandler extends ClusterActionHandlerSupport {

              public static final String ROLE = "flumedemo-node";

              @Override public String getRole() { return ROLE; }

              @Override
              protected void beforeBootstrap(ClusterActionEvent event) throws IOException,
                  InterruptedException {
                addStatement(event, call("install_java"));
                addStatement(event, call("install_flumedemo"));
              }

              // more ...
          }




Friday, May 20, 2011
Handlers can interact...
          public class FlumeNodeHandler extends ClusterActionHandlerSupport {

              // continued ...

              @Override
              protected void beforeConfigure(ClusterActionEvent event) throws IOException,
                  InterruptedException {
                // firewall ingress authorization omitted

                  Instance master = cluster.getInstanceMatching(role(FlumeMasterHandler.ROLE));
                  String masterAddress = master.getPrivateAddress().getHostAddress();
                  addStatement(event, call("configure_flumedemo_node", masterAddress));
              }
          }




Friday, May 20, 2011
3. Write scripts that run on cloud nodes
         ▪   install_java is built in
         ▪   Other functions are specified in individual files



               function install_flumedemo() {
                 curl -O http://cloud.github.com/downloads/cloudera/flume/flume-0.9.3.tar.gz
                 tar -C /usr/local/ -zxf flume-0.9.3.tar.gz
                 echo "export FLUME_CONF_DIR=/usr/local/flume-0.9.3/conf" >> /etc/profile
               }




Friday, May 20, 2011
You can run as many scripts as you want
         ▪   This script takes an argument to specify the master
             function configure_flumedemo_node() {
               MASTER_HOST=$1
               cat > /usr/local/flume-0.9.3/conf/flume-site.xml <<EOF
             <?xml version="1.0"?>
             <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
             <configuration>
               <property>
                 <name>flume.master.servers</name>
                 <value>$MASTER_HOST</value>
               </property>
             </configuration>
             EOF
               FLUME_CONF_DIR=/usr/local/flume-0.9.3/conf 
                 nohup /usr/local/flume-0.9.3/bin/flume master > /var/log/flume.log 2>&1 &
             }




Friday, May 20, 2011
4. Package and install
         ▪   Each service is a self-contained JAR:

                       functions/configure_flumedemo_master.sh
                       functions/configure_flumedemo_node.sh
                       functions/install_flumedemo.sh
                       META-INF/services/org.apache.whirr.service.ClusterActionHandler
                       org/apache/whirr/service/example/FlumeMasterHandler.class
                       org/apache/whirr/service/example/FlumeNodeHandler.class

         ▪   Discovered using java.util.ServiceLoader facility
             ▪   META-INF/services/org.apache.whirr.service.ClusterActionHandler:
                       org.apache.whirr.service.example.FlumeMasterHandler
                       org.apache.whirr.service.example.FlumeNodeHandler

         ▪   Place JAR in Whirr’s lib directory


Friday, May 20, 2011
5. Run
         ▪   Create a cluster spec file
                whirr.cluster-name=flumedemo
                whirr.instance-templates=1 flumedemo-master,1 flumedemo-node
                whirr.provider=aws-ec2
                whirr.identity=${env:AWS_ACCESS_KEY_ID}
                whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

         ▪   Then launch from the CLI
                % whirr launch-cluster --config flumedemo.properties

         ▪   or Java
               Configuration conf = new PropertiesConfiguration("flumedemo.properties");
               ClusterSpec spec = new ClusterSpec(conf);
               Service s = new Service();
               Cluster cluster = s.launchCluster(spec);
               // interact with cluster
               s.destroyCluster(spec);


Friday, May 20, 2011
Demo




Friday, May 20, 2011
Orchestration
         ▪   Instance templates are acted on independently in parallel
         ▪   Bootstrap phase
             ▪   start 1 instance for the flumedemo-master role and run its bootstrap
                 script
             ▪   start 1 instance for the flumedemo-node role and run its bootstrap
                 script
         ▪   Configure phase
             ▪   run the configure script on the   flumedemo-master   instance
             ▪   run the configure script on the   flumedemo-node   instance
         ▪   Note there is a barrier between the two phases, so nodes can get
             the master address in the configure phase

Friday, May 20, 2011
Going further: Hadoop
         ▪   Service configuration
             ▪   Flume example was very simple
             ▪   In practice, you want to be able to override any service property
             ▪   For Hadoop, Whirr generates the service config file and copies
                 to cluster
             ▪   E.g.   hadoop-common.fs.trash.interval=1440

                 ▪   Sets fs.trash.interval in the cluster configuration
         ▪   Service version
             ▪   Tarball is parameterized by         whirr.hadoop.tarball.url




Friday, May 20, 2011
Going further: HBase
                                            Web                                        REST
                                                                       HTable
                                           Browser                                     Client




           Public
                       UI            RPC      UI            RPC   UI             RPC            HTTP


                            Master            RegionServer        RegionServer              RestServer



                            listen                 listen               listen                  listen
           Private




Friday, May 20, 2011
Going further: HBase
         ▪   Service composition
          whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,
          5 hadoop-datanode+hadoop-tasktracker+hbase-regionserver


         ▪   HBase handlers will do the following in beforeConfigure():
             ▪   open ports
             ▪   pass in ZooKeeper quorum from zookeeper role
             ▪   for non-master nodes: pass in master address
         ▪   Notice that the Hadoop cluster is overlaid on the HBase cluster




Friday, May 20, 2011
Challenges
         ▪   Complexity
         ▪   Degrees of freedom
             ▪   #clouds × #OS × #hardware × #locations × #services ×
                 #configs = a big number!
             ▪   Known good configurations, recipes
             ▪   Regular automated testing
         ▪   Debugging
             ▪   What to do when the service hasn’t come up?
             ▪   Logs
             ▪   Recoverability


Friday, May 20, 2011
What’s next?
         ▪   Release 0.5.0 coming soon
             ▪   Support local tarball upload (WHIRR-220)
             ▪   Allow to run component tests in memory (WHIRR-243)
         ▪   More services (ElasticSearch, Voldemort)
         ▪   More (tested) cloud providers
         ▪   Pool provider/BYON
         ▪   Cluster resizing
         ▪   https://cwiki.apache.org/confluence/display/WHIRR/RoadMap




Friday, May 20, 2011
Questions
         ▪   Find out more at
             ▪   http://incubator.apache.org/whirr
         ▪   lars@cloudera.com




Friday, May 20, 2011
provision
                       Public
                                   hbase-master   hbase-regionserver   hbase-regionserver   hbase-restserver

                                 Ubuntu 10.04     Ubuntu 10.04         Ubuntu 10.04         Ubuntu 10.04
                                 8GB RAM          8GB RAM              8GB RAM              2GB RAM
                                 8 Cores          8 Cores              8 Cores              1 CPU



                       Private




Friday, May 20, 2011
install
                       Public
                                  hbase-master    hbase-regionserver   hbase-regionserver   hbase-restserver

                                 login: hbase      login: hbase         login: hbase        login: hbase
                                 java: 1.6.0_16    java: 1.6.0_16       java: 1.6.0_16      java: 1.6.0_16
                                 hbase: 0.90.1     hbase: 0.90.1        hbase: 0.90.1       hbase: 0.90.1



                       Private




Friday, May 20, 2011
configure
                       Public
                                 UI            RPC   UI            RPC   UI            RPC     HTTP


                                      Master         RegionServer        RegionServer        RestServer



                                      listen              listen              listen           listen
                       Private




Friday, May 20, 2011
manage
                   Public
                            UI            RPC     UI            RPC   UI            RPC


                             Master #2                 Master         RegionServer



                                 listen                listen              listen
                  Private




Friday, May 20, 2011

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

London HUG 12/4
London HUG 12/4London HUG 12/4
London HUG 12/4
 
Using Ansible for Deploying to Cloud Environments
Using Ansible for Deploying to Cloud EnvironmentsUsing Ansible for Deploying to Cloud Environments
Using Ansible for Deploying to Cloud Environments
 
Terraform Modules and Continuous Deployment
Terraform Modules and Continuous DeploymentTerraform Modules and Continuous Deployment
Terraform Modules and Continuous Deployment
 
Atlanta OpenStack 2014 Chef for OpenStack Deployment Workshop
Atlanta OpenStack 2014 Chef for OpenStack Deployment WorkshopAtlanta OpenStack 2014 Chef for OpenStack Deployment Workshop
Atlanta OpenStack 2014 Chef for OpenStack Deployment Workshop
 
Terraform Immutablish Infrastructure with Consul-Template
Terraform Immutablish Infrastructure with Consul-TemplateTerraform Immutablish Infrastructure with Consul-Template
Terraform Immutablish Infrastructure with Consul-Template
 
Ansible Crash Course
Ansible Crash CourseAnsible Crash Course
Ansible Crash Course
 
Test-Driven Infrastructure with Ansible, Test Kitchen, Serverspec and RSpec
Test-Driven Infrastructure with Ansible, Test Kitchen, Serverspec and RSpecTest-Driven Infrastructure with Ansible, Test Kitchen, Serverspec and RSpec
Test-Driven Infrastructure with Ansible, Test Kitchen, Serverspec and RSpec
 
Developing Terraform Modules at Scale - HashiTalks 2021
Developing Terraform Modules at Scale - HashiTalks 2021Developing Terraform Modules at Scale - HashiTalks 2021
Developing Terraform Modules at Scale - HashiTalks 2021
 
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
 
Ansible at work
Ansible at workAnsible at work
Ansible at work
 
Apache Cassandra and Go
Apache Cassandra and GoApache Cassandra and Go
Apache Cassandra and Go
 
Fake IT, until you make IT
Fake IT, until you make ITFake IT, until you make IT
Fake IT, until you make IT
 
The SaltStack Pub Crawl - Fosscomm 2016
The SaltStack Pub Crawl - Fosscomm 2016The SaltStack Pub Crawl - Fosscomm 2016
The SaltStack Pub Crawl - Fosscomm 2016
 
Terraform day02
Terraform day02Terraform day02
Terraform day02
 
A Hands-on Introduction on Terraform Best Concepts and Best Practices
A Hands-on Introduction on Terraform Best Concepts and Best Practices A Hands-on Introduction on Terraform Best Concepts and Best Practices
A Hands-on Introduction on Terraform Best Concepts and Best Practices
 
Go Faster with Ansible (PHP meetup)
Go Faster with Ansible (PHP meetup)Go Faster with Ansible (PHP meetup)
Go Faster with Ansible (PHP meetup)
 
Automated infrastructure is on the menu
Automated infrastructure is on the menuAutomated infrastructure is on the menu
Automated infrastructure is on the menu
 
An intro to Docker, Terraform, and Amazon ECS
An intro to Docker, Terraform, and Amazon ECSAn intro to Docker, Terraform, and Amazon ECS
An intro to Docker, Terraform, and Amazon ECS
 
SaltConf14 - Anita Kuno, HP & OpenStack - Using SaltStack for event-driven or...
SaltConf14 - Anita Kuno, HP & OpenStack - Using SaltStack for event-driven or...SaltConf14 - Anita Kuno, HP & OpenStack - Using SaltStack for event-driven or...
SaltConf14 - Anita Kuno, HP & OpenStack - Using SaltStack for event-driven or...
 
OpenStack Austin Meetup January 2014: Chef + OpenStack
OpenStack Austin Meetup January 2014: Chef + OpenStackOpenStack Austin Meetup January 2014: Chef + OpenStack
OpenStack Austin Meetup January 2014: Chef + OpenStack
 

Destaque

Jawbone jambox Manual
Jawbone jambox ManualJawbone jambox Manual
Jawbone jambox Manual
jamboxcase
 
C:\My Movies\2 Search Enginge Optimization Seth Besmertnik
C:\My Movies\2 Search Enginge Optimization    Seth BesmertnikC:\My Movies\2 Search Enginge Optimization    Seth Besmertnik
C:\My Movies\2 Search Enginge Optimization Seth Besmertnik
guest889866
 
The Mourning Of Christ By Giotto Di Bondone
The Mourning Of Christ By Giotto Di BondoneThe Mourning Of Christ By Giotto Di Bondone
The Mourning Of Christ By Giotto Di Bondone
guest358697
 
Setting up the To Do Module
Setting up the To Do ModuleSetting up the To Do Module
Setting up the To Do Module
Michael Payne
 
Magic Quadrant for Corporate Telephony
Magic Quadrant for Corporate TelephonyMagic Quadrant for Corporate Telephony
Magic Quadrant for Corporate Telephony
Kathryn Covolus
 

Destaque (14)

Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
 
Shruthi_GV_Resume
Shruthi_GV_ResumeShruthi_GV_Resume
Shruthi_GV_Resume
 
Alto riesgo cardiovascular
Alto riesgo cardiovascularAlto riesgo cardiovascular
Alto riesgo cardiovascular
 
Salesforce Known Issues: The Lifecycle of a Bug
Salesforce Known Issues: The Lifecycle of a BugSalesforce Known Issues: The Lifecycle of a Bug
Salesforce Known Issues: The Lifecycle of a Bug
 
Jawbone jambox Manual
Jawbone jambox ManualJawbone jambox Manual
Jawbone jambox Manual
 
C:\My Movies\2 Search Enginge Optimization Seth Besmertnik
C:\My Movies\2 Search Enginge Optimization    Seth BesmertnikC:\My Movies\2 Search Enginge Optimization    Seth Besmertnik
C:\My Movies\2 Search Enginge Optimization Seth Besmertnik
 
World Travel Market London: Creating Content Travellers Really Want
World Travel Market London: Creating Content Travellers Really WantWorld Travel Market London: Creating Content Travellers Really Want
World Travel Market London: Creating Content Travellers Really Want
 
SEO 101: The Basics and Beyond
SEO 101: The Basics and BeyondSEO 101: The Basics and Beyond
SEO 101: The Basics and Beyond
 
Polycom soundstation premier satellite data sheet
Polycom soundstation premier satellite data sheetPolycom soundstation premier satellite data sheet
Polycom soundstation premier satellite data sheet
 
The Mourning Of Christ By Giotto Di Bondone
The Mourning Of Christ By Giotto Di BondoneThe Mourning Of Christ By Giotto Di Bondone
The Mourning Of Christ By Giotto Di Bondone
 
Setting up the To Do Module
Setting up the To Do ModuleSetting up the To Do Module
Setting up the To Do Module
 
Integrated Development Programme
Integrated Development ProgrammeIntegrated Development Programme
Integrated Development Programme
 
Magic Quadrant for Corporate Telephony
Magic Quadrant for Corporate TelephonyMagic Quadrant for Corporate Telephony
Magic Quadrant for Corporate Telephony
 
Toad Business Intelligence Suite
Toad Business Intelligence Suite Toad Business Intelligence Suite
Toad Business Intelligence Suite
 

Semelhante a Apache Whirr

Automating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAutomating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps Approach
Akshaya Mahapatra
 

Semelhante a Apache Whirr (20)

Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
Automating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAutomating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps Approach
 
Whirr dev-up-puppetconf2011
Whirr dev-up-puppetconf2011Whirr dev-up-puppetconf2011
Whirr dev-up-puppetconf2011
 
2012 09-08-josug-jeff
2012 09-08-josug-jeff2012 09-08-josug-jeff
2012 09-08-josug-jeff
 
4. open mano set up and usage
4. open mano set up and usage4. open mano set up and usage
4. open mano set up and usage
 
A Fabric/Puppet Build/Deploy System
A Fabric/Puppet Build/Deploy SystemA Fabric/Puppet Build/Deploy System
A Fabric/Puppet Build/Deploy System
 
DevOps in PHP environment
DevOps in PHP environment DevOps in PHP environment
DevOps in PHP environment
 
Cloud meets Fog & Puppet A Story of Version Controlled Infrastructure
Cloud meets Fog & Puppet A Story of Version Controlled InfrastructureCloud meets Fog & Puppet A Story of Version Controlled Infrastructure
Cloud meets Fog & Puppet A Story of Version Controlled Infrastructure
 
Automation day red hat ansible
   Automation day red hat ansible    Automation day red hat ansible
Automation day red hat ansible
 
OpenNebula, the foreman and CentOS play nice, too
OpenNebula, the foreman and CentOS play nice, tooOpenNebula, the foreman and CentOS play nice, too
OpenNebula, the foreman and CentOS play nice, too
 
Building web framework with Rack
Building web framework with RackBuilding web framework with Rack
Building web framework with Rack
 
Deploying OpenStack with Chef
Deploying OpenStack with ChefDeploying OpenStack with Chef
Deploying OpenStack with Chef
 
Bare Metal to OpenStack with Razor and Chef
Bare Metal to OpenStack with Razor and ChefBare Metal to OpenStack with Razor and Chef
Bare Metal to OpenStack with Razor and Chef
 
Oracle API Gateway Installation
Oracle API Gateway InstallationOracle API Gateway Installation
Oracle API Gateway Installation
 
Automação do físico ao NetSecDevOps
Automação do físico ao NetSecDevOpsAutomação do físico ao NetSecDevOps
Automação do físico ao NetSecDevOps
 
Harmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and PuppetHarmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and Puppet
 
Dockerizing the Hard Services: Neutron and Nova
Dockerizing the Hard Services: Neutron and NovaDockerizing the Hard Services: Neutron and Nova
Dockerizing the Hard Services: Neutron and Nova
 
A Tale of a Server Architecture (Frozen Rails 2012)
A Tale of a Server Architecture (Frozen Rails 2012)A Tale of a Server Architecture (Frozen Rails 2012)
A Tale of a Server Architecture (Frozen Rails 2012)
 
Ansible: How to Get More Sleep and Require Less Coffee
Ansible: How to Get More Sleep and Require Less CoffeeAnsible: How to Get More Sleep and Require Less Coffee
Ansible: How to Get More Sleep and Require Less Coffee
 
Lua tech talk
Lua tech talkLua tech talk
Lua tech talk
 

Mais de huguk

Mais de huguk (20)

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Último (20)

FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Apache Whirr

  • 1. Apache Whirr Cloud Services Made Easy Lars George, Cloudera HUGUK Meetup 12 May 2011 Friday, May 20, 2011
  • 2. What is Apache Whirr? ▪ An Apache Incubator project for running services in the cloud ▪ A community for sharing code and configuration for running clusters ▪ People: 9 committers (6 orgs), more contributors and users ▪ Projects: Hadoop, HBase, ZooKeeper, Cassandra ▪ More coming: Voldemort, ElasticSearch, Mesos, Oozie ▪ Whirr is used for ▪ Testing ▪ Evaluation and proof of concept ▪ Production (future) Friday, May 20, 2011
  • 3. High Level Overview ▪ Whirr uses low-level API libraries to talk to cloud providers ▪ Orchestrates clusters based on roles ▪ Configuration files define entire cluster ▪ Override anything you need to change on the command line ▪ Spins up minimal images, only base OS and SSH ▪ Ships your credentials ▪ Services are a combination of roles ▪ Define bootstrapping and configuration actions ▪ Adding your own service is (nearly) trivial ▪ No dealings with cloud provider internals Friday, May 20, 2011
  • 4. Steps in writing a Whirr service ▪ 1. Identify service roles ▪ 2. Write a ClusterActionHandler for each role ▪ 3. Write scripts that run on cloud nodes ▪ 4. Package and install ▪ 5. Run Friday, May 20, 2011
  • 5. 1. Identify service roles ▪ Flume, a service for collecting https://github.com/cloudera/flume and moving large amounts of data ▪ Flume Master ▪ The head node, for coordination ▪ Whirr role name: flumedemo-master ▪ Flume Node ▪ Runs agents (generate logs) or collectors (aggregate logs) ▪ Whirr role name: flumedemo-node Friday, May 20, 2011
  • 6. 2. Write a ClusterActionHandler for each role public class FlumeNodeHandler extends ClusterActionHandlerSupport { public static final String ROLE = "flumedemo-node"; @Override public String getRole() { return ROLE; } @Override protected void beforeBootstrap(ClusterActionEvent event) throws IOException, InterruptedException { addStatement(event, call("install_java")); addStatement(event, call("install_flumedemo")); } // more ... } Friday, May 20, 2011
  • 7. Handlers can interact... public class FlumeNodeHandler extends ClusterActionHandlerSupport { // continued ... @Override protected void beforeConfigure(ClusterActionEvent event) throws IOException, InterruptedException { // firewall ingress authorization omitted Instance master = cluster.getInstanceMatching(role(FlumeMasterHandler.ROLE)); String masterAddress = master.getPrivateAddress().getHostAddress(); addStatement(event, call("configure_flumedemo_node", masterAddress)); } } Friday, May 20, 2011
  • 8. 3. Write scripts that run on cloud nodes ▪ install_java is built in ▪ Other functions are specified in individual files function install_flumedemo() { curl -O http://cloud.github.com/downloads/cloudera/flume/flume-0.9.3.tar.gz tar -C /usr/local/ -zxf flume-0.9.3.tar.gz echo "export FLUME_CONF_DIR=/usr/local/flume-0.9.3/conf" >> /etc/profile } Friday, May 20, 2011
  • 9. You can run as many scripts as you want ▪ This script takes an argument to specify the master function configure_flumedemo_node() { MASTER_HOST=$1 cat > /usr/local/flume-0.9.3/conf/flume-site.xml <<EOF <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>flume.master.servers</name> <value>$MASTER_HOST</value> </property> </configuration> EOF FLUME_CONF_DIR=/usr/local/flume-0.9.3/conf nohup /usr/local/flume-0.9.3/bin/flume master > /var/log/flume.log 2>&1 & } Friday, May 20, 2011
  • 10. 4. Package and install ▪ Each service is a self-contained JAR: functions/configure_flumedemo_master.sh functions/configure_flumedemo_node.sh functions/install_flumedemo.sh META-INF/services/org.apache.whirr.service.ClusterActionHandler org/apache/whirr/service/example/FlumeMasterHandler.class org/apache/whirr/service/example/FlumeNodeHandler.class ▪ Discovered using java.util.ServiceLoader facility ▪ META-INF/services/org.apache.whirr.service.ClusterActionHandler: org.apache.whirr.service.example.FlumeMasterHandler org.apache.whirr.service.example.FlumeNodeHandler ▪ Place JAR in Whirr’s lib directory Friday, May 20, 2011
  • 11. 5. Run ▪ Create a cluster spec file whirr.cluster-name=flumedemo whirr.instance-templates=1 flumedemo-master,1 flumedemo-node whirr.provider=aws-ec2 whirr.identity=${env:AWS_ACCESS_KEY_ID} whirr.credential=${env:AWS_SECRET_ACCESS_KEY} ▪ Then launch from the CLI % whirr launch-cluster --config flumedemo.properties ▪ or Java Configuration conf = new PropertiesConfiguration("flumedemo.properties"); ClusterSpec spec = new ClusterSpec(conf); Service s = new Service(); Cluster cluster = s.launchCluster(spec); // interact with cluster s.destroyCluster(spec); Friday, May 20, 2011
  • 13. Orchestration ▪ Instance templates are acted on independently in parallel ▪ Bootstrap phase ▪ start 1 instance for the flumedemo-master role and run its bootstrap script ▪ start 1 instance for the flumedemo-node role and run its bootstrap script ▪ Configure phase ▪ run the configure script on the flumedemo-master instance ▪ run the configure script on the flumedemo-node instance ▪ Note there is a barrier between the two phases, so nodes can get the master address in the configure phase Friday, May 20, 2011
  • 14. Going further: Hadoop ▪ Service configuration ▪ Flume example was very simple ▪ In practice, you want to be able to override any service property ▪ For Hadoop, Whirr generates the service config file and copies to cluster ▪ E.g. hadoop-common.fs.trash.interval=1440 ▪ Sets fs.trash.interval in the cluster configuration ▪ Service version ▪ Tarball is parameterized by whirr.hadoop.tarball.url Friday, May 20, 2011
  • 15. Going further: HBase Web REST HTable Browser Client Public UI RPC UI RPC UI RPC HTTP Master RegionServer RegionServer RestServer listen listen listen listen Private Friday, May 20, 2011
  • 16. Going further: HBase ▪ Service composition whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master, 5 hadoop-datanode+hadoop-tasktracker+hbase-regionserver ▪ HBase handlers will do the following in beforeConfigure(): ▪ open ports ▪ pass in ZooKeeper quorum from zookeeper role ▪ for non-master nodes: pass in master address ▪ Notice that the Hadoop cluster is overlaid on the HBase cluster Friday, May 20, 2011
  • 17. Challenges ▪ Complexity ▪ Degrees of freedom ▪ #clouds × #OS × #hardware × #locations × #services × #configs = a big number! ▪ Known good configurations, recipes ▪ Regular automated testing ▪ Debugging ▪ What to do when the service hasn’t come up? ▪ Logs ▪ Recoverability Friday, May 20, 2011
  • 18. What’s next? ▪ Release 0.5.0 coming soon ▪ Support local tarball upload (WHIRR-220) ▪ Allow to run component tests in memory (WHIRR-243) ▪ More services (ElasticSearch, Voldemort) ▪ More (tested) cloud providers ▪ Pool provider/BYON ▪ Cluster resizing ▪ https://cwiki.apache.org/confluence/display/WHIRR/RoadMap Friday, May 20, 2011
  • 19. Questions ▪ Find out more at ▪ http://incubator.apache.org/whirr ▪ lars@cloudera.com Friday, May 20, 2011
  • 20. provision Public hbase-master hbase-regionserver hbase-regionserver hbase-restserver Ubuntu 10.04 Ubuntu 10.04 Ubuntu 10.04 Ubuntu 10.04 8GB RAM 8GB RAM 8GB RAM 2GB RAM 8 Cores 8 Cores 8 Cores 1 CPU Private Friday, May 20, 2011
  • 21. install Public hbase-master hbase-regionserver hbase-regionserver hbase-restserver login: hbase login: hbase login: hbase login: hbase java: 1.6.0_16 java: 1.6.0_16 java: 1.6.0_16 java: 1.6.0_16 hbase: 0.90.1 hbase: 0.90.1 hbase: 0.90.1 hbase: 0.90.1 Private Friday, May 20, 2011
  • 22. configure Public UI RPC UI RPC UI RPC HTTP Master RegionServer RegionServer RestServer listen listen listen listen Private Friday, May 20, 2011
  • 23. manage Public UI RPC UI RPC UI RPC Master #2 Master RegionServer listen listen listen Private Friday, May 20, 2011