Monitoring @ Scale
  How Rackspace does it.




         Gary Dusbabek
Outline
1. Our Problems
2. Other Requirements
    (Hopes and Dreams)
3. How we did it
Our Problems
• Many tens of thousands of servers
• Several solutions in place
  – Merged segments
  – Wouldn’t scale to all
  – Tedious (Manpower)
  – Expensive
• Wanted more
Hopes and Dreams
External Users



       No Special Cases



                 Internal Users
Know   about    problems

   Before customers do.
Gain
Insight
          Trending
          Capacity Planning
          More
MONITORING


  Throwing away data
Throwing away knowledge

    Y U NO BIG DATA?
Availability
         Component Failure
Availability
         Datacenter Failure
Supportable

Debuggable

Deployable
How We Built It
100% Ownership

5 Devs
2 Ops
Technology
 Choices
API endpoints
Other services that write to Cassandra

           node-whiskey˚
       node-cassandra-client˚
            node-swiz˚
         node-elementtree˚
C projects
Reconnoiter
  – noitdˆ
  – stratcondˆ
 Scribe
Java projects
Mainly for concurrency
Apache Cassandraˆ
 Apache Zookeeper
 Apache Thriftˆ
 Metric Ingestion
 Complex events (Esper)
A Tale of Two Clusters
     Used Differently

      Control Cluster
       Data Cluster
Data Model
    Relational Mismatch
       Our Approach
   One row per account
 Prefixed Ids for col names
Concatenate for parent/child
Data Model
    Foo objects get ids like ‘foXXX’

    Bar objects get ids like ‘baXXX’

 Assume 1:* relationship from foo:bar

Bar 456 that is attached to Foo 123 gets
      column name ‘fo123:ba456’

Select ‘fo123:ba’..‘fo123:ba\xef\xbf\xbf’
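
The naming scheme above can be sketched in a few lines of Python. This is only an illustration: it assumes column names compare in byte order, the helper names (`column_name`, `slice_bounds`, `slice_columns`) are hypothetical, and the slice is emulated over an in-memory sorted list rather than issued against Cassandra.

```python
from bisect import bisect_left, bisect_right

# UTF-8 encoding of U+FFFF, the upper bound shown on the slide.
SLICE_END = b"\xef\xbf\xbf"


def column_name(parent_id: str, child_id: str) -> bytes:
    """Concatenate parent and child ids: 'fo123' + 'ba456' -> b'fo123:ba456'."""
    return f"{parent_id}:{child_id}".encode("utf-8")


def slice_bounds(parent_id: str, child_prefix: str) -> tuple[bytes, bytes]:
    """Start and end names covering every child of one type under a parent."""
    start = f"{parent_id}:{child_prefix}".encode("utf-8")
    return start, start + SLICE_END


def slice_columns(sorted_names: list[bytes], start: bytes, end: bytes) -> list[bytes]:
    """Emulate an inclusive column slice over names kept in byte order."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, end)
    return sorted_names[lo:hi]


# One account row (illustrative data), with column names in byte order.
row = sorted([
    column_name("fo122", "ba999"),
    column_name("fo123", "ba456"),
    column_name("fo123", "ba789"),
    column_name("fo124", "ba001"),
])

start, end = slice_bounds("fo123", "ba")
children = slice_columns(row, start, end)
print(children)
```

Because every Bar under Foo 123 shares the prefix ‘fo123:ba’, a single contiguous range read returns them all, which is what lets a one-to-many lookup be served from one row without a join.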
Lua/Luvitˆ
Virgo˚ (host agent)
   https://github.com/racker/virgo




      Vagrant/Chef
                     Deployment
                    Development
Open Source Contribution


        Dreadnot˚
  Inspired by Deployinator
 Multi-region deployment tool



 https://github.com/racker/dreadnot
    Blog Post - http://bit.ly/xks0qT
Oh noes!
Supportability
   Dashboard
   KPI Metrics
    Logging
 Dump/Load (ops tools)
Automation
• Testing continuously on
  buildbot


Deployment
• Dreadnot (bb + chef)
• Constant deployment
  and self-monitoring
  (nagios + graphite)
• Single-region upgrades
Documentation
Written by programmers

 Examples generated
     from tests
Thanks!
            Private Beta
https://cmbeta.api.rackspacecloud.com/

            @gdusbabek

       https://github.com/racker
Image Credits
Scale           http://www.flickr.com/photos/puuikibeach/4765115333
Pyramids        http://www.flickr.com/photos/gracewong/93631410
Hercules        http://www.flickr.com/photos/istolethetv/2203377554/
Dandelion       http://www.flickr.com/photos/8047705@N02/5572197407/
Ear             http://www.flickr.com/photos/perpetualplum/3974880498
Hourglass       http://www.flickr.com/photos/22244945@N00/3278869535
Eyeball         http://www.flickr.com/photos/miran/6567911705
few gears       http://www.flickr.com/photos/arthurjohnpicton/5364226117
Many Gears      http://www.flickr.com/photos/mwichary/2294174641
Tools           http://www.flickr.com/photos/zzpza/3269784239/
Crane           http://www.flickr.com/photos/katatoniq/2075966238/
Ants            http://www.flickr.com/photos/dendroica/6170146527
Choice          http://www.flickr.com/photos/-bast-/349497988
Old Car         http://www.flickr.com/photos/dok1/353601845/
Monster Truck   http://www.flickr.com/photos/beadmobile/3279378483/
Clusters        http://www.flickr.com/photos/nanagyei/6318995952/
Model           http://www.flickr.com/photos/inl/5097547405/
Agent           http://www.flickr.com/photos/erix/191965832/
Chef            http://www.flickr.com/photos/londonmatt/417683733/
Arrows          http://www.flickr.com/photos/generated/2084287794/
Dashboard       http://www.flickr.com/photos/80502454@N00/4172458435/
Legos           http://www.flickr.com/photos/great8/6820722517/
Xray            http://www.flickr.com/photos/karen_roe/4417259305/
Document        http://www.flickr.com/photos/tusnelda/6140792529/
Flowers         http://www.flickr.com/photos/petercastleton/5905455717/

Building Rackspace Cloud Monitoring

Editor's Notes

  1. Problems = specifically what we’re trying to solve. Hopes/Dreams = things that would be really nice. How We Did It = processes & tools.
  2. As business has changed and segments were merged, others were added. Different teams. None of the existing solutions will scale to meet the needs of everyone. Not harnessing the data.
  3. Different needs. Internal: hook into ticketing and support. External: monitor anything. One API for everybody.
  4. Obvious, really. Proactive.
  5. Trending. Capacity planning. Auto scaling.
  6. Tooling. API driven: self-service. Awesome reporting.
  7. Open source. Distributed systems. Embedded. DevOps.
  8. Prefer open source
  9. Polyglot system
  10. Control: metadata + state; high replication (5 DC); wide rows; easy dump and load. Data: archival; constant writes; 3 DC replication.
  11. CRUD operations. Data description. Simple parent-child: simplified relations. Prefixed IDs. Object called ‘foo’, id becomes ‘foXXXXXXXX’. Object called ‘bar’, id becomes ‘baXXXXXXXX’. For a 1:* relationship from foo:bar, concatenate column names: foXXXXXXXX:baXXXXXXXX. Selecting all the bars that belong to fo12345678 becomes a slice from fo12345678:ba to fo12345678:ba\xef\xbf\xbf. One-to-many relationships between objects are slices.
  12. CRUD operations. Data description. Simple parent-child: simplified relations. Prefixed IDs. Object called ‘foo’, id becomes ‘foXXXXXXXX’. Object called ‘bar’, id becomes ‘baXXXXXXXX’. For a 1:* relationship from foo:bar, concatenate column names: foXXXXXXXX:baXXXXXXXX. Selecting all the bars that belong to fo12345678 becomes a slice from fo12345678:ba to fo12345678:ba\xef\xbf\xbf. One-to-many relationships between objects are slices.
  13. Stress VISIBILITY!
  14. Deployed 120 times in January. System changes treated the same way as code changes.