SlideShare uma empresa Scribd logo
1 de 38
Baixar para ler offline
Redundancy Doesn't Always
           Mean "HA" or "Cluster"
            A cautionary tale against using hammers to solve all redundancy and resiliency problems ...

            OpenStack Design Summit – Oct 2012



             Randy Bias                                                                        Dan Sneddon
             @randybias                                                                        @dxs
             CTO, Cloudscaling                                                                 Sr. Engineer, Cloudscaling


                      CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution*
                                                * All unlicensed or borrowed works retain their original licenses   1
Thursday, October 18, 12
Our Journey Today


      1. “HA” pairs are not the only type of redundancy

      2. Alternative redundancy patterns for HA

      3. Redundancy patterns in Open Cloud System*




        * Cloudscaling’s OpenStack-powered cloud operating system (“distribution”)

                                                                         2
Thursday, October 18, 12
What Do We Mean By “HA”?
      We mean what most people mean ...




                   Two servers or network devices that look like one

                                                         3
Thursday, October 18, 12
“HA HA”?
      HA pairs come in a couple flavors




                            Active / Passive

                                               4
Thursday, October 18, 12
“HA HA”?
      People like this flavor best, but it’s not always possible...




                               Active / Active

                                                      5
Thursday, October 18, 12
“HA HA HA HA HA”??
      Many people wish they could get it more like this ...




                           HA cluster aka ‘massive operational nightmare’

                                                              6
Thursday, October 18, 12
Cluster<bleep>!
      Imagine this was 4 or 6 nodes in the cluster




• 4 network tech.
• 7 NICs / node
• A million different ways
      to break




                                                     7
Thursday, October 18, 12
“HA” Pairs Are One Type of Redundancy
      Herein lies the problem ...




                                    8
Thursday, October 18, 12
The Problem With “HA”-mmers
      There are many, but these two matter most ...



      • Catastrophic failures
      • No scale out



                                            9
Thursday, October 18, 12
HA Pairs Have Binary Failures
      Either working or dead, nothing in-between




                                           10
Thursday, October 18, 12
What is Scale-out?


                           A   B



                                   A   B    C     D        N




                           A   B




      Scale-up - Make boxes        Scale-out - Make moar
    bigger (usually an HA pair)            boxes


                                             11
Thursday, October 18, 12
Scaling out is a mindset
      Scaling up is like treating your servers as pets




     bowzer.company.com                     web001.company.com


                           Servers *are* cattle
                                                   12
Thursday, October 18, 12
HA Pair Failures* - 100% down
      Hardware rarely fails, operators fail, software fails
                     Who        Type            Year            Why           Duration
                      Apple     Switch           2005            Bug             2 hrs

                 Flexiscale      SAN             2007          Ops Err          24 hrs

                     Vendio      NAS             2008          Ops Err           8 hrs

                UOL Brazil       SAN             2011            Bug            72 hrs

                    Twitter   Datacenter         2012         Bug+Ops            2 hrs

           * This is a handful of examples as a baseline; I’m sure you can find many more

                                                                         13
Thursday, October 18, 12
“HA” Pairs Are an All-in Move
      They better not fail ...




                                 14
Thursday, October 18, 12
Risk Reduction
      Many small failure domains is usually better




                                            15
Thursday, October 18, 12
Big failure domains vs. small
      Would you rather have the whole cloud down or just a
      small bit for a short period of time?




                                     Still a scale-up pattern ...
                                   wouldn’t you rather scale-out?
                                                16
Thursday, October 18, 12
Pair vs. Scale-out Load Balancing
      No scale-out




           State Sync      Shared-nothing Architecture

         (100% loss)               (20% loss)

                                          17
Thursday, October 18, 12
Pair vs. Scale-out Load Balancing
      No scale-out




           State Sync      Shared-nothing Architecture

         (100% loss)               (20% loss)

                                          17
Thursday, October 18, 12
What’s Usually an “HA” Pair in OpenStack?
      Everything ...



                 Service Endpoints     Messaging System
                       (APIs)               (RPC)


                    Worker Threads
                                          Database
                    (e.g. Scheduler,
                                          (MySQL)
                      Networking)

                                             18
Thursday, October 18, 12
What needs to be an HA pair?
      Not much needs state synchronization



                 Service Endpoints     Messaging System
                       (APIs)               (RPC)


                    Worker Threads
                                          Database
                    (e.g. Scheduler,
                      Networking)         (MySQL)

                                             19
Thursday, October 18, 12
Fault Tolerance Methodologies




                                      20
Thursday, October 18, 12
Fault Tolerance in OCS




                               21
Thursday, October 18, 12
Service Distribution
      High Availability Without Compromise




          Resilient        Stateless         Scale-out




                                             22
Thursday, October 18, 12
Service Distribution
      Combines Standard Networking Technologies
                                                    router ospf
         OSPF              /etc/quagga/ospfd.conf    ospf router-id 10.1.1.1
                                                     network 10.1.255.1 area 0.0.0.0


                                                    interface lo:2
         Anycast           /etc/quagga/zebra.conf    description Pound listening address
                                                     ip address 10.1.255.1/32


                                                    ListenHTTP
                                                        Address 10.1.255.1
                                                        Port 8774
         Load-                                          xHTTP
                                                        Service
                                                            BackEnd
                                                                    1


         Balancing         /etc/pound/pound.conf
                                                            End
                                                                Address 10.1.1.1
                                                                Port 8774

         Proxy                                              BackEnd
                                                                Address 10.1.1.2
                                                                Port 8774
                                                            End
                                                        End
                                                    End


                                                                       23
Thursday, October 18, 12
Resilient OpenStack
      Horizontally Scalable, No Single Point Of Failure

             Service Distribution          ZeroMQ

                 Service Endpoints     Messaging System
                       (APIs)               (RPC)

             Service Distribution         MMR + HA
                    Worker Threads        Database
                    (e.g. Scheduler,
                      Networking)         (MySQL)



Thursday, October 18, 12
Service Distribution Advantages
      What Makes This a Superior Solution?

      • True horizontal scalability with no centralized controller
      • Services are always running, failover is nearly instant
      • Reduced complexity, fewer idle resources
      • No need for separate load balancers


       Server              Server         Server   Server   Server   Server   Server
                                                                                       ...
                Failover            vs.             Distributed Services

                                                                      25
Thursday, October 18, 12
Perfect For Site Resiliency
      Service Distribution Works With Multiple Sites
        • Traditional HA pairs do not support cross-site resiliency

        • Service Distribution fail across sites without DNS redirections




                                                             26
Thursday, October 18, 12
Service Distribution in Action
        Example: Distributed Load Balancing
                 1)        OSPF

                                                      OSPF Router(s)




                                      OSPF                                    OSPF
                                  advertisement                           advertisement

                                                  V
                                         Quagga                         Quagga


                                         HTTP Proxy                    HTTP Proxy




                                                                        27
Thursday, October 18, 12
Service Distribution in Action
        Example: Distributed Load Balancing
                 1)        OSPF

                                                            OSPF Router(s)

                 2)        ECMP Per-flow
                           Load Balancing


                                            OSPF                                    OSPF
                                        advertisement                           advertisement
                                                                Per-Flow
                                                                  Load
                 3)        Load-balancing               V
                                                                Balancing
                                               Quagga                         Quagga
                           HTTP Proxy

                                               HTTP Proxy                    HTTP Proxy




                                                                              28
Thursday, October 18, 12
Service Distribution in Action
        Example: Distributed Load Balancing
                 1)        OSPF

                                                             OSPF Router(s)

                 2)        ECMP Per-flow
                           Load Balancing


                                             OSPF                                        OSPF
                                         advertisement                               advertisement
                                                                 Per-Flow
                                                                   Load
                 3)        Load-balancing                V
                                                                 Balancing
                                                Quagga                             Quagga
                           HTTP Proxy

                                                HTTP Proxy                        HTTP Proxy

                 4)        Unlimited #
                           of Back-End
                           Servers
                                                  Server     Server      Server     Server




                                                                                   29
Thursday, October 18, 12
Failure Resiliency
            Client                      Client                             Client               Client

    1                               2                                  3                    4




                           1                          2          3                              4
                      Load Balancer/
                       Load Balancer/      Load Balancer/        Load Balancer/     Load Balancer/
                           Proxy
                           Proxy               Proxy                 Proxy              Proxy

                                                                                                          10%
                           Server        Server             Server         Server           Server        Load
                                                                                                          Each
                           Server        Server             Server         Server           Server       Server

                                                                                       30
Thursday, October 18, 12
Failure Resiliency
            Client                      Client                             Client               Client

    1                               2                                  3                    4




                           1                      12             3                              4

                            X
                      Load Balancer/
                       Load Balancer/      Load Balancer/        Load Balancer/     Load Balancer/
                           Proxy
                           Proxy               Proxy                 Proxy              Proxy

                                                                                                          10%
                           Server        Server             Server         Server           Server        Load
                                                                                                          Each
                           Server        Server             Server         Server           Server       Server

                                                                                       31
Thursday, October 18, 12
Failure Resiliency
            Client                      Client                             Client               Client

    1                               2                                  3                    4




                           1                          2          3                              4
                      Load Balancer/
                       Load Balancer/      Load Balancer/        Load Balancer/     Load Balancer/
                           Proxy
                           Proxy               Proxy                 Proxy              Proxy

                                                                                                            10%

                           X
                           Server

                           Server
                                         Server

                                         Server
                                                            Server

                                                            Server
                                                                           Server

                                                                           Server
                                                                                            Server

                                                                                            Server
                                                                                                         Increased
                                                                                                           Server
                                                                                                            Load

                                                                                       32
Thursday, October 18, 12
OCS NAT Service
      Example: Scale-out Network Address Translation
             BGP                               Multiple ISP
                                               providers



             NAT



             Service
             Distribution



              VMs

                                          33
Thursday, October 18, 12
Brokerless Messaging With ZeroMQ
      Avoiding RabbitMQ’s Single Point Of Failure
                           Nova-Compute




                                        Single Point
                                        Of Failure



                             RabbitMQ
                              Broker



            Nova-Scheduler                  Nova-API

                            RabbitMQ
                           (Brokered)
                                                       34
Thursday, October 18, 12
Brokerless Messaging With ZeroMQ
      Avoiding RabbitMQ’s Single Point Of Failure
                           Nova-Compute                             Nova-Compute




                                        Single Point
                                        Of Failure



                             RabbitMQ
                              Broker



            Nova-Scheduler                  Nova-API     Nova-Scheduler            Nova-API

                            RabbitMQ                   vs.          ZeroMQ
                           (Brokered)                            (Peer To Peer)
                                                                          35
Thursday, October 18, 12
What did we learn today?


          1. HA-mmers are for nails

          2. Scale-out rules for redundancy

          3. Design-for-failure is a mentality, not a pair

          4. Resiliency over redundancy



                                                      36
Thursday, October 18, 12
Q&A
 Randy Bias                                         Dan Sneddon
 @randybias                                         @dxs
 CTO, Cloudscaling                                  Sr. Engineer, Cloudscaling



                                                      OCS 2.0
                      Public Cloud Benefits | Private Cloud Control | Open Cloud Economics

                                                                             37
Thursday, October 18, 12

Mais conteúdo relacionado

Destaque

Lecture 04 normalization
Lecture 04 normalization Lecture 04 normalization
Lecture 04 normalization emailharmeet
 
Databases: Locking Methods
Databases: Locking MethodsDatabases: Locking Methods
Databases: Locking MethodsDamian T. Gordon
 
physical and logical data independence
physical and logical data independencephysical and logical data independence
physical and logical data independenceapoorva_upadhyay
 
Architecture of-dbms-and-data-independence
Architecture of-dbms-and-data-independenceArchitecture of-dbms-and-data-independence
Architecture of-dbms-and-data-independenceAnuj Modi
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashingJeet Poria
 
Systems Analyst and Design - Data Dictionary
Systems Analyst and Design -  Data DictionarySystems Analyst and Design -  Data Dictionary
Systems Analyst and Design - Data DictionaryKimberly Coquilla
 
CBSE XII Database Concepts And MySQL Presentation
CBSE XII Database Concepts And MySQL PresentationCBSE XII Database Concepts And MySQL Presentation
CBSE XII Database Concepts And MySQL PresentationGuru Ji
 
Types of databases
Types of databasesTypes of databases
Types of databasesPAQUIAAIZEL
 
Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Jargalsaikhan Alyeksandr
 
Normalization 방법
Normalization 방법 Normalization 방법
Normalization 방법 홍배 김
 

Destaque (15)

DISE - Database Concepts
DISE - Database ConceptsDISE - Database Concepts
DISE - Database Concepts
 
Lecture 04 normalization
Lecture 04 normalization Lecture 04 normalization
Lecture 04 normalization
 
Data Dictionary
Data DictionaryData Dictionary
Data Dictionary
 
Databases: Locking Methods
Databases: Locking MethodsDatabases: Locking Methods
Databases: Locking Methods
 
physical and logical data independence
physical and logical data independencephysical and logical data independence
physical and logical data independence
 
Architecture of-dbms-and-data-independence
Architecture of-dbms-and-data-independenceArchitecture of-dbms-and-data-independence
Architecture of-dbms-and-data-independence
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashing
 
Data dictionary
Data dictionaryData dictionary
Data dictionary
 
Systems Analyst and Design - Data Dictionary
Systems Analyst and Design -  Data DictionarySystems Analyst and Design -  Data Dictionary
Systems Analyst and Design - Data Dictionary
 
SAP ABAP data dictionary
SAP ABAP data dictionarySAP ABAP data dictionary
SAP ABAP data dictionary
 
CBSE XII Database Concepts And MySQL Presentation
CBSE XII Database Concepts And MySQL PresentationCBSE XII Database Concepts And MySQL Presentation
CBSE XII Database Concepts And MySQL Presentation
 
Types of databases
Types of databasesTypes of databases
Types of databases
 
DBMS - Normalization
DBMS - NormalizationDBMS - Normalization
DBMS - Normalization
 
Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)
 
Normalization 방법
Normalization 방법 Normalization 방법
Normalization 방법
 

Semelhante a OpenStack-Design-Summit-HA-Pairs-Are-Not-The-Only-Answer copy.pdf

Considerations for Building Your Private Cloud.pdf
Considerations for Building Your Private Cloud.pdfConsiderations for Building Your Private Cloud.pdf
Considerations for Building Your Private Cloud.pdfOpenStack Foundation
 
Cloud Tech III: Actionable Metrics
Cloud Tech III: Actionable MetricsCloud Tech III: Actionable Metrics
Cloud Tech III: Actionable Metricsroyrapoport
 
Riak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsRiak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsAndy Gross
 
Erlang for video delivery
Erlang for video deliveryErlang for video delivery
Erlang for video deliveryHugh Watkins
 
Apache Hadoop Talk at QCon
Apache Hadoop Talk at QConApache Hadoop Talk at QCon
Apache Hadoop Talk at QConCloudera, Inc.
 
Secrets of the asset pipeline
Secrets of the asset pipelineSecrets of the asset pipeline
Secrets of the asset pipelineKen Collins
 
Operating your OpenStack Private Cloud.pdf
Operating your OpenStack Private Cloud.pdfOperating your OpenStack Private Cloud.pdf
Operating your OpenStack Private Cloud.pdfOpenStack Foundation
 
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...OSDC 2018 | The Computer science behind a modern distributed data store by Ma...
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...NETWAYS
 
MySQL Cluster no PayPal
MySQL Cluster no PayPalMySQL Cluster no PayPal
MySQL Cluster no PayPalMySQL Brasil
 
Abstractions at Scale – Our Experiences at Twitter
Abstractions at Scale – Our Experiences at TwitterAbstractions at Scale – Our Experiences at Twitter
Abstractions at Scale – Our Experiences at TwitterLeonidas Tsementzis
 
DreamObjects - Ceph Day Nov 2012
DreamObjects - Ceph Day Nov 2012DreamObjects - Ceph Day Nov 2012
DreamObjects - Ceph Day Nov 2012Ceph Community
 
The Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed DatabaseThe Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed DatabaseArangoDB Database
 
Architecting for Change: QCONNYC 2012
Architecting for Change: QCONNYC 2012Architecting for Change: QCONNYC 2012
Architecting for Change: QCONNYC 2012Kellan
 
The computer science behind a modern disributed data store
The computer science behind a modern disributed data storeThe computer science behind a modern disributed data store
The computer science behind a modern disributed data storeJ On The Beach
 
Why Spark Is the Next Top (Compute) Model
Why Spark Is the Next Top (Compute) ModelWhy Spark Is the Next Top (Compute) Model
Why Spark Is the Next Top (Compute) ModelDean Wampler
 
Inside the Atlassian OnDemand Private Cloud
Inside the Atlassian OnDemand Private CloudInside the Atlassian OnDemand Private Cloud
Inside the Atlassian OnDemand Private CloudAtlassian
 
WordPress: Performance Optimization and Scaling - WordCamp Las Vegas 2011
WordPress: Performance Optimization and Scaling - WordCamp Las Vegas 2011WordPress: Performance Optimization and Scaling - WordCamp Las Vegas 2011
WordPress: Performance Optimization and Scaling - WordCamp Las Vegas 2011Matt Martz
 
Riak intro to..
Riak intro to..Riak intro to..
Riak intro to..Adron Hall
 

Semelhante a OpenStack-Design-Summit-HA-Pairs-Are-Not-The-Only-Answer copy.pdf (20)

Considerations for Building Your Private Cloud.pdf
Considerations for Building Your Private Cloud.pdfConsiderations for Building Your Private Cloud.pdf
Considerations for Building Your Private Cloud.pdf
 
Cloud Tech III: Actionable Metrics
Cloud Tech III: Actionable MetricsCloud Tech III: Actionable Metrics
Cloud Tech III: Actionable Metrics
 
Riak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsRiak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard Problems
 
NoSQL @ Qbranch -2010-04-15
NoSQL @ Qbranch -2010-04-15NoSQL @ Qbranch -2010-04-15
NoSQL @ Qbranch -2010-04-15
 
Erlang for video delivery
Erlang for video deliveryErlang for video delivery
Erlang for video delivery
 
Apache Hadoop Talk at QCon
Apache Hadoop Talk at QConApache Hadoop Talk at QCon
Apache Hadoop Talk at QCon
 
Secrets of the asset pipeline
Secrets of the asset pipelineSecrets of the asset pipeline
Secrets of the asset pipeline
 
Operating your OpenStack Private Cloud.pdf
Operating your OpenStack Private Cloud.pdfOperating your OpenStack Private Cloud.pdf
Operating your OpenStack Private Cloud.pdf
 
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...OSDC 2018 | The Computer science behind a modern distributed data store by Ma...
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...
 
MySQL Cluster no PayPal
MySQL Cluster no PayPalMySQL Cluster no PayPal
MySQL Cluster no PayPal
 
Abstractions at Scale – Our Experiences at Twitter
Abstractions at Scale – Our Experiences at TwitterAbstractions at Scale – Our Experiences at Twitter
Abstractions at Scale – Our Experiences at Twitter
 
DreamObjects - Ceph Day Nov 2012
DreamObjects - Ceph Day Nov 2012DreamObjects - Ceph Day Nov 2012
DreamObjects - Ceph Day Nov 2012
 
The Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed DatabaseThe Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed Database
 
Architecting for Change: QCONNYC 2012
Architecting for Change: QCONNYC 2012Architecting for Change: QCONNYC 2012
Architecting for Change: QCONNYC 2012
 
The computer science behind a modern disributed data store
The computer science behind a modern disributed data storeThe computer science behind a modern disributed data store
The computer science behind a modern disributed data store
 
Big Data Overview
Big Data OverviewBig Data Overview
Big Data Overview
 
Why Spark Is the Next Top (Compute) Model
Why Spark Is the Next Top (Compute) ModelWhy Spark Is the Next Top (Compute) Model
Why Spark Is the Next Top (Compute) Model
 
Inside the Atlassian OnDemand Private Cloud
Inside the Atlassian OnDemand Private CloudInside the Atlassian OnDemand Private Cloud
Inside the Atlassian OnDemand Private Cloud
 
WordPress: Performance Optimization and Scaling - WordCamp Las Vegas 2011
WordPress: Performance Optimization and Scaling - WordCamp Las Vegas 2011WordPress: Performance Optimization and Scaling - WordCamp Las Vegas 2011
WordPress: Performance Optimization and Scaling - WordCamp Las Vegas 2011
 
Riak intro to..
Riak intro to..Riak intro to..
Riak intro to..
 

Mais de OpenStack Foundation

Sponsor Webinar - OpenStack Summit Vancouver 2018
Sponsor Webinar  - OpenStack Summit Vancouver 2018Sponsor Webinar  - OpenStack Summit Vancouver 2018
Sponsor Webinar - OpenStack Summit Vancouver 2018OpenStack Foundation
 
OpenStack Summits 101: A Guide For Attendees
OpenStack Summits 101: A Guide For AttendeesOpenStack Summits 101: A Guide For Attendees
OpenStack Summits 101: A Guide For AttendeesOpenStack Foundation
 
OpenStack Marketing Plan - Community Presentation
OpenStack Marketing Plan - Community PresentationOpenStack Marketing Plan - Community Presentation
OpenStack Marketing Plan - Community PresentationOpenStack Foundation
 
OpenStack 5th Birthday - User Group Parties
OpenStack 5th Birthday - User Group PartiesOpenStack 5th Birthday - User Group Parties
OpenStack 5th Birthday - User Group PartiesOpenStack Foundation
 
Liberty release: Preliminary marketing materials & messages
Liberty release: Preliminary marketing materials & messagesLiberty release: Preliminary marketing materials & messages
Liberty release: Preliminary marketing materials & messagesOpenStack Foundation
 
OpenStack Foundation 2H 2015 Marketing Plan
OpenStack Foundation 2H 2015 Marketing PlanOpenStack Foundation 2H 2015 Marketing Plan
OpenStack Foundation 2H 2015 Marketing PlanOpenStack Foundation
 
OpenStack Summit Tokyo Sponsor Webinar
OpenStack Summit Tokyo Sponsor Webinar OpenStack Summit Tokyo Sponsor Webinar
OpenStack Summit Tokyo Sponsor Webinar OpenStack Foundation
 
Neutron Updates - Liberty Edition
Neutron Updates - Liberty Edition Neutron Updates - Liberty Edition
Neutron Updates - Liberty Edition OpenStack Foundation
 
Searchlight Updates - Liberty Edition
Searchlight Updates - Liberty EditionSearchlight Updates - Liberty Edition
Searchlight Updates - Liberty EditionOpenStack Foundation
 
Congress Updates - Liberty Edition
Congress Updates - Liberty EditionCongress Updates - Liberty Edition
Congress Updates - Liberty EditionOpenStack Foundation
 
Release Cycle Management Updates - Liberty Edition
Release Cycle Management Updates - Liberty EditionRelease Cycle Management Updates - Liberty Edition
Release Cycle Management Updates - Liberty EditionOpenStack Foundation
 
OpenStack Day CEE 2015: Real-World Use Cases
OpenStack Day CEE 2015: Real-World Use CasesOpenStack Day CEE 2015: Real-World Use Cases
OpenStack Day CEE 2015: Real-World Use CasesOpenStack Foundation
 

Mais de OpenStack Foundation (20)

Sponsor Webinar - OpenStack Summit Vancouver 2018
Sponsor Webinar  - OpenStack Summit Vancouver 2018Sponsor Webinar  - OpenStack Summit Vancouver 2018
Sponsor Webinar - OpenStack Summit Vancouver 2018
 
OpenStack Summits 101: A Guide For Attendees
OpenStack Summits 101: A Guide For AttendeesOpenStack Summits 101: A Guide For Attendees
OpenStack Summits 101: A Guide For Attendees
 
OpenStack Marketing Plan - Community Presentation
OpenStack Marketing Plan - Community PresentationOpenStack Marketing Plan - Community Presentation
OpenStack Marketing Plan - Community Presentation
 
OpenStack 5th Birthday - User Group Parties
OpenStack 5th Birthday - User Group PartiesOpenStack 5th Birthday - User Group Parties
OpenStack 5th Birthday - User Group Parties
 
Liberty release: Preliminary marketing materials & messages
Liberty release: Preliminary marketing materials & messagesLiberty release: Preliminary marketing materials & messages
Liberty release: Preliminary marketing materials & messages
 
OpenStack Foundation 2H 2015 Marketing Plan
OpenStack Foundation 2H 2015 Marketing PlanOpenStack Foundation 2H 2015 Marketing Plan
OpenStack Foundation 2H 2015 Marketing Plan
 
OpenStack Summit Tokyo Sponsor Webinar
OpenStack Summit Tokyo Sponsor Webinar OpenStack Summit Tokyo Sponsor Webinar
OpenStack Summit Tokyo Sponsor Webinar
 
Cinder Updates - Liberty Edition
Cinder Updates - Liberty Edition Cinder Updates - Liberty Edition
Cinder Updates - Liberty Edition
 
Glance Updates - Liberty Edition
Glance Updates - Liberty EditionGlance Updates - Liberty Edition
Glance Updates - Liberty Edition
 
Heat Updates - Liberty Edition
Heat Updates - Liberty EditionHeat Updates - Liberty Edition
Heat Updates - Liberty Edition
 
Neutron Updates - Liberty Edition
Neutron Updates - Liberty Edition Neutron Updates - Liberty Edition
Neutron Updates - Liberty Edition
 
Nova Updates - Liberty Edition
Nova Updates - Liberty EditionNova Updates - Liberty Edition
Nova Updates - Liberty Edition
 
Sahara Updates - Liberty Edition
Sahara Updates - Liberty EditionSahara Updates - Liberty Edition
Sahara Updates - Liberty Edition
 
Searchlight Updates - Liberty Edition
Searchlight Updates - Liberty EditionSearchlight Updates - Liberty Edition
Searchlight Updates - Liberty Edition
 
Trove Updates - Liberty Edition
Trove Updates - Liberty EditionTrove Updates - Liberty Edition
Trove Updates - Liberty Edition
 
OpenStack: five years in
OpenStack: five years inOpenStack: five years in
OpenStack: five years in
 
Swift Updates - Liberty Edition
Swift Updates - Liberty EditionSwift Updates - Liberty Edition
Swift Updates - Liberty Edition
 
Congress Updates - Liberty Edition
Congress Updates - Liberty EditionCongress Updates - Liberty Edition
Congress Updates - Liberty Edition
 
Release Cycle Management Updates - Liberty Edition
Release Cycle Management Updates - Liberty EditionRelease Cycle Management Updates - Liberty Edition
Release Cycle Management Updates - Liberty Edition
 
OpenStack Day CEE 2015: Real-World Use Cases
OpenStack Day CEE 2015: Real-World Use CasesOpenStack Day CEE 2015: Real-World Use Cases
OpenStack Day CEE 2015: Real-World Use Cases
 

OpenStack-Design-Summit-HA-Pairs-Are-Not-The-Only-Answer copy.pdf

  • 1. Redundancy Doesn't Always Mean "HA" or "Cluster" A cautionary tale against using hammers to solve all redundancy and resiliency problems ... OpenStack Design Summit – Oct 2012 Randy Bias Dan Sneddon @randybias @dxs CTO, Cloudscaling Sr. Engineer, Cloudscaling CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution* * All unlicensed or borrowed works retain their original licenses 1 Thursday, October 18, 12
  • 2. Our Journey Today 1. “HA” pairs are not the only type of redundancy 2. Alternative redundancy patterns for HA 3. Redundancy patterns in Open Cloud System* * Cloudscaling’s OpenStack-powered cloud operating system (“distribution”) 2 Thursday, October 18, 12
  • 3. What Do We Mean By “HA”? We mean what most people mean ... Two servers or network devices that look like one 3 Thursday, October 18, 12
  • 4. “HA HA”? HA pairs come in a couple flavors Active / Passive 4 Thursday, October 18, 12
  • 5. “HA HA”? People like this flavor best, but it’s not always possible... Active / Active 5 Thursday, October 18, 12
  • 6. “HA HA HA HA HA”?? Many people wish they could get it more like this ... HA cluster aka ‘massive operational nightmare’ 6 Thursday, October 18, 12
  • 7. Cluster<bleep>! Imagine this was 4 or 6 nodes in the cluster • 4 network tech. • 7 NICs / node • A million different ways to break 7 Thursday, October 18, 12
  • 8. “HA” Pairs Are One Type of Redundancy Herein lies the problem ... 8 Thursday, October 18, 12
  • 9. The Problem With “HA”-mmers There are many, but these two matter most ... • Catastrophic failures • No scale out 9 Thursday, October 18, 12
  • 10. HA Pairs Have Binary Failures Either working or dead, nothing in-between 10 Thursday, October 18, 12
  • 11. What is Scale-out? A B A B C D N A B Scale-up - Make boxes Scale-out - Make moar bigger (usually an HA pair) boxes 11 Thursday, October 18, 12
  • 12. Scaling out is a mindset Scaling up is like treating your servers as pets bowzer.company.com web001.company.com Servers *are* cattle 12 Thursday, October 18, 12
  • 13. HA Pair Failures* - 100% down Hardware rarely fails, operators fail, software fails Who Type Year Why Duration Apple Switch 2005 Bug 2 hrs Flexiscale SAN 2007 Ops Err 24 hrs Vendio NAS 2008 Ops Err 8 hrs UOL Brazil SAN 2011 Bug 72 hrs Twitter Datacenter 2012 Bug+Ops 2 hrs * This is a handful of examples as a baseline; I’m sure you can find many more 13 Thursday, October 18, 12
  • 14. “HA” Pairs Are an All-in Move They better not fail ... 14 Thursday, October 18, 12
  • 15. Risk Reduction Many small failure domains is usually better 15 Thursday, October 18, 12
  • 16. Big failure domains vs. small Would you rather have the whole cloud down or just a small bit for a short period of time? Still a scale-up pattern ... wouldn’t you rather scale-out? 16 Thursday, October 18, 12
  • 17. Pair vs. Scale-out Load Balancing No scale-out State Sync Shared-nothing Architecture (100% loss) (20% loss) 17 Thursday, October 18, 12
  • 18. Pair vs. Scale-out Load Balancing No scale-out State Sync Shared-nothing Architecture (100% loss) (20% loss) 17 Thursday, October 18, 12
  • 19. What’s Usually an “HA” Pair in OpenStack? Everything ... Service Endpoints Messaging System (APIs) (RPC) Worker Threads Database (e.g. Scheduler, (MySQL) Networking) 18 Thursday, October 18, 12
  • 20. What needs to be an HA pair? Not much needs state synchronization Service Endpoints Messaging System (APIs) (RPC) Worker Threads Database (e.g. Scheduler, Networking) (MySQL) 19 Thursday, October 18, 12
  • 21. Fault Tolerance Methodologies 20 Thursday, October 18, 12
  • 22. Fault Tolerance in OCS 21 Thursday, October 18, 12
  • 23. Service Distribution High Availability Without Compromise Resilient Stateless Scale-out 22 Thursday, October 18, 12
  • 24. Service Distribution Combines Standard Networking Technologies router ospf OSPF /etc/quagga/ospfd.conf ospf router-id 10.1.1.1 network 10.1.255.1 area 0.0.0.0 interface lo:2 Anycast /etc/quagga/zebra.conf description Pound listening address ip address 10.1.255.1/32 ListenHTTP Address 10.1.255.1 Port 8774 Load- xHTTP Service BackEnd 1 Balancing /etc/pound/pound.conf End Address 10.1.1.1 Port 8774 Proxy BackEnd Address 10.1.1.2 Port 8774 End End End 23 Thursday, October 18, 12
  • 25. Resilient OpenStack Horizontally Scalable, No Single Point Of Failure Service Distribution ZeroMQ Service Endpoints Messaging System (APIs) (RPC) Service Distribution MMR + HA Worker Threads Database (e.g. Scheduler, Networking) (MySQL) Thursday, October 18, 12
  • 26. Service Distribution Advantages What Makes This a Superior Solution? • True horizontal scalability with no centralized controller • Services are always running, failover is nearly instant • Reduced complexity, fewer idle resources • No need for separate load balancers Server Server Server Server Server Server Server ... Failover vs. Distributed Services 25 Thursday, October 18, 12
  • 27. Perfect For Site Resiliency Service Distribution Works With Multiple Sites • Traditional HA pairs do not support cross-site resiliency • Service Distribution fail across sites without DNS redirections 26 Thursday, October 18, 12
  • 28. Service Distribution in Action Example: Distributed Load Balancing 1) OSPF OSPF Router(s) OSPF OSPF advertisement advertisement V Quagga Quagga HTTP Proxy HTTP Proxy 27 Thursday, October 18, 12
  • 29. Service Distribution in Action Example: Distributed Load Balancing 1) OSPF OSPF Router(s) 2) ECMP Per-flow Load Balancing OSPF OSPF advertisement advertisement Per-Flow Load 3) Load-balancing V Balancing Quagga Quagga HTTP Proxy HTTP Proxy HTTP Proxy 28 Thursday, October 18, 12
  • 30. Service Distribution in Action Example: Distributed Load Balancing 1) OSPF OSPF Router(s) 2) ECMP Per-flow Load Balancing OSPF OSPF advertisement advertisement Per-Flow Load 3) Load-balancing V Balancing Quagga Quagga HTTP Proxy HTTP Proxy HTTP Proxy 4) Unlimited # of Back-End Servers Server Server Server Server 29 Thursday, October 18, 12
  • 31. Failure Resiliency Client Client Client Client 1 2 3 4 1 2 3 4 Load Balancer/ Load Balancer/ Load Balancer/ Load Balancer/ Load Balancer/ Proxy Proxy Proxy Proxy Proxy 10% Server Server Server Server Server Load Each Server Server Server Server Server Server 30 Thursday, October 18, 12
  • 32. Failure Resiliency Client Client Client Client 1 2 3 4 1 12 3 4 X Load Balancer/ Load Balancer/ Load Balancer/ Load Balancer/ Load Balancer/ Proxy Proxy Proxy Proxy Proxy 10% Server Server Server Server Server Load Each Server Server Server Server Server Server 31 Thursday, October 18, 12
  • 33. Failure Resiliency Client Client Client Client 1 2 3 4 1 2 3 4 Load Balancer/ Load Balancer/ Load Balancer/ Load Balancer/ Load Balancer/ Proxy Proxy Proxy Proxy Proxy 10% X Server Server Server Server Server Server Server Server Server Server Increased Server Load 32 Thursday, October 18, 12
  • 34. OCS NAT Service Example: Scale-out Network Address Translation BGP Multiple ISP providers NAT Service Distribution VMs 33 Thursday, October 18, 12
  • 35. Brokerless Messaging With ZeroMQ Avoiding RabbitMQ’s Single Point Of Failure Nova-Compute Single Point Of Failure RabbitMQ Broker Nova-Scheduler Nova-API RabbitMQ (Brokered) 34 Thursday, October 18, 12
  • 36. Brokerless Messaging With ZeroMQ Avoiding RabbitMQ’s Single Point Of Failure Nova-Compute Nova-Compute Single Point Of Failure RabbitMQ Broker Nova-Scheduler Nova-API Nova-Scheduler Nova-API RabbitMQ vs. ZeroMQ (Brokered) (Peer To Peer) 35 Thursday, October 18, 12
  • 37. What did we learn today? 1. HA-mmers are for nails 2. Scale-out rules for redundancy 3. Design-for-failure is a mentality, not a pair 4. Resiliency over redundancy 36 Thursday, October 18, 12
  • 38. Q&A Randy Bias Dan Sneddon @randybias @dxs CTO, Cloudscaling Sr. Engineer, Cloudscaling OCS 2.0 Public Cloud Benefits | Private Cloud Control | Open Cloud Economics 37 Thursday, October 18, 12