Beyond The Numbers
         Baron Schwartz
Who Am I?

            ●   baron@percona.com
            ●   @xaprb
            ●   linkedin.com/in/xaprb
            ●   xaprb.com/blog
Who Am I?

●   Maatkit
●   Innotop
●   Aspersa
●   JavaScript Libraries
●   Percona Toolkit
●   Monitoring Plugins
●   Online Tools

●   Consulting
●   Support
●   Remote DBA
●   Engineering
●   Conferences & Training
●   Percona Server
●   Percona XtraBackup
●   Percona XtraDB Cluster
●   Percona Toolkit
●   Many More
Today's Agenda

●   Benchmarks
●   Aggregation and Distributions
●   Performance, Capacity & Utilization
●   Rules of Thumb
●   Queueing Theory and Scalability
Benchmarks
What's Missing?

                  ●   Distribution
                  ●   Time Series
                  ●   Response Times
                  ●   Parameters
                  ●   Goals
                  ●   System Specs
What's Misleading?

                 ●   Logarithmic X-Axis
                 ●   Interpolation
What's Good?

               ●   Y-Axis Reaches 0
               ●   No Fake-Smoothing
Behind a Single Dot
Look At All That Data...
What's With The Grid Lines?!?!?
Better Benchmarks

●   What does an ideal benchmark report look like?
Clear Benchmark Goals

●   Validating hardware configuration
●   Comparing two systems
●   Checking for regressions
●   Capacity planning
●   Reproducing bad behavior to solve it
●   Stress-testing to find bottlenecks
Hardware and Software

●   Specs for CPU, disk, memory, network
●   Software versions (OS, SUT, benchmark)
●   Filesystem, RAID controller
●   Disk queue scheduler
Presenting Results

●   Ideally, make raw results available
●   Include metrics from OS (CPU, RAM, IO,
    network)
●   Generate some plots to summarize
    ●   This is where the rubber meets the road!
Better Aggregate Measures

●   Average
●   Percentiles
    ●   95th
    ●   99th
●   Maximum
●   Observation Duration
    ●   Question: how bad can 95th percentile be?
More Aggregate Measures

●   Median (50th Percentile)
●   Standard Deviation
●   Index of Dispersion
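
A minimal sketch of the aggregate measures listed above, computed over one observation window. The sample data is invented for illustration, and the nearest-rank percentile method is an assumption (other definitions interpolate between samples). It gives one possible answer to the question about the 95th percentile: it can look excellent while 5% of requests are very slow.

```python
import math
import statistics


def percentile(sorted_values, p):
    """Nearest-rank percentile of a sorted list; p is a whole number from 1 to 100."""
    if not sorted_values:
        raise ValueError("no samples")
    # Ceiling division on integers avoids floating-point rank errors.
    rank = -(-p * len(sorted_values) // 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(response_times):
    """Return the aggregate measures for one observation window."""
    values = sorted(response_times)
    mean = statistics.mean(values)
    variance = statistics.pvariance(values)
    return {
        "average": mean,
        "median": percentile(values, 50),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
        "maximum": values[-1],
        "stddev": math.sqrt(variance),
        # Index of dispersion: the variance-to-mean ratio.
        "index_of_dispersion": variance / mean,
    }


# Hypothetical window: 95 fast requests (1 ms) and 5 slow ones (500 ms).
samples_ms = [1.0] * 95 + [500.0] * 5
summary = summarize(samples_ms)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With this sample the 95th percentile is 1 ms, while the 99th percentile and the maximum are both 500 ms, which is why it helps to report several measures together.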
Better...
Better Still...
Keep It Coming...
Throughput AND Response Time
Performance

●   What is Performance?
●   Two Metrics
    ●   Response Time (time per task)
    ●   Throughput (tasks per time)
●   They're not reciprocals
    ●   More on this later
What Performance Isn't

●   CPU Usage
●   Load Average
●   Other metrics of resource consumption
Performance

●   I often focus on response time
    ●   It represents user experience
    ●   Throughput indicates capacity rather than
        performance
●   For benchmarking, throughput is primary
Utilization

●   The portion of time during which the
    resource is busy
    ●   i.e. there is at least one thing in progress
Utilization is Confusing

●   Be very careful with tools that report
    utilization
●   From the Linux iostat man page:
    ●   “%util: Percentage of CPU time during which
        I/O requests were issued to the device
        (bandwidth utilization for the device). Device
        saturation occurs when this value is close to
        100%.”
●   Can you parse that? Is it true?
Capacity

●   What is Capacity?
Capacity
Capacity – My Definition

 Capacity is the maximum throughput
 ... at achievable concurrency
 ... with acceptable performance
 ... as defined by response time
 ... meeting specified constraints
 ... over specified observation intervals.
Capacity Example

●   What is the capacity of the system at a
    concurrency of 32, with the 95th-percentile
    response time over each 10-second interval
    not to exceed 2 ms, over a 60-minute duration?
●   To determine this, we need goal-seeking
    benchmark software
    ●   Most benchmark software can't do this
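
A rough sketch of what "goal-seeking" could mean: search for the highest throughput at which the 95th-percentile response time still meets the limit. Everything here is hypothetical. `find_capacity` and `modeled_p95` are illustrative names, the "benchmark run" is replaced by a simple M/M/1 model with a 1 ms service time, and the 10 ms limit is chosen to suit that model. Concurrency and observation intervals are left out to keep it short.

```python
import math


def find_capacity(measure_p95, limit, low, high, tolerance=1.0):
    """Binary-search the highest throughput whose p95 stays within the limit.

    measure_p95(throughput) is a caller-supplied function that runs one
    benchmark step and returns the observed 95th-percentile response time.
    Assumes p95 never decreases as throughput increases.
    """
    if measure_p95(low) > limit:
        raise ValueError("goal is not met even at the lowest throughput")
    while high - low > tolerance:
        mid = (low + high) / 2
        if measure_p95(mid) <= limit:
            low = mid   # goal met: try a higher load
        else:
            high = mid  # goal missed: back off
    return low


def modeled_p95(throughput, service_time=0.001):
    """Stand-in for a real benchmark run (assumption): an M/M/1 queue,
    whose response times are exponential with rate (mu - throughput)."""
    mu = 1.0 / service_time
    if throughput >= mu:
        return math.inf
    return math.log(20) / (mu - throughput)


capacity = find_capacity(modeled_p95, limit=0.010, low=1.0, high=1000.0)
print(f"capacity is about {capacity:.0f} requests per second")
```

A real tool would replace `modeled_p95` with an actual measurement at each step, which is the part most benchmark software does not offer.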
Benchmarks, etc. Recap

●   Most benchmarks reveal very little
●   Benchmark reports reveal even less
●   It's good to go beyond the surface
Amdahl's Law

●   “The speedup of a program using multiple
    processors in parallel computing is limited
    by the time needed for the sequential
    fraction of the program.” - Wikipedia
●   It's basically a law of diminishing returns.
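
As a small illustration of the diminishing returns, Amdahl's Law can be written as speedup = 1 / (s + (1 - s) / p), where s is the sequential fraction and p the number of processors. The 5% sequential fraction below is an assumed value, not a measurement.

```python
def amdahl_speedup(serial_fraction, processors):
    """Speedup predicted by Amdahl's Law for a given sequential fraction."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Assumed workload: 5% of the work is sequential.
for processors in (1, 2, 4, 8, 16, 64, 1024):
    print(processors, round(amdahl_speedup(0.05, processors), 2))
```

However many processors are added, the speedup in this example stays below 1 / 0.05 = 20.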
Should I Defragment My Disk?

●   Method 1: Google “defragment”
●   Method 2: Try it and see
●   Method 3: Measure if the disk is a
    bottleneck
Spolsky -vs- Millsap
Amdahl's Law

●   Don't try to optimize little things.
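
The same law explains why: the overall gain is capped by how much of the total time the optimized part accounts for. A quick sketch with made-up fractions:

```python
def overall_speedup(fraction_of_time, local_speedup):
    """Overall speedup when one part, taking fraction_of_time of the total,
    is made local_speedup times faster and the rest is unchanged."""
    remaining = 1.0 - fraction_of_time
    return 1.0 / (remaining + fraction_of_time / local_speedup)


# Hypothetical cases: a 10x win on a 5% part vs. a 2x win on an 80% part.
little_thing = overall_speedup(0.05, 10)
big_thing = overall_speedup(0.80, 2)
print(f"little thing: {little_thing:.3f}x, big thing: {big_thing:.3f}x")
```

Here the tenfold improvement on the small part yields less than a 5% overall gain, while merely doubling the speed of the large part yields about 1.67x.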
Little's Law

●   N = XR
●   That is,
    ●   Concurrency = Throughput * Response Time
●   This holds regardless of queueing, arrival
    rate distribution, response time
    distribution, etc.
Little's Law Example

●   If disk IOs average 4ms...
●   And there are 280 IOs per second...
●   Then the disk's average concurrency is:
    ●   N = 280 * .004
    ●   N = 1.12
●   Do you believe this?
    ●   When might it not be true?
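
The arithmetic above, as a short sketch. The function names are illustrative; the inputs are the ones from the slide.

```python
def little_concurrency(throughput, response_time):
    """N = X * R: average number of requests in the system."""
    return throughput * response_time


def little_response_time(concurrency, throughput):
    """R = N / X: the same law solved for response time."""
    return concurrency / throughput


# 280 IOs per second at an average of 4 ms each.
n = little_concurrency(280, 0.004)
print(f"average concurrency: {n:.2f}")
```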
Little's Law Example #2

●   If disk utilization is 98%
●   And there are 280 IOs per second
●   What do we know?
Utilization Law

●   U = SX
    ●   Also independent of distributions, etc...
●   That is,
    ●   Utilization = Service Time * Throughput
●   Utilization = 98% and Throughput = 280 per second
    ●   S = U/X
    ●   Service Time = .98 / 280 = .0035 seconds (3.5 ms)
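
The same calculation as a sketch. The last step assumes a device that serves one request at a time, so that 100% utilization corresponds to a throughput of 1/S; that assumption is not from the slide.

```python
def service_time(utilization, throughput):
    """S = U / X: average busy time per completed request, in seconds."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return utilization / throughput


s = service_time(0.98, 280)
# Assumption: a single server, so throughput at 100% utilization is 1 / S.
max_throughput = 1.0 / s
print(f"service time: {s * 1000:.1f} ms, ceiling: {max_throughput:.0f} IOs per second")
```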
Queueing Theory

●   How can we predict the amount of
    queueing in a system?
●   How can we predict its response times?
●   How can we predict capacity?
Erlang Queueing

●   Erlang's formulas model the probability of
    queueing for a given arrival rate, service
    time, and number of servers.
●   A “server” is anything capable of serving
    a request.
    ●   CPUs
    ●   Disks
CPU -vs- Disk Queueing

●   Scenario: 4-CPU, 4-disk (RAID0) server
●   Thought experiment:
    ●   How do processes queue for CPU?
    ●   How do I/O requests queue on disks?
Notation

●   Typically see something like M/M/1
●   Each letter is a placeholder in A/S/n
    ●   A = Arrival distribution
    ●   S = Service-time distribution
    ●   n = Number of servers
●   A and S can be one of:
    ●   M = Markov
    ●   D = Deterministic
    ●   G = General
CPUs -vs- Disks

●   CPUs: M/M/4



●   Disks: 4 x {M/M/1}
M/M/1 Queueing
[Diagram; source: cmg.org]
M/M/n Queueing
[Diagram; source: cmg.org]
Erlang C Function

●   M/M/n queueing is modeled by Erlang C
    ●   See http://en.wikipedia.org/wiki/Erlang_(unit)
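
A minimal sketch of the Erlang C formula, used to compare the two models from the CPU and disk thought experiment: one shared queue in front of 4 servers (M/M/4) against 4 separate single-server queues (M/M/1), each receiving a quarter of the traffic. The service time and arrival rate are made-up numbers, and the even traffic split is a simplification.

```python
import math


def erlang_c(servers, offered_load):
    """Probability that an arriving request must queue in an M/M/n system.

    offered_load is arrival rate * service time, in Erlangs.
    """
    if offered_load >= servers:
        raise ValueError("unstable: offered load must be below the server count")
    waiting_term = (offered_load ** servers / math.factorial(servers)) * (
        servers / (servers - offered_load)
    )
    no_wait_terms = sum(
        offered_load ** k / math.factorial(k) for k in range(servers)
    )
    return waiting_term / (no_wait_terms + waiting_term)


def mean_response_time(servers, arrival_rate, service_time):
    """Mean response time of an M/M/n queue: service time plus mean wait."""
    offered_load = arrival_rate * service_time
    mean_wait = (
        erlang_c(servers, offered_load) * service_time / (servers - offered_load)
    )
    return service_time + mean_wait


# Hypothetical load: service time of 1 time unit, 3 arrivals per time unit
# in total, so every server is 75% utilized in both layouts.
shared_queue = mean_response_time(4, 3.0, 1.0)     # M/M/4
separate_queues = mean_response_time(1, 0.75, 1.0)  # each of 4 x M/M/1
print(f"M/M/4: {shared_queue:.2f}, 4 x M/M/1: {separate_queues:.2f}")
```

At the same utilization, the shared queue gives a mean response time of about 1.5 service times, against 4 service times for the separate queues.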
What's Wrong With Erlang C?

●   You must validate your arrival times.
●   You must validate your service times.
●   The equation is hard to work with.
●   In practice, it's hard to use Erlang C.
Scalability

●   Queueing causes non-linear scaling.
●   But first, let's talk about linearity.
System Scalability
[Plot: throughput versus concurrency. Why?]
Universal Scalability Law
[Plot: throughput versus concurrency, with three curves: Linear, Amdahl, and USL]
Amdahl Scalability
USL Scalability
USL Scalability Modeling
USL Performance Modeling
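
For reference, a sketch of the Universal Scalability Law as given by Neil Gunther: C(N) = N / (1 + sigma (N - 1) + kappa N (N - 1)), where sigma models contention and kappa models coherency delay. Setting kappa to 0 gives Amdahl scaling, and setting both to 0 gives linear scaling. The coefficient values below are invented; in practice they are fitted to measurements.

```python
import math


def usl_relative_capacity(n, sigma, kappa):
    """Relative capacity C(N) at concurrency n under the USL."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak_concurrency(sigma, kappa):
    """Concurrency at which USL throughput peaks (requires kappa > 0)."""
    return math.sqrt((1 - sigma) / kappa)


# Hypothetical coefficients: 5% contention, 0.1% coherency delay.
SIGMA, KAPPA = 0.05, 0.001
for n in (1, 8, 16, 31, 60):
    linear = usl_relative_capacity(n, 0, 0)
    amdahl = usl_relative_capacity(n, SIGMA, 0)
    usl = usl_relative_capacity(n, SIGMA, KAPPA)
    print(n, round(linear, 2), round(amdahl, 2), round(usl, 2))

peak = usl_peak_concurrency(SIGMA, KAPPA)
```

With these coefficients throughput peaks near a concurrency of 31 and then declines, which is the behavior that separates the USL curve from the Amdahl curve.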
Scalability Limitations

●   Locks
●   Synchronization points
●   Shared resources
●   Duplicated data to be kept in sync
●   Weakest-link problems
RAID10 On EBS

●   Which is faster?
    ●   RAID 10 over 10 EBS volumes
    ●   RAID 10 over 20 EBS volumes
●   Hint: http://goo.gl/Xm92Y
    ●   Also, http://goo.gl/fAEIL
Debunking “Linear”

●   Ask to see the actual numbers.
    ●   They shouldn't be rounded off suspiciously.
    ●   They must be truly linear.
    ●   They must intersect the point (0, 0).
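
One way to run this check on published numbers, as a sketch: if scaling is truly linear through (0, 0), throughput divided by load is the same at every point. The data sets and the 5% tolerance are made up for illustration.

```python
def scaling_efficiency(points):
    """For (load, throughput) pairs, return throughput per unit of load
    relative to the first point. Linear scaling through (0, 0) gives 1.0."""
    base_load, base_throughput = points[0]
    base = base_throughput / base_load
    return [(load, (throughput / load) / base) for load, throughput in points]


def is_linear(points, tolerance=0.05):
    """True if every point stays within the tolerance of proportional scaling."""
    return all(
        abs(efficiency - 1.0) <= tolerance
        for _, efficiency in scaling_efficiency(points)
    )


# Hypothetical results: (number of nodes, transactions per second).
claimed_linear = [(1, 1000), (2, 1950), (4, 3600), (8, 6000)]
straight_but_offset = [(1, 1000), (2, 1500), (4, 2500)]  # misses (0, 0)
truly_linear = [(1, 1000), (2, 2000), (4, 4000)]

for load, efficiency in scaling_efficiency(claimed_linear):
    print(load, f"{efficiency:.3f}")
```

In the first data set the efficiency falls to 0.75 at 8 nodes, so the scaling is not linear even though a plot of it might look close to a straight line.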
Debunking, Example #1
Is it Linear?
It's Not Linear
Resources

●   Naomi Robbins' Blog
    ●   http://blogs.forbes.com/naomirobbins/
●   Percona White Papers
    ●   http://www.percona.com/
●   Neil J. Gunther
    ●   Guerrilla Capacity Planning
●   http://www.contextneeded.com/
Questions?
baron@percona.com
           @xaprb

Mais conteúdo relacionado

Mais procurados

Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...DataStax
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationDatabricks
 
Increasing Kafka Connect Throughput with Catalin Pop with Catalin Pop | Kafka...
Increasing Kafka Connect Throughput with Catalin Pop with Catalin Pop | Kafka...Increasing Kafka Connect Throughput with Catalin Pop with Catalin Pop | Kafka...
Increasing Kafka Connect Throughput with Catalin Pop with Catalin Pop | Kafka...HostedbyConfluent
 
LAS16-307: Benchmarking Schedutil in Android
LAS16-307: Benchmarking Schedutil in AndroidLAS16-307: Benchmarking Schedutil in Android
LAS16-307: Benchmarking Schedutil in AndroidLinaro
 
Coredns nodecache - A highly-available Node-cache DNS server
Coredns nodecache - A highly-available Node-cache DNS serverCoredns nodecache - A highly-available Node-cache DNS server
Coredns nodecache - A highly-available Node-cache DNS serverYann Hamon
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsdatamantra
 
Bigtable: A Distributed Storage System for Structured Data
Bigtable: A Distributed Storage System for Structured DataBigtable: A Distributed Storage System for Structured Data
Bigtable: A Distributed Storage System for Structured Dataelliando dias
 
Using ScyllaDB for Distribution of Game Assets in Unreal Engine
Using ScyllaDB for Distribution of Game Assets in Unreal EngineUsing ScyllaDB for Distribution of Game Assets in Unreal Engine
Using ScyllaDB for Distribution of Game Assets in Unreal EngineScyllaDB
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
Kvm performance optimization for ubuntu
Kvm performance optimization for ubuntuKvm performance optimization for ubuntu
Kvm performance optimization for ubuntuSim Janghoon
 
Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)
Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)
Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)SolarWinds
 
BitSquid Tech: Benefits of a data-driven renderer
BitSquid Tech: Benefits of a data-driven rendererBitSquid Tech: Benefits of a data-driven renderer
BitSquid Tech: Benefits of a data-driven renderertobias_persson
 
Almost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
Almost Perfect Service Discovery and Failover with ProxySQL and OrchestratorAlmost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
Almost Perfect Service Discovery and Failover with ProxySQL and OrchestratorJean-François Gagné
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringGeorg Schönberger
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSTomas Vondra
 
Multi tier, multi-tenant, multi-problem kafka
Multi tier, multi-tenant, multi-problem kafkaMulti tier, multi-tenant, multi-problem kafka
Multi tier, multi-tenant, multi-problem kafkaTodd Palino
 
In-Memory Evolution in Apache Spark
In-Memory Evolution in Apache SparkIn-Memory Evolution in Apache Spark
In-Memory Evolution in Apache SparkKazuaki Ishizaki
 

Mais procurados (20)

Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
 
QCon London.pdf
QCon London.pdfQCon London.pdf
QCon London.pdf
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
 
Increasing Kafka Connect Throughput with Catalin Pop with Catalin Pop | Kafka...
Increasing Kafka Connect Throughput with Catalin Pop with Catalin Pop | Kafka...Increasing Kafka Connect Throughput with Catalin Pop with Catalin Pop | Kafka...
Increasing Kafka Connect Throughput with Catalin Pop with Catalin Pop | Kafka...
 
LAS16-307: Benchmarking Schedutil in Android
LAS16-307: Benchmarking Schedutil in AndroidLAS16-307: Benchmarking Schedutil in Android
LAS16-307: Benchmarking Schedutil in Android
 
Coredns nodecache - A highly-available Node-cache DNS server
Coredns nodecache - A highly-available Node-cache DNS serverCoredns nodecache - A highly-available Node-cache DNS server
Coredns nodecache - A highly-available Node-cache DNS server
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
 
Bigtable: A Distributed Storage System for Structured Data
Bigtable: A Distributed Storage System for Structured DataBigtable: A Distributed Storage System for Structured Data
Bigtable: A Distributed Storage System for Structured Data
 
Using ScyllaDB for Distribution of Game Assets in Unreal Engine
Using ScyllaDB for Distribution of Game Assets in Unreal EngineUsing ScyllaDB for Distribution of Game Assets in Unreal Engine
Using ScyllaDB for Distribution of Game Assets in Unreal Engine
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Kvm performance optimization for ubuntu
Kvm performance optimization for ubuntuKvm performance optimization for ubuntu
Kvm performance optimization for ubuntu
 
Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)
Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)
Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)
 
BitSquid Tech: Benefits of a data-driven renderer
BitSquid Tech: Benefits of a data-driven rendererBitSquid Tech: Benefits of a data-driven renderer
BitSquid Tech: Benefits of a data-driven renderer
 
Almost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
Almost Perfect Service Discovery and Failover with ProxySQL and OrchestratorAlmost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
Almost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
Multi tier, multi-tenant, multi-problem kafka
Multi tier, multi-tenant, multi-problem kafkaMulti tier, multi-tenant, multi-problem kafka
Multi tier, multi-tenant, multi-problem kafka
 
ぼくnmonです
ぼくnmonですぼくnmonです
ぼくnmonです
 
In-Memory Evolution in Apache Spark
In-Memory Evolution in Apache SparkIn-Memory Evolution in Apache Spark
In-Memory Evolution in Apache Spark
 
Tuning the g1gc
Tuning the g1gcTuning the g1gc
Tuning the g1gc
 

Destaque

Cdma Dynamic Reverse Link Power Control
Cdma Dynamic Reverse Link Power ControlCdma Dynamic Reverse Link Power Control
Cdma Dynamic Reverse Link Power Controlmscs12
 
Erlang vs. Java
Erlang vs. JavaErlang vs. Java
Erlang vs. JavaArtan Cami
 
GSM capacity planning
GSM capacity planningGSM capacity planning
GSM capacity planningDeepak Joshi
 
Large scale path loss 1
Large scale path loss 1Large scale path loss 1
Large scale path loss 1Vrince Vimal
 
Chap 4 (large scale propagation)
Chap 4 (large scale propagation)Chap 4 (large scale propagation)
Chap 4 (large scale propagation)asadkhan1327
 
An Erlang Game Stack
An Erlang Game StackAn Erlang Game Stack
An Erlang Game StackEonblast
 
cellular concepts in wireless communication
cellular concepts in wireless communicationcellular concepts in wireless communication
cellular concepts in wireless communicationasadkhan1327
 

Destaque (9)

Cdma Dynamic Reverse Link Power Control
Cdma Dynamic Reverse Link Power ControlCdma Dynamic Reverse Link Power Control
Cdma Dynamic Reverse Link Power Control
 
Erlang and Scalability
Erlang and ScalabilityErlang and Scalability
Erlang and Scalability
 
Erlang vs. Java
Erlang vs. JavaErlang vs. Java
Erlang vs. Java
 
GSM capacity planning
GSM capacity planningGSM capacity planning
GSM capacity planning
 
Large scale path loss 1
Large scale path loss 1Large scale path loss 1
Large scale path loss 1
 
Ibs Rajeesh
Ibs RajeeshIbs Rajeesh
Ibs Rajeesh
 
Chap 4 (large scale propagation)
Chap 4 (large scale propagation)Chap 4 (large scale propagation)
Chap 4 (large scale propagation)
 
An Erlang Game Stack
An Erlang Game StackAn Erlang Game Stack
An Erlang Game Stack
 
cellular concepts in wireless communication
cellular concepts in wireless communicationcellular concepts in wireless communication
cellular concepts in wireless communication
 

Semelhante a Benchmarks, performance, scalability, and capacity what's behind the numbers

Monitoring and automation
Monitoring and automationMonitoring and automation
Monitoring and automationRicardo Bánffy
 
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Martin Spier
 
Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...Omid Vahdaty
 
Taskerman - a distributed cluster task manager
Taskerman - a distributed cluster task managerTaskerman - a distributed cluster task manager
Taskerman - a distributed cluster task managerRaghavendra Prabhu
 
OSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles JudithOSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles JudithNETWAYS
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesEd Hunter
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafkaconfluent
 
Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Omid Vahdaty
 
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...Red Hat Developers
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community
 
Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokesGagan Bajpai
 
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, CriteoParis Open Source Summit
 
Performance Test Automation With Gatling
Performance Test Automation  With GatlingPerformance Test Automation  With Gatling
Performance Test Automation With GatlingKnoldus Inc.
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsZhenxiao Luo
 
DrupalCon 2014: A Perfect Launch, Every Time
DrupalCon 2014: A Perfect Launch, Every TimeDrupalCon 2014: A Perfect Launch, Every Time
DrupalCon 2014: A Perfect Launch, Every TimePantheon
 
UKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningUKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningFromDual GmbH
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesAshutosh Agarwal
 

Semelhante a Benchmarks, performance, scalability, and capacity what's behind the numbers (20)

Monitoring and automation
Monitoring and automationMonitoring and automation
Monitoring and automation
 
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
 
Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...
 
Taskerman - a distributed cluster task manager
Taskerman - a distributed cluster task managerTaskerman - a distributed cluster task manager
Taskerman - a distributed cluster task manager
 
OSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles JudithOSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles Judith
 
Cloud accounting software uk
Cloud accounting software ukCloud accounting software uk
Cloud accounting software uk
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
 
Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...
 
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokes
 
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
 
Gatling
Gatling Gatling
Gatling
 
Performance Test Automation With Gatling
Performance Test Automation  With GatlingPerformance Test Automation  With Gatling
Performance Test Automation With Gatling
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 
DrupalCon 2014: A Perfect Launch, Every Time
DrupalCon 2014: A Perfect Launch, Every TimeDrupalCon 2014: A Perfect Launch, Every Time
DrupalCon 2014: A Perfect Launch, Every Time
 
Java vs. C/C++
Java vs. C/C++Java vs. C/C++
Java vs. C/C++
 
UKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningUKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL Tuning
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practices
 

Mais de Justin Dorfman

Open Source CDNs | LAWebSpeed April 29th 2014
Open Source CDNs | LAWebSpeed April 29th 2014Open Source CDNs | LAWebSpeed April 29th 2014
Open Source CDNs | LAWebSpeed April 29th 2014Justin Dorfman
 
Wisdom of the crowd gathering insights from real user monitoring presentation
Wisdom of the crowd gathering insights from real user monitoring presentationWisdom of the crowd gathering insights from real user monitoring presentation
Wisdom of the crowd gathering insights from real user monitoring presentationJustin Dorfman
 
Solving the hard problems of user experience management presentation
Solving the hard problems of user experience management presentationSolving the hard problems of user experience management presentation
Solving the hard problems of user experience management presentationJustin Dorfman
 
Preview toward agile APM at Intel presentation
Preview toward agile APM at Intel presentationPreview toward agile APM at Intel presentation
Preview toward agile APM at Intel presentationJustin Dorfman
 
Predicting user activity to make the web fast presentation
Predicting user activity to make the web fast presentationPredicting user activity to make the web fast presentation
Predicting user activity to make the web fast presentationJustin Dorfman
 
One millions users vs your web application mega testing cloud applications pr...
One millions users vs your web application mega testing cloud applications pr...One millions users vs your web application mega testing cloud applications pr...
One millions users vs your web application mega testing cloud applications pr...Justin Dorfman
 
Develop, deploy and manage tomorrow’s applications…today presentation 1
Develop, deploy and manage tomorrow’s applications…today presentation 1Develop, deploy and manage tomorrow’s applications…today presentation 1
Develop, deploy and manage tomorrow’s applications…today presentation 1Justin Dorfman
 
Broadening the user perspective – from network latency to user experience tim...
Broadening the user perspective – from network latency to user experience tim...Broadening the user perspective – from network latency to user experience tim...
Broadening the user perspective – from network latency to user experience tim...Justin Dorfman
 
Akamai internet insights
Akamai internet insightsAkamai internet insights
Akamai internet insightsJustin Dorfman
 
A new era at GoDaddy.com presentation
A new era at GoDaddy.com presentationA new era at GoDaddy.com presentation
A new era at GoDaddy.com presentationJustin Dorfman
 
Understanding hardware acceleration on mobile browsers presentation
Understanding hardware acceleration on mobile browsers presentationUnderstanding hardware acceleration on mobile browsers presentation
Understanding hardware acceleration on mobile browsers presentationJustin Dorfman
 
Michelin starred cooking with chef presentation
Michelin starred cooking with chef presentationMichelin starred cooking with chef presentation
Michelin starred cooking with chef presentationJustin Dorfman
 
Abuse prevention in the globally distributed economy presentation
Abuse prevention in the globally distributed economy presentationAbuse prevention in the globally distributed economy presentation
Abuse prevention in the globally distributed economy presentationJustin Dorfman
 
Stability patterns presentation
Stability patterns presentationStability patterns presentation
Stability patterns presentationJustin Dorfman
 
A web perf dashboard up & running in 90 minutes presentation
A web perf dashboard up & running in 90 minutes presentationA web perf dashboard up & running in 90 minutes presentation
A web perf dashboard up & running in 90 minutes presentationJustin Dorfman
 
WordPress Optimization - WordCampLA 09-10-11
WordPress Optimization - WordCampLA 09-10-11WordPress Optimization - WordCampLA 09-10-11
WordPress Optimization - WordCampLA 09-10-11Justin Dorfman
 

Mais de Justin Dorfman (16)

Open Source CDNs | LAWebSpeed April 29th 2014
Open Source CDNs | LAWebSpeed April 29th 2014Open Source CDNs | LAWebSpeed April 29th 2014
Open Source CDNs | LAWebSpeed April 29th 2014
 
Benchmarks, Performance, Scalability, and Capacity: What's Behind the Numbers

  • 1. Beyond The Numbers Baron Schwartz
  • 2. Who Am I? ● baron@percona.com ● @xaprb ● linkedin.com/in/xaprb ● xaprb.com/blog
  • 3. Who Am I? ● Maatkit ● Percona Toolkit ● Innotop ● Monitoring Plugins ● Aspersa ● Online Tools ● JavaScript Libraries
  • 4. Consulting ● Percona Server ● Support ● Percona XtraBackup ● Remote DBA ● Percona XtraDB Cluster ● Engineering ● Percona Toolkit ● Conferences & Training ● Many More
  • 5. Today's Agenda ● Benchmarks ● Aggregation and Distributions ● Performance, Capacity & Utilization ● Rules of Thumb ● Queueing Theory and Scalability
  • 7. What's Missing? ● Distribution ● Time Series ● Response Times ● Parameters ● Goals ● System Specs
  • 8. What's Misleading? ● Logarithmic X-Axis ● Interpolation
  • 9. What's Good? ● Y-Axis Reaches 0 ● No Fake-Smoothing
  • 11. Look At All That Data...
  • 12. What's With The Grid Lines?!?!?
  • 13. Better Benchmarks What does an ideal benchmark report look like?
  • 14. Clear Benchmark Goals ● Validating hardware configuration ● Comparing two systems ● Checking for regressions ● Capacity planning ● Reproducing bad behavior to solve it ● Stress-testing to find bottlenecks
  • 15. Hardware and Software ● Specs for CPU, disk, memory, network ● Software versions (OS, SUT, benchmark) ● Filesystem, RAID controller ● Disk queue scheduler
  • 16. Presenting Results ● Ideally, make raw results available ● Include metrics from OS (CPU, RAM, IO, network) ● Generate some plots to summarize ● This is where the rubber meets the road!
  • 17. Better Aggregate Measures ● Average ● Percentiles ● 95th ● 99th ● Maximum ● Observation Duration ● Question: how bad can 95th percentile be?
  • 18. More Aggregate Measures ● Median (50th Percentile) ● Standard Deviation ● Index of Dispersion
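
The aggregate measures listed on slides 17 and 18 can be computed from raw response-time samples. The following is a minimal sketch, not taken from the talk: the function name `summarize`, the nearest-rank percentile method, and the sample data are all assumptions chosen for illustration.

```python
import math
import statistics


def summarize(samples):
    """Compute the aggregate measures from slides 17-18 for response-time samples."""
    ordered = sorted(samples)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile for an integer p in 1..100:
        # the smallest sample with at least p% of all samples at or below it.
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    mean = statistics.mean(ordered)
    variance = statistics.pvariance(ordered)
    return {
        "mean": mean,
        "median": statistics.median(ordered),
        "p95": percentile(95),
        "p99": percentile(99),
        "max": ordered[-1],
        "stdev": math.sqrt(variance),
        # Index of dispersion is the variance-to-mean ratio.
        "index_of_dispersion": variance / mean,
    }


# Hypothetical data: 95 fast requests (1 ms) and 5 slow ones (100 ms).
stats = summarize([1] * 95 + [100] * 5)
for name, value in stats.items():
    print(f"{name}: {value}")
```

With this made-up data the 95th percentile is 1 ms while the maximum is 100 ms, which is one way to see why a single percentile can hide the worst of the distribution.
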
  • 23. Performance ● What is Performance? ● Two Metrics ● Response Time (time per task) ● Throughput (tasks per time) ● They're not reciprocals ● More on this later
  • 24. What Performance Isn't ● CPU Usage ● Load Average ● Other metrics of resource consumption
  • 25. Performance ● I often focus on response time ● It represents user experience ● Throughput indicates capacity rather than performance ● For benchmarking, throughput is primary
  • 26. Utilization ● The portion of time during which the resource is busy ● i.e. there is at least one thing in progress
  • 27. Utilization is Confusing ● Be very careful with tools that report utilization ● From the Linux iostat man page: ● “%util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.” ● Can you parse that? Is it true?
  • 28. Capacity ● What is Capacity?
  • 30. Capacity – My Definition Capacity is the maximum throughput ... at achievable concurrency ... with acceptable performance ... as defined by response time ... meeting specified constraints ... over specified observation intervals.
  • 31. Capacity Example ● What is the capacity of the system at a concurrency of 32 with 10-second 95th-percentile response time not to exceed 2ms over a 60-minute duration? ● To determine this, we need goal-seeking benchmark software ● Most benchmark software can't do this
  • 32. Benchmarks, etc Recap ● Most benchmarks reveal very little ● Benchmark reports reveal even less ● It's good to go beyond the surface
  • 33. Amdahl's Law ● “The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.” - Wikipedia ● It's basically a law of diminishing returns.
  • 34. Should I Defragment My Disk? ● Method 1: Google “defragment” ● Method 2: Try it and see ● Method 3: Measure if the disk is a bottleneck
  • 37. Amdahl's Law ● Don't try to optimize little things.
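
The diminishing returns described by Amdahl's Law can be made concrete with a few lines of arithmetic. This is a sketch using the standard general form of the law, speedup = 1 / ((1 - f) + f / s), where f is the fraction of total time affected by an improvement and s is how much faster that fraction becomes. The function name and the example fractions are assumptions, not numbers from the slides.

```python
def amdahl_speedup(fraction_improved, local_speedup):
    """Overall speedup when `fraction_improved` of total time gets `local_speedup` times faster."""
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


# Hypothetical case: a component that accounts for 5% of total time.
# Even an infinitely fast replacement gains only about 5% overall.
print(amdahl_speedup(0.05, float("inf")))

# Hypothetical case: 95% of the work parallelizes across 4 processors.
print(amdahl_speedup(0.95, 4))
```

The first call illustrates the advice on slide 37: a small contributor to total time cannot yield a large overall gain, no matter how much it is optimized.
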
  • 38. Little's Law ● N = XR ● That is, ● Concurrency = Throughput * Response Time ● This holds regardless of queueing, arrival rate distribution, response time distribution, etc.
  • 39. Little's Law Example ● If disk IOs average 4ms... ● And there are 280 IOs per second... ● Then the disk's average concurrency is: ● N = 280 * .004 ● N = 1.12 ● Do you believe this? ● When might it not be true?
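
The arithmetic on slide 39 can be checked directly. This sketch only restates N = XR with the slide's numbers; the function and variable names are illustrative.

```python
def littles_law_concurrency(throughput_per_sec, response_time_sec):
    """Little's Law: average concurrency N = throughput X * response time R."""
    return throughput_per_sec * response_time_sec


# Slide 39: 280 IOs per second, 4 ms average response time.
n = littles_law_concurrency(280, 0.004)
print(f"average concurrency: {n:.2f}")
```
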
  • 40. Little's Law Example #2 ● If disk utilization is 98% ● And there are 280 IOs per second ● What do we know?
  • 41. Utilization Law ● U = SX ● Also independent of distributions, etc... ● That is, ● Utilization = Service Time * Throughput ● Utilization = 98% and Throughput = 280 ● S = U/X ● Service Time = .98 / 280 = .0035
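
The Utilization Law rearrangement on slide 41 works the same way. The sketch below solves U = SX for the service time using the slide's numbers; the function name is an assumption.

```python
def service_time(utilization, throughput_per_sec):
    """Utilization Law rearranged: S = U / X, with utilization as a fraction (0.98 = 98%)."""
    if throughput_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return utilization / throughput_per_sec


# Slide 41: 98% utilization at 280 IOs per second.
s = service_time(0.98, 280)
print(f"service time: {s * 1000:.1f} ms")
```
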
  • 42. Queueing Theory ● How can we predict the amount of queueing in a system? ● How can we predict its response times? ● How can we predict capacity?
  • 43. Erlang Queueing ● Erlang's formulas model the probability of queueing for a given arrival rate, service time, and number of servers. ● A “server” is anything capable of serving a request. ● CPUs ● Disks
  • 44. CPU -vs- Disk Queueing ● Scenario: 4-CPU, 4-disk (RAID0) server ● Thought experiment: ● How do processes queue for CPU? ● How do I/O requests queue on disks?
  • 45. Notation ● Typically see something like M/M/1 ● Each letter is a placeholder in A/S/n ● A = Arrival distribution ● S = Service-time distribution ● n = Number of servers ● A and S can be one of: ● Markov ● Deterministic ● General
  • 46. CPUs -vs- Disks ● CPUs: M/M/4 ● Disks: 4 x {M/M/1}
  • 47. M/M/1 Queueing [Figure; source: cmg.org]
  • 48. M/M/n Queueing [Figure; source: cmg.org]
  • 49. Erlang C Function ● M/M/n queueing is modeled by Erlang C ● See http://en.wikipedia.org/wiki/Erlang_(unit)
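
For readers who want to experiment, the Erlang C formula is short enough to write out. The sketch below uses the standard textbook form of Erlang C and the standard M/M/n mean response time; it then compares the two layouts from slide 46 (one shared queue feeding 4 servers versus 4 independent single-server queues). The 4 ms service time and 750 requests per second are made-up numbers, and real CPUs and disks may not satisfy the Markov assumptions.

```python
import math


def erlang_c(servers, offered_load):
    """Probability that an arrival has to queue in an M/M/n system.

    offered_load is arrival_rate * service_time (in Erlangs) and must be
    smaller than the number of servers for the queue to be stable.
    """
    if offered_load >= servers:
        raise ValueError("offered load must be less than the number of servers")
    top = (offered_load ** servers / math.factorial(servers)) * (
        servers / (servers - offered_load)
    )
    bottom = sum(offered_load ** k / math.factorial(k) for k in range(servers)) + top
    return top / bottom


def mean_response_time(servers, arrival_rate, service_time):
    """Mean response time (service plus queueing) for an M/M/n queue."""
    load = arrival_rate * service_time
    wait = erlang_c(servers, load) * service_time / (servers - load)
    return service_time + wait


# Hypothetical workload: 750 requests/s in total, 4 ms mean service time.
shared = mean_response_time(4, 750, 0.004)        # M/M/4: one queue, 4 servers
split = mean_response_time(1, 750 / 4, 0.004)     # 4 x M/M/1: each sees a quarter
print(f"M/M/4 response time:     {shared * 1000:.2f} ms")
print(f"4 x M/M/1 response time: {split * 1000:.2f} ms")
```

Under these assumptions the shared queue gives the lower mean response time at the same utilization, because a request never waits while any server sits idle.
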
  • 50. What's Wrong With Erlang C? ● You must validate your arrival times. ● You must validate your service times. ● The equation is hard to work with. ● In practice, it's hard to use Erlang C.
  • 51. Scalability ● Queueing causes non-linear scaling. ● But first, let's talk about linearity.
  • 52. System Scalability [Plot: throughput vs. concurrency] Why?
  • 53. Universal Scalability Law [Plot: throughput vs. concurrency, comparing linear, Amdahl, and USL curves]
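
The three curves on slide 53 can be reproduced from Neil Gunther's Universal Scalability Law, C(N) = N / (1 + σ(N - 1) + κN(N - 1)), where σ models contention and κ models coherency delay. Setting κ = 0 gives Amdahl's Law and setting both to 0 gives linear scaling. The sketch below is illustrative only: the coefficient values are invented, not fitted to any data from the talk.

```python
def usl_relative_capacity(n, sigma, kappa):
    """Universal Scalability Law: relative throughput at concurrency n.

    sigma is the contention coefficient, kappa the coherency coefficient.
    """
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


# Hypothetical coefficients, chosen only to make the shapes visible.
SIGMA, KAPPA = 0.05, 0.001

print(f"{'N':>4} {'linear':>8} {'Amdahl':>8} {'USL':>8}")
for n in (1, 2, 4, 8, 16, 32, 64, 128):
    linear = usl_relative_capacity(n, 0.0, 0.0)
    amdahl = usl_relative_capacity(n, SIGMA, 0.0)
    usl = usl_relative_capacity(n, SIGMA, KAPPA)
    print(f"{n:>4} {linear:>8.2f} {amdahl:>8.2f} {usl:>8.2f}")
```

With a non-zero κ, throughput peaks and then declines as concurrency grows, whereas the Amdahl curve only flattens out.
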
  • 58. Scalability Limitations ● Locks ● Synchronization points ● Shared resources ● Duplicated data to be kept in sync ● Weakest-link problems
  • 59. RAID10 On EBS ● Which is faster? ● RAID 10 over 10 EBS volumes ● RAID 10 over 20 EBS volumes ● Hint: http://goo.gl/Xm92Y ● Also, http://goo.gl/fAEIL
  • 60. Debunking “Linear” ● Ask to see the actual numbers. ● They shouldn't be rounded off suspiciously. ● They must be truly linear. ● They must intersect the point (0, 0).
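
One way to apply the checks on slide 60 to actual numbers is to test whether throughput stays proportional to concurrency, since a line through (0, 0) means the ratio between the two is constant. The sketch below is an assumption about how that check could be coded: the function name, the 2% tolerance, and the data points are all hypothetical.

```python
def looks_linear(points, tolerance=0.02):
    """Return True if throughput is proportional to concurrency.

    points is a list of (concurrency, throughput) pairs with concurrency > 0.
    A truly linear result through (0, 0) keeps throughput / concurrency
    constant, so every ratio must stay within `tolerance` of the first one.
    """
    ratios = [throughput / concurrency for concurrency, throughput in points]
    baseline = ratios[0]
    return all(abs(r - baseline) <= tolerance * baseline for r in ratios)


# Hypothetical benchmark results.
print(looks_linear([(1, 100), (2, 200), (4, 400)]))   # proportional
print(looks_linear([(1, 100), (2, 190), (4, 340)]))   # falls off as load grows
print(looks_linear([(1, 150), (2, 250), (4, 450)]))   # straight line, but misses (0, 0)
```
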
  • 64. Resources ● Naomi Robbins' Blog ● http://blogs.forbes.com/naomirobbins/ ● Percona White Papers ● http://www.percona.com/ ● Neil J. Gunther ● Guerrilla Capacity Planning ● http://www.contextneeded.com/