How to Analyse Software System Scalability

Scalability: What It Is and How to Analyse It


Prof. David S. Rosenblum
University College London
United Kingdom

http://www.cs.ucl.ac.uk/staff/D.Rosenblum/

Acknowledgments

• Letícia Duboc

• Tony Wicks

• Alex Wolf

• Emmanuel Letier

SBES 2007 2

The Importance of Scalability

• Gartner predictions for 2008
– Moore’s Law continues to hold
• Desktop PC: 4–8 CPUs @ 40GHz, 4–12GB RAM, 1.5TB
storage, 100Gb network
– Desktop PCs < 50% of end-user devices
• Microsoft Research ‘Towards 2020 Science’
– The limits of Moore’s Law will soon be reached
– Bandwidth is not keeping pace with storage capacity

It is becoming ever more important for
software systems to scale well!

But What Is Scalability?

‘I examined aspects of scalability, but did not find a
useful, rigorous definition of it. Without such a
definition, I assert that calling a system “scalable”
is about as useful as calling it “modern”. I
encourage the technical community to either
rigorously define scalability or stop using it to
describe systems.’

Mark D. Hill, ‘What is Scalability?’, ACM SIGARCH
Computer Architecture News, vol. 18, no. 4,
December 1990, pp. 18-21.

You Know It When You See It?

• Many uses of the term in technical literature
– Design documents
– Research papers
– Standards specifications
– Product brochures
• But very few precise definitions of the term
– Notable exception: Parallel Computing


Example
Documentum

‘Scalability is a key requirement for the corporate
content infrastructure, … [which] needs to be
capable of handling high volumes of content as
well as of fulfilling high performance
requirements.’


Example
Sun Microsystems

‘The Java 2 Platform, Micro Edition (J2ME)
technology from Sun Microsystems, Inc. is used
by developers to scale Java technology-based
applications into small consumer and embedded
devices.’


Example
SAP Specification

Mark Handley, Colin Perkins and Edmund Whelan, Session
Announcement Protocol, RFC 2974, October 2000.

• 5500 Words, Including 3 Occurrences of ‘Scalability’:
– Abstract: ‘This document describes version 2 of the multicast session
directory announcement protocol, Session Announcement Protocol (SAP), and
the related issues affecting security and scalability that should be taken
into account by implementors.’
– Section on Terminology: ‘A SAP announcer periodically multicasts an
announcement packet to a well known multicast address and port. The
announcement is multicast with the same scope as the session it is
announcing, ensuring that the recipients of the announcement are within
the scope of the session the announcement describes (bandwidth and
other such constraints permitting). This is also important for the scalability
of the protocol, as it keeps local session announcements local.’
– Section Heading: ‘Scalability and Caching’


For the Record

• I’m just as guilty …

Antonio Carzaniga, David S. Rosenblum and Alexander L.
Wolf, ‘Achieving Scalability and Expressiveness in an
Internet-Scale Event Notification Service’, Proc. Nineteenth
ACM Symposium on Principles of Distributed Computing
(PODC 2000), Jul. 2000, pp. 219–227.

• No precise, explicit definition of scalability
• Scalability by implication
– Scalability of publish/subscribe infrastructure is related to choice of
subscription language
– Certain language features are not scalable
– Thus, languages without those features are scalable!?!

Some Typical Notions of Scalability (1)

• High Performance
– Computations/messages/transactions per second
– Fixed-size and fixed-time parallel speedup
• Computational Complexity
– Time and space complexity of algorithms
• Polynomial is scalable
• Exponential isn’t
• Abstraction
– Programmer productivity as a function of the expressive
power of programming languages
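
The 'fixed-size and fixed-time parallel speedup' bullet above can be made concrete with a small sketch. It assumes the bullet refers to the two standard formulations, Amdahl's law (fixed problem size) and Gustafson's law (fixed run time). The function names and the 90% parallel fraction are illustrative choices, not taken from the slides.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Fixed-size speedup: the problem stays the same size as processors are added."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def gustafson_speedup(parallel_fraction, processors):
    """Fixed-time (scaled) speedup: the problem grows so that the run time stays constant."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * processors


if __name__ == "__main__":
    # Assumed example: 90% of the work can be parallelised.
    for n in (10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2), round(gustafson_speedup(0.9, n), 2))
```

Under the fixed-size view the speedup stays below 10 however many processors are added, while the fixed-time view keeps growing. The same system can therefore look scalable or not depending on which notion is meant.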


Some Typical Notions of Scalability (2)

• Software Tools
– Testing considered more scalable than verification

– State space explosion in model checking
• Competing techniques compared in terms of state space size
• Symmetries exploited to improve scalability

– Effect of analysis precision on scalability
• Is a bug-finding tool that scales to millions of lines of code
‘scalable’ if it identifies thousands of potential bugs?
– What about the ‘scalability’ of the developer effort in
analysing those bug reports?


Some Typical Notions of Scalability (3)

Charles B. Weinstock and John B. Goodenough, On System
Scalability, technical note CMU/SEI-2006-TN-012,
Software Engineering Institute, March 2006.

• Two main uses of the term scalability:
– The ability to handle increased workload without adding resources
to a system
– The ability to handle increased workload by repeatedly applying a
cost-effective strategy for extending a system’s capacity

• To which we might add …
– The ability to handle existing workload better by extending a
system’s capacity

Some Definitions

‘Scalability: the ease with which a system or component can be
modified to fit the problem area.’
[Software Engineering Institute]
– What do ‘ease’ and ‘fit’ mean?

‘Scalability means not just the ability to operate, but to operate efficiently
and with adequate quality of service, over the given range of
configurations.’
[Jogalekar and Woodside]
– What do ‘efficiently’ and ‘adequate’ mean?

‘An architecture is scalable … if it has a … linear (or sub-linear) increase
in physical resource usage as capacity increases.’
[Brataas and Hughes]
– Why linear?
– What about quicksort, with O(n log n) average case and O(n²) worst case?
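
The quicksort question can be illustrated with a minimal sketch that counts pivot comparisons. It assumes a naive first-element pivot, which is what triggers the quadratic case on already-sorted input; the function name and input sizes are illustrative, not from the slides.

```python
import random


def quicksort_comparisons(items):
    """Count the pivot comparisons a first-element-pivot quicksort makes on items."""
    comparisons = 0
    stack = [list(items)]  # explicit stack instead of recursion
    while stack:
        part = stack.pop()
        if len(part) <= 1:
            continue
        pivot, rest = part[0], part[1:]
        comparisons += len(rest)  # every other element is compared with the pivot once
        stack.append([x for x in rest if x < pivot])
        stack.append([x for x in rest if x >= pivot])
    return comparisons


n = 200
random.seed(0)  # fixed seed so the shuffled case is repeatable
shuffled = random.sample(range(n), n)
sorted_cost = quicksort_comparisons(range(n))    # worst case for this pivot rule
shuffled_cost = quicksort_comparisons(shuffled)  # typical case
print(sorted_cost, shuffled_cost)
```

On sorted input the count is exactly n(n − 1)/2, so whether the algorithm counts as 'scalable' under a linear criterion depends on which inputs are expected.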


A Framework for
Characterising and Analysing Scalability

An Attempt to Unify These Ideas

Leticia Duboc, David S. Rosenblum and Tony Wicks, A
Framework for Characterization and Analysis of Software
System Scalability, Proc. ESEC/FSE 2007, Dubrovnik,
Croatia, September 2007.

• Scalability is a quality of software systems …
– characterized by the operational impact …
– that characteristics of the execution environment and design have
…
– on measured system qualities …
– as the characteristics are varied …
– over expected ranges and/or alternatives

• If a software system can accommodate this variation in a
way that is acceptable to stakeholders, then it is a scalable
system.

Example
Google Search Engine

• Most people would agree that Google is scalable
– Dramatic growth in the size of the Web
– Dramatic growth in the rate of queries to Google
– Yet a virtually constant response time for users

• Naturally parallelisable problem
– Implemented as a cluster of commodity PCs
– Cluster increased as Web and query load increase
– Note that this is an instance of the second of Weinstock
and Goodenough’s two uses of the term scalability


The Scalability Framework
As Exemplified by Google

[Figure: the scalability framework instantiated for Google. A Scalability Goal/Question ('Can Google provide constant response time as the number of queries per second and the number of Web pages increase over time?') leads to identifying and bounding design and environment characteristics, both scaling and non-scaling. The system design and execution environment govern and determine system behaviour, from which system qualities are identified and bounded to reach a Scalability Answer/Claim. Labels recoverable from the figure: size of Web, queries per second, network latency, available bandwidth, cluster size, choice of algorithms, response time, I/O usage, price per performance.]

In Terms of Experimental Design

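[Figure: the framework in terms of experimental design. The scaling and non-scaling characteristics of the system design and execution environment are identified and bounded as independent variables (factors), alongside nuisance variables. The factors are manipulated over ranges on an implementation or model. The distinct system behaviours are identified and bounded as dependent variables and measured as raw data, which feeds the Scalability Analysis.]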


In Terms of Microeconomics

[Figure: the framework in terms of microeconomics. The experimental-design view (independent variables as factors manipulated over a range on an implementation or model, dependent variables measured as raw data) is extended with preference functions over the measured behaviours, which are combined into a utility function.]

As Exemplified by Google Once More

[Figure: the same framework diagram (factors manipulated over a range on an implementation or model, raw data, preference functions, utility function), annotated with the resulting claim: Google is scalable with respect to response time because it maintains a constant response time as the number of queries per second and the number of Web pages scale over time, by increasing the number of machines in the cluster.]


• Scalability as a multi-criteria optimization problem
• Scalability as a matter of stakeholders’ interest

[Figure: scalability goals are quantified as preference functions, which quantify satisfaction with goals for individual system qualities. The preference functions are combined into a utility function, which quantifies overall satisfaction with the system.]


Preferences, Utilities and
Pareto Optimality

• Preference for quality j
$p_j : X_j \to \mathbb{R}$

• Normalized preference
$\hat{p}_j(X_j) = \dfrac{p_j(X_j) - p_{j,\min}}{p_{j,\max} - p_{j,\min}}$

• Utility function
$U(X_1,\dots,X_k) = \lambda_1 \hat{p}_1(X_1) + \dots + \lambda_k \hat{p}_k(X_k)$

• Preferences often compete, in the sense that improving
one often degrades another
– Example: tradeoff between throughput and message buffer size
– A solution is Pareto optimal when no preference can be improved
without degrading another
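
The normalised preference and weighted utility defined on this slide can be sketched in a few lines. This is an illustration under assumptions: the two qualities, their ranges, the weights and the measurements are hypothetical, and only the formulas come from the slide.

```python
def normalise(preference, p_min, p_max):
    """Return p_hat, where p_hat(x) = (p(x) - p_min) / (p_max - p_min)."""
    def p_hat(x):
        return (preference(x) - p_min) / (p_max - p_min)
    return p_hat


def utility(weights, normalised_preferences, measurements):
    """Weighted sum U = lambda_1 * p_hat_1(X_1) + ... + lambda_k * p_hat_k(X_k)."""
    return sum(w * p(x) for w, p, x in zip(weights, normalised_preferences, measurements))


# Hypothetical qualities: throughput (higher is better) and buffer size (lower is better).
throughput_pref = normalise(lambda tps: tps, p_min=0.0, p_max=1000.0)
buffer_pref = normalise(lambda mb: -mb, p_min=-64.0, p_max=0.0)

# Hypothetical weights and measurements: 500 transactions/sec with a 16 MB buffer.
u = utility([0.75, 0.25], [throughput_pref, buffer_pref], [500.0, 16.0])
print(u)
```

Normalising first keeps each preference on a common scale, so the weights alone express how much each quality matters to the stakeholders.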


Case Study
Fortent Data Analysis System

• Intelligent Enterprise Framework (IEF)
– Overnight analysis of transactional data to identify
unusual and possibly fraudulent patterns of bank and
credit card transactions
– Java - 1,556 classes - 326,293 lines of code

• Surrogate Key Server (SKS) Component
[Figure: the SKS receives batches of transactions on business entities (BE), replaces the business entity identifiers using an entity-key mapping, and emits the transactions with injected surrogate keys (SK).]

Case Study
SKS Implementation Details

• Early implementation (year 2000)
– In-memory cache
– High storage overhead, eventually crashing system

• Replaced with
– In-memory cache for low-volume business entities
– File-based cache for high-volume business entities

• Scalability concern: support a growing number of
business entities in overnight batches, while
maintaining throughput and memory usage within
acceptable levels

Case Study
Framework Instantiation

• Independent Variables
– Scaling
• Number of distinct business entities: 36.6 million average, 50 million maximum
• Number of threads: 1 to 5
– Non-scaling
• Memory-based design versus file-based design
• JVM memory size: up to 500 MB

• Dependent Variables
– Average throughput: minimum 100 transactions/sec
– Memory usage: up to 500 MB
– Disk usage: up to 24 GB

• Utility Function
– Throughput and memory usage 10 times as important as disk usage
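
One way to turn this instantiation into an experiment plan is to enumerate the combinations of independent variables to be run. The sketch below is only an assumption about how that might look: the 10 million level is invented for illustration, the other values come from the slide, and JVM memory size is left out because it is held at a single bound.

```python
from itertools import product

# Levels of each independent variable. The 10M level is an assumed extra
# data point; 36.6M (average) and 50M (maximum) are from the slide.
entity_levels = [10_000_000, 36_600_000, 50_000_000]
thread_levels = range(1, 6)                      # 1 to 5 threads
cache_designs = ["memory-based", "file-based"]   # non-scaling design alternative

# Full factorial design: one configuration per combination of levels.
configurations = [
    {"entities": entities, "threads": threads, "design": design}
    for entities, threads, design in product(entity_levels, thread_levels, cache_designs)
]

print(len(configurations), configurations[0])
```

Each configuration would then be run on the implementation or a model, recording throughput, memory usage and disk usage as the dependent variables.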


Case Study
Preferences and Utility

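• Throughput preference
$\hat{t}(x) = \begin{cases} -1, & \text{if } x < 100 \\ \dfrac{x - 100}{400 - 100}, & \text{otherwise} \end{cases}$

• Heap usage preference
$\hat{h}(y) = \begin{cases} -1, & \text{if } y > 500 \\ \dfrac{500 - y}{500 - 0}, & \text{otherwise} \end{cases}$

• Disk usage preference
$\hat{d}(z) = \begin{cases} -1, & \text{if } z > 24 \\ \dfrac{24 - z}{24 - 0}, & \text{otherwise} \end{cases}$

• System utility
$U(x,y,z) = \dfrac{10\,\hat{t}(x) + 10\,\hat{h}(y) + \hat{d}(z)}{21}$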
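
The case-study preferences and utility translate directly into code. The sketch below follows the formulas on this slide (throughput in transactions/sec, heap in MB, disk in GB, as on the framework-instantiation slide); the function names and the sample measurements are hypothetical.

```python
def throughput_pref(x):
    """Throughput preference: -1 below the 100 transactions/sec minimum."""
    return -1.0 if x < 100 else (x - 100) / (400 - 100)


def heap_pref(y):
    """Heap usage preference: -1 above the 500 MB bound."""
    return -1.0 if y > 500 else (500 - y) / (500 - 0)


def disk_pref(z):
    """Disk usage preference: -1 above the 24 GB bound."""
    return -1.0 if z > 24 else (24 - z) / (24 - 0)


def system_utility(x, y, z):
    """Throughput and heap usage weighted 10 times as heavily as disk usage."""
    return (10 * throughput_pref(x) + 10 * heap_pref(y) + disk_pref(z)) / 21


# Hypothetical measurements: 250 transactions/sec, 250 MB heap, 12 GB disk.
print(system_utility(250, 250, 12))
# A run that misses the throughput minimum is penalised heavily.
print(system_utility(90, 250, 12))
```

Dividing by 21, the sum of the weights, keeps the utility at or below 1 whenever every preference is at or below 1.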


Case Study
Results


Case Study
Evaluation

• Advantages
– Demonstrates applicability of framework
– Could have saved time, effort and money for IEF
• Shortcoming: This was a retrospective analysis
– The system was already implemented
– The requirements were well-known
– The metrics could be easily and reliably collected
– But these results confirm those from an earlier study of
simple prototypes developed in two weeks by Letícia
• Also difficult to assess analysis overhead

Summary

• Scalability is an important software quality
• But it is a quality that has been poorly
characterised, defined, analysed and understood
• And it’s not just about performance!
• A proper characterisation of a system’s scalability
should be qualified with reference to relevant
independent and dependent variables
– It’s meaningless just to say ‘System X is scalable’ or
‘System Y is not scalable’
– Must say ‘System X is scalable with respect to
throughput as the number of users varies over [i,j]’
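
A qualified claim of this form can be checked mechanically once the variables and bounds are named. The sketch below is a toy illustration only: the throughput model, the user ranges and the threshold of 100 are all invented, and `holds_over_range` is a hypothetical helper.

```python
def holds_over_range(measure, values, acceptable):
    """True if the dependent variable is acceptable at every sampled value."""
    return all(acceptable(measure(v)) for v in values)


def toy_throughput(users):
    """Hypothetical model: throughput falls as the number of users grows."""
    return 400.0 / (1.0 + users / 1000.0)


def at_least_100(throughput):
    """Assumed stakeholder bound: at least 100 transactions/sec."""
    return throughput >= 100.0


# 'Scalable with respect to throughput as the number of users varies over [100, 3000]'
claim_narrow = holds_over_range(toy_throughput, range(100, 3001, 100), at_least_100)
# The same system over a wider range, [100, 4000]
claim_wide = holds_over_range(toy_throughput, range(100, 4001, 100), at_least_100)
print(claim_narrow, claim_wide)
```

The same system satisfies the claim over one range and fails it over another, which is why an unqualified 'System X is scalable' says so little.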

Next Steps
Scalability Requirements

Where Do the
Preference and Utility Functions Come From?

• Stakeholders …
– Are able to identify important scalability variables
– Like to think in terms of simple bounds on the variables
• Rather than the underlying functions that relate them
– Are usually poor at estimating those bounds
• Typically underestimate system load and system lifetime
• Currently exploring Goal-Oriented Requirements
Engineering for Scalability Requirements

Next Steps
Additional Issues

• More Case Studies and Interviews
– Towards a broader understanding of scalability pitfalls
– Letícia is doing many of these!
• Modelling Cost and Benefit in the Framework
• Prediction and Extrapolation
– From models
• Will there be enough information in formal models?
– From small prototypes
• Compression and expansion of test data subdomains


Thank you!

Questions?

http://www.cs.ucl.ac.uk/staff/D.Rosenblum/
