2. Scalability vs. Performance
• Scalability:
o Ratio of the increase in throughput to an increase in resources
o Support additional users at the least incremental cost
o Predictability of application behavior as users are added
• Performance:
o Serving a single request in the shortest amount of time
o Inverse of latency
3. Scalable Performance
• Number of requests that can be concurrently served
(throughput) while meeting a minimum level of service
(response time)
• Measuring:
o Resource utilization
o Throughput
o Response time
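These three measurements are linked by Little's Law, a standard queueing-theory identity (an addition here, not from the slides): the average number of requests in flight equals throughput multiplied by response time.

```latex
L = \lambda W
% L: concurrent requests in the system
% \lambda: throughput (requests per second)
% W: response time (seconds)
```

For example, sustaining 200 requests/second at a 0.5 s response time implies roughly 100 concurrent requests, which bounds how large thread pools and connection pools need to be.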
4. Horizontal vs. Vertical Scaling
• Horizontal (scale out):
o Increase the number of hardware resources
o Separate types of processing into tiers
o Commodity hardware is a predictable cost per user
o Limitations to scaling are dictated by application architecture
o Can degrade performance
• Vertical (scale up):
o Improve hardware capabilities (CPU, RAM, storage, etc.)
o No network bottleneck
o Becomes increasingly expensive
o Practical and financial limitations to the ability to scale
o Typically increases performance
6. General Observations
• Performance decreases in each later tier (LB > web > app >
DB) due to increasing complexity
o Service requests in the earliest possible tier
• Costs of scalability increase in each later tier (LB < web <
app < DB)
o Architect bottlenecks in the earliest possible tier
• Scalable performance is ultimately limited by the operations
that do not scale linearly
• Ideally, each request that makes it to the database tier
would have its own connection
o Realistically, this means serving requests in earlier tiers because of
constraints to DB scaling
10. Resource Capacity Settings
• Database CPUs
• Database connection pool
• Application server CPUs
• Application server thread pool
• JVM Heap settings
• Web server thread pool
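As a rough illustration (every number below is hypothetical and should come from load testing, not from this sketch), these settings are typically sized as a funnel so that no tier forwards more work than the tier behind it can absorb:

```
# Hypothetical capacity funnel -- widest at the front, narrowest at the DB
Web server thread pool ........... 400   # absorbs and queues bursts cheaply
Application server thread pool ... 100   # no more than the app CPUs can run
Database connection pool .........  40   # no more than the DB can serve
Application server CPUs ..........   8
Database CPUs ....................   8
JVM heap ......................... 1.5GB # kept under a manageable ceiling
```

Each pool caps the load it passes downstream, so excess requests wait in the cheapest, earliest tier.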
11. Resource Capacity Settings
• Walk through the application architecture and identify the
points where a request could potentially wait.
• Open all wait points.
• Generate balanced and representative load against the
environment.
• Identify the limiting wait point’s saturation point.
• Tighten all wait points to facilitate only the maximum load of
the limiting wait point.
• Force all pending requests to wait at the Web server.
• Add more resources.
12. Profiling
• Long running http requests
• Long running methods
• Memory leaks
• Deadlocks
• Long running queries
13. Database Tier
• System of record
• Difficult to scale horizontally and expensive to scale
vertically
• Keep connections limited to what the server will support
o Block at the app tier
• Perform data processing on a staging server
• Each database has its own optimization techniques
o Use EXPLAIN plans to locate and eliminate full table scans
o Query and table caches
o Buffer sizes
14. Application Tier
• Generally cannot be stateless due to security and business
requirements
• Sticky vs. clustered sessions
o Use the HTTPSession sparingly
o If the app does not need to be HA, it may not require clean failover
and can drop sessions
• Scaling horizontally could potentially put more load on the
DB
o Caching can be used to offset the load
15. Application Caching
• Read-only data can be cached like static data in the web
tier
• Writable data can be cached but will impose limitations on
clustering
o Will probably need to be either kept in sync across the cluster or
turned off completely
• Filters can provide caching of dynamic, secure data at the
earliest point in this tier
o Caches entire response
o Not recommended for user-specific data
• Service layer or data access caches provide a simple way
to stop requests from continuing to the database
o Transparent to the calling code
o Cache interceptors
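One way a cache interceptor can stay transparent to calling code is a JDK dynamic proxy around a service interface. The sketch below is a minimal illustration, not the speaker's implementation; `UserService` and its `lookup` method are hypothetical stand-ins for any read-mostly data-access call.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical service interface -- stands in for any data-access layer call.
interface UserService {
    String lookup(String id);
}

class CacheInterceptor implements InvocationHandler {
    private final Object target;
    private final Map<List<Object>, Object> cache = new ConcurrentHashMap<>();

    private CacheInterceptor(Object target) { this.target = target; }

    // Wrap an interface-typed service in a caching proxy;
    // callers keep programming against the same interface.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] { iface }, new CacheInterceptor(target));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Cache key = method name + argument values.
        List<Object> key = Arrays.asList(method.getName(),
                Arrays.asList(args == null ? new Object[0] : args));
        Object hit = cache.get(key);
        if (hit != null) {
            return hit;                              // hit: no trip to the database
        }
        Object result = method.invoke(target, args); // miss: delegate to the real service
        if (result != null) {
            cache.put(key, result);
        }
        return result;
    }
}
```

Because the proxy implements the same interface, swapping the cache in or out is invisible to the service layer's callers.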
16. Tomcat Tuning
• maxThreads controls the number of actively served connections
• The backlog (the acceptCount connector attribute) controls the
number of connections that can be queued
• maxThreads + backlog = total accepted connections
• connectionTimeout can be used to drop faulty connections
• bufferSize defaults to -1 (no buffering of output)
• Keep the heap size manageable, < 2GB
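A minimal `server.xml` Connector fragment tying the attributes above together. The values are illustrative only, and the mapping of the slide's "backlog" to the `acceptCount` connector attribute is an assumption about the Tomcat version in use:

```xml
<!-- Hypothetical values; tune from load tests.
     maxThreads:        connections served concurrently
     acceptCount:       queued connections (the socket backlog)
     connectionTimeout: ms before an idle or faulty connection is dropped -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="200"
           acceptCount="100"
           connectionTimeout="20000" />
```

With these numbers the connector accepts at most 200 + 100 = 300 connections; anything beyond that is refused rather than queued indefinitely.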
17. Web Tier
• Clustering is easiest in the web tier because web servers are
generally stateless
• Web tier clustering can provide super-linear scaling by reducing
I/O contention and context-switching costs on each host
• The web tier should serve all static data (images, static
html)
• The web tier can serve dynamic requests by caching non-
secure data that depends only upon URL parameters
o Squid reverse proxy
o Apache mod_cache
o Memcached
18. Apache Tuning
• Limit connections in the web tier to prevent overloading
later tiers (MaxClients)
• ServerLimit x Memory per process < RAM available to limit
swapping
• Avg connections = ThreadsPerChild x Apache hosts / App
Server hosts
• ProxyPass max = ThreadsPerChild
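A hedged worked example of the sizing rules above, using the worker MPM (all numbers hypothetical; derive yours from measured per-process memory):

```
# Hypothetical worker-MPM sizing
<IfModule mpm_worker_module>
    ServerLimit       16
    ThreadsPerChild   64
    MaxClients        1024    # ServerLimit x ThreadsPerChild
</IfModule>

# Cap proxied connections per app-server worker at ThreadsPerChild
ProxyPass /app http://appserver:8080/app max=64
```

Applying the average-connections formula: 2 Apache hosts at ThreadsPerChild 64 fronting 4 app-server hosts gives each app server 64 × 2 / 4 = 32 connections on average, which in turn bounds the app-server thread pools from slide 10.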
19. Other
• Grid Caches
o GigaSpaces
o Coherence
• CDN
o Akamai
• Compute Appliances
o Azul Systems