The Economies of Scaling Software

Abdelmonaim Remani
@PolymathicCoder

About Me
• Platform Architect at just.me Inc.
• JavaOne RockStar and frequent speaker at many developer events and conferences
including JavaOne, JAX, OSCON, OREDEV, 33rd Degree, etc...
• Open-source advocate and contributor
• Active Community member
• The NorCal Java User Group
• The Silicon Valley Spring User Group
• The Silicon Valley Dart Meetup
• Bio: http://about.me/PolymathicCoder
• Twitter: @PolymathicCoder
• Email: abdelmonaim.remani@gmail.com
• SlideShare: http://www.slideshare.net/PolymathicCoder/

License
• Creative Commons Attribution Non-Commercial
License 3.0 Unported
• The graphics and logos in this presentation belong to
their rightful owners

• http://speakerscore.com/jaxconf-scalability
• @PolymathicCoder

What’s up with the title?
• The Economies of Scale
• “In microeconomics, economies of scale
are the cost advantages that enterprises
obtain due to size [...] Often operational
efficiency is [...] greater with increasing
scale [...]” -Wikipedia

The line is blurred!
• There was a time when only the enterprise worried about issues
like scalability
• The rise of social and the abundance of mobile are responsible
for
• Not only an exponential growth of internet traffic
• But the creation of a spoiled user-base that wants answers
to questions like
• I want to see the closest Moroccan restaurants to my
current location on a map along with consumer
ratings and whether any of my friends have
checked in within the last 30 days

The bar is high!
• Scalability is everyone’s problem

The Common Definition
• The ability of software to handle an
increasing amount of work without
performance degradation

I have a problem with that definition...
• It implies that a scalable system is one that is
capable of sustaining its scalability forever
• Not realistic; it fails to recognize externally imposed
constraints
• It fails to acknowledge that scalability is relative
• It does not take into account that a system
• Need not be capable of handling the work
• But simply capable of evolving to handle the work

A better definition
• The ability of an application to gracefully
evolve within the constraints of its
ecosystem in order to handle the
maximum potential amount of work
without performance degradation

Easier said than done!
• A black art
• No surprise here!
• An application that supports 1 million
users
• You add one new feature
• 500,000 users crash your system

The Bottlenecks

The Bottlenecks
• Scaling is about relieving or managing these limitations or
constraints that we call the bottlenecks
• When we talk about bottlenecks in computing, we talk about the
usual suspects
• The CPU
• Storage or I/O
• The Network
• Inter-related
• The rest of this talk is structured around these bottlenecks to make
the case that one’s scalability needs are to be addressed in that
fashion

The CPU Bottleneck

The CPU Bottleneck
• Nothing affects the CPU more than the
instructions it is summoned to execute
• In other words, this is about the very code
of your application

Architecture?
• Architecture
• “Things that people perceive as hard-to-
change” - Martin Fowler
• http://martinfowler.com/ieeeSoftware/whoNee
• Decisions you commit to; the ones that will
be stuck with you forever

Be wise...Think twice...
Choosing the right technologies
• Platform
• Languages
• Frameworks
• Libraries
Making the right abstractions
• Technical Abstractions
• Functional Abstractions
• Make sure that the former is subordinate to the latter and not the other way
around

Write Good Code
• Think your algorithms through and mind their complexity (Asymptotic Complexity,
Cyclomatic Complexity, etc...)
• SOLIDify your Design
• Single Responsibility, Open-Closed, Liskov Substitution, Interface Segregation, and
Dependency Inversion
• Understand the limitations of your technology and leverage its strengths
• Don’t be afraid to be Polyglot
• Obsess over testing
• TDD/BDD
• Tools
• Static code analyzers (PMD, FindBugs, Etc...)
• Profilers (Detect memory leaks, bottlenecks, etc...)
• Etc...

Know Your S#!t
• Read
• The classics: The Mythical Man-Month
• GoF’s “Design Patterns”
• Eric Evans’ “Domain-Driven Design”
• Every book by Martin Fowler
• Uncle Bob’s “Clean Code”
• Josh Bloch’s “Effective Java”
• Brian Goetz’s “Java Concurrency in Practice”
• Etc...
• The list is long ...

We do all that... and still end up with this...
• The fading tradition of making cow dung piles

Still much better than this...

Technical Debt is a Reality
• It is inevitable... You will incur it one way or another, deliberately or not
• The quick-and-dirty you are not proud of
• Things you would/should do differently
• Anyways, after a while it starts to smell...
• The bright side
• The fact that it is recognized as a debt is good
• Keep track and refactor
• For the fearless... Be wise and think twice before you do it
• Cut the right corners
• Don’t lock yourself out
• Don’t make it a part of your architecture

Parallelism
• Parallelism?
• Writing concurrent code or simultaneously executing code
• Most of us write code that runs within web containers by extending
framework classes that are already multi-threaded
• Sometimes the complexity of the business logic demands that we
break it into smaller steps, execute them in parallel, then
aggregate data back to get a result within a reasonable amount of
time
• This is not easy!
• Often requires synchronizing state, which is a nightmare

Vertical Scaling
• Vertical Scaling (Scaling up)
• A single-node system
• Adding more computing resources to the
node (Get a beefier machine)
• Writing code to harness the full power of
the one node

Easier said than done...
• On the one machine, we have been reaping the benefit of Moore’s Law
• Performance gain is automatically realized by software (In other
words, code is faster on faster hardware)
• The End of Moore’s Law: The birth of the multi-core chip
• We actually need to write code to take advantage of this
• Good news! There are frameworks and libraries that make it a lot easier
• Fork/Join in Java
• Akka
• Etc...
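As a rough illustration of the "break it into smaller steps, execute them in parallel, then aggregate" idea using Fork/Join in Java, here is a minimal sketch that sums an array. The class name `SumTask` and the `THRESHOLD` value are assumptions made for this example, not something taken from the talk.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums a slice of an array by splitting it in half until the slices are small.
class SumTask extends RecursiveTask<Long> {
    // Hypothetical cutoff: below this size a plain loop is used instead of splitting.
    private static final int THRESHOLD = 10_000;

    private final long[] values;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        SumTask left = new SumTask(values, from, mid);
        SumTask right = new SumTask(values, mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // wait for the left half, then aggregate
    }

    // Convenience entry point that runs the task on the JVM's common pool.
    static long parallelSum(long[] values) {
        return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, values.length));
    }
}
```

Calling `SumTask.parallelSum(data)` returns the same total as a sequential loop; because the task only reads the array, no shared state needs synchronizing.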

Easier said than done...
• Challenges
• What about dependencies and 3rd Party code?
• Synchronizing state just got HARDER across cores! Too
many cooks!
• Frankly, this shared state deal is a real pain
• Get a life and do without
• Go immutable (Not always straightforward, and
sometimes not even possible)
• Go “Functional” (No guts... no glory...)
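One way to read "go immutable" in Java is sketched below: a value class whose fields never change after construction, so it can be shared across threads without locks. The `Account` class and its fields are hypothetical and exist only for illustration.

```java
// An immutable value: every field is final and there are no setters,
// so instances can be shared across threads without synchronization.
final class Account {
    private final String owner;
    private final long balanceCents;

    Account(String owner, long balanceCents) {
        this.owner = owner;
        this.balanceCents = balanceCents;
    }

    String owner() {
        return owner;
    }

    long balanceCents() {
        return balanceCents;
    }

    // "Changing" the balance returns a new object; the original is left untouched.
    Account deposit(long cents) {
        if (cents < 0) {
            throw new IllegalArgumentException("cents must be >= 0");
        }
        return new Account(owner, balanceCents + cents);
    }
}
```

The trade-off is extra object allocation in exchange for not having to reason about who else might be mutating the same instance.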

It gets more interesting...
• Amdahl’s Law
• Throwing more cores does not necessarily result in
performance gain
• We actually end up with diminishing returns at some point no
matter how many cores you throw in
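Amdahl's Law is usually written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. The small sketch below just evaluates that formula; the class and method names are made up for this example.

```java
class Amdahl {
    // Theoretical speedup when a fraction p of the work is spread over n cores.
    static double speedup(double parallelFraction, int cores) {
        if (parallelFraction < 0 || parallelFraction > 1 || cores < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and cores >= 1");
        }
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
    }

    // Upper bound on the speedup as the number of cores grows without limit.
    static double limit(double parallelFraction) {
        return 1.0 / (1.0 - parallelFraction);
    }
}
```

For a workload that is 90% parallelizable, 16 cores give a speedup of about 6.4x, and no number of cores can push it past 10x, which is the diminishing return the slide describes.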

Horizontal Scaling
• Horizontal Scaling (Scaling out)
• A distributed system (A cluster)
• Adding more nodes
• Writing code to harness the full power of
the cluster

Topology
• A typical cluster consists of
• A number of identical application server nodes behind a load balancer
A number?
• It depends on how many you actually need and can afford
• Elastic Scaling / Auto-Scaling
• The number of live nodes within the cluster shrinks and grows depending on the load
• New ones are provisioned or terminated as needed
Identical?
• Application nodes are cloned off of image files (Ex. AWS EC2 AMIs, etc...)
• Configuration Management tool (Chef, Puppet, Salt, etc...)
Load balancer?
• Load is evenly distributed across live nodes according to some algorithm (Round-Robin typically)
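To make the Round-Robin idea concrete, here is a minimal in-process sketch of how a balancer might pick the next node. Real load balancers also handle health checks and nodes joining or leaving, which this example ignores; the class name and node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    // Returns the nodes in order, wrapping around at the end of the list.
    String next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```

With nodes "a", "b", and "c", successive calls to `next()` yield a, b, c, a, and so on. This only works well when any node can serve any request, which is one more argument for keeping nodes identical.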

Managing State
• Session data
• Session Replication
• Session Affinity/Sticky Session
• Requests from the same client always get routed back to the
same server
• When the node dies, session data die with it
• Shared/Distributed Session
• Session is in a centralized location
• Do yourself a favor and go stateless!
• No session data
• Any server would do

Parallelism
• Leverage MapReduce
• “A programming model for processing
large data sets with a parallel, distributed
algorithm on a cluster”
• Hadoop
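The shape of the MapReduce model can be shown on a single JVM with Java streams. This is not Hadoop and not distributed; it is only a sketch of the map, group, and reduce phases using the classic word count, with a class name invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordCount {
    static Map<String, Long> count(List<String> lines) {
        return lines.parallelStream()
                // Map phase: split each line into lowercase words.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // Group and reduce phase: collect equal words together and count them.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```

On a real cluster the map and reduce steps run on different nodes and the framework moves the grouped data between them; the programming model stays the same.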

Misc
• Distributed Lock Manager (DLM)
• Synchronized access to shared resources
• Google Chubby
• Zookeeper
• Hazelcast
• Terracotta
• Etc...
• Distributed Transactions
• X/Open XA
• HTTPS
• End at the load balancer
• Wildcard SSL
• Leverage probabilistic data structures and algorithms
• Bloom filters
• Quotient filters
• Etc...
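As an example of a probabilistic data structure, here is a small Bloom filter sketch. It answers "definitely not present" or "possibly present" using a fixed number of bits. The sizing parameters and the choice of hash functions (String.hashCode combined with a hand-written FNV-1a) are assumptions for illustration; production code would typically use a tuned library.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class BloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits
    private final int hashCount; // number of bit positions set per item

    BloomFilter(int size, int hashCount) {
        if (size < 1 || hashCount < 1) {
            throw new IllegalArgumentException("size and hashCount must be >= 1");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(String item) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(item, i));
        }
    }

    // false means the item was never added; true means it may have been added.
    boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: derive the i-th bit position from two base hashes.
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = fnv1a(item);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a over the UTF-8 bytes of the string.
    private static int fnv1a(String s) {
        int hash = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 16777619;
        }
        return hash;
    }
}
```

A typical use is to consult the filter before an expensive lookup: a negative answer lets you skip the lookup entirely, while a positive answer still has to be confirmed against the real data.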

Deployment
• Environments
• Multiple Development, Test, Stage, and
Production
• Automatic Configuration Management
• Practice Continuous Delivery
• Leverage The Cloud
• IaaS, PaaS, SaaS, and NaaS

The Storage

The Storage Bottleneck
• Storage or I/O is usually the most
significant bottleneck

What datastore to use?
What kind of question is that?
• There was a time when the obvious choice was the relational model
• Schema that guarantees data integrity
• Data Normalized (minimized redundancies, no modification anomalies, etc...)
• ACIDity (Atomicity, Consistency, Isolation, and Durability)
• Data is stored in a way that is independent of how the data is to be accessed (No bias
towards any particular query patterns)
• Flexible query language
• As our datasets grew, we scaled vertically
• Buying beefier machines
• Database tuning / Query Optimization
• Creating Materialized Views
• De-normalizing
• Etc...

Mucho Data!
• We hit the limit of the one machine
• Attempted to scale the RDBMS horizontally
• Master/Slave clusters
• Data Sharding
• We failed...Why?
• Eric Brewer’s CAP Theorem on distributed systems
• Pick 2 out of 3
• Consistency
• Availability
• Partition Tolerance
• The relational model is designed to favor CA over P
• It cannot be scaled horizontally

NoSQL
• A wide range of specialized data stores with the goal of
addressing the challenges of the relational model
• “The whole point of seeking alternatives is that you need to
solve a problem that relational databases are a bad fit for” -Eric
Evans
• A wide variety
• Key-Value Data stores
• Columnar Data stores
• Document Data stores
• Graph Data stores

Polyglot Persistence
• Acknowledging
• The complexity and variety of data and data access
patterns within the one application
• The absurdity of the idea that all data should be
fitted into one storage model
• Proposing a solution that
• Leverages multiple data stores within the one
application based on the specific way the data is
stored and accessed

For more details...
• Check out my talk from JAX Conf 2012
• The Rise of NoSQL and Polyglot
Persistence
• YouTube Video:
• http://bit.ly/PCWtWi

Caching
• A cache is typically a simple key-value data structure
• Instead of incurring the overhead of data retrieval or
computation every time, you check the cache first
• Since we can’t cache everything, caches can be configured to
use multiple algorithms depending on the use cases (LRU,
Bélády's Algorithm, Etc...)
• Use aggressively!
• What to cache?
• Frequently accessed data (Session data, feeds, etc...)
• Long computation results
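Since a cache cannot hold everything, it needs an eviction policy. A common way to sketch LRU in Java is to build on `LinkedHashMap` in access order, as below. The class name `LruCache` is an assumption for the example, and this version is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A small LRU cache: once capacity is exceeded, the least recently used entry is dropped.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // The third argument (accessOrder = true) orders entries by most recent access.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was touched more recently.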

Caching
• Where to cache?
• On disk
• File System: Slow and sequential access
• DB: A little bit better (Data is arranged in structures
designed for efficient access, indexes, etc...)
• Generally a terrible idea
• SSDs make things a little better
• In-Memory: Fast and random access, but volatile
• Something in between: Persistent caches (Redis, etc...)

Caching
• Types of Caches
• Local
• Replicated
• Distributed
• Clustered

Caching
• How to cache?
• Most caches implement a very simple interface
• Always attempt to get from cache first using a key
• If it is a hit, you saved yourself the overhead
• If it is a miss, compute or read from the data
store then put in cache for subsequent gets
• When you update you can evict stale data
• You can set a TTL when you put
• Many other common operations...
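The get, miss, load, put, and evict flow described above can be sketched in a few lines. Here an in-memory map stands in for the cache and a `Function` stands in for the read from the data store; both, along with the class name `CacheAside`, are placeholders chosen for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical stand-in for the data store read

    CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    // Try the cache first; on a miss, load the value and keep it for subsequent gets.
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Call this after updating the data store so the stale entry is not served again.
    void evict(K key) {
        cache.remove(key);
    }
}
```

The first `get` for a key pays the cost of the loader, later calls do not, and `evict` forces the next `get` to reload.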

Caching Patterns
• Caching Query Results
• Key: hash of the query itself
• How about parametrized complex queries?
• Key: hash of the query itself + hash of parameter values
• Method/Function Memoization
• Key: method name
• How about parametrized methods?
• Key: hash of the method name + hash of parameter values
• Caching Objects
• Key: Identity of the object
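A possible shape for the memoization keys described above: combine the method name with a hash of the parameter values. The `Memoizer` class below is a hypothetical sketch. Note that keying on a hash alone can collide for different arguments, so a real implementation may prefer to key on the argument values themselves.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class Memoizer {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    // Builds a key from the method name plus a hash of the parameter values.
    static String key(String methodName, Object... args) {
        return methodName + "#" + Integer.toHexString(Arrays.deepHashCode(args));
    }

    // Runs the computation only if no result is cached for this name and these arguments.
    @SuppressWarnings("unchecked")
    <T> T memoize(String methodName, Supplier<T> computation, Object... args) {
        return (T) cache.computeIfAbsent(key(methodName, args), k -> computation.get());
    }
}
```

The same idea applies to query results: replace the method name with the query text and the arguments with the bound parameter values.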

Caching Pattern
• Time-series datasets (Ex. Realtime feed)
• Sometimes pseudo/near realtime is
enough
• Use caching to throttle access to the
source
• Cache the query result with an expiry of t
• Fresh data is only read every t
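The throttling pattern above can be sketched as a single cached value that is refreshed from the source at most once every t. The `ThrottledFeed` name is an assumption, and the clock is passed in only so the behavior can be exercised without waiting; in practice it could be `System::currentTimeMillis`.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class ThrottledFeed<T> {
    private final Supplier<T> source;  // hypothetical stand-in for the expensive read
    private final long ttlMillis;      // the expiry "t" from the slide
    private final LongSupplier clock;  // current time in milliseconds
    private T cached;
    private long loadedAt;
    private boolean loaded;

    ThrottledFeed(Supplier<T> source, long ttlMillis, LongSupplier clock) {
        this.source = source;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Serves the cached value until it is at least ttlMillis old, then rereads the source.
    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= ttlMillis) {
            cached = source.get();
            loadedAt = now;
            loaded = true;
        }
        return cached;
    }
}
```

However many clients call `get()`, the source sees at most one read per expiry window, at the price of data that can be up to t old.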

Caching Gotchas
• Profile your code to assess what to cache, and whether you
need to begin with
• Stale state might bite you hard
• Incoherence: Inconsistent copies of objects cached with
multiple keys
• Stale nested aggregates
• Network overhead of misses might outweigh the
performance gain of the hits
• Consider writing/updating to cache when you write to the
data store

Featured Solutions
• EhCache
• Memcached
• Oracle Coherence
• Redis
• A Persistent NoSQL store
• Supports built-in data structures like sets and lists
• Supports intelligent keys and namespaces

The Network

Asynchronous Processing
• Resource-intensive tasks are not practical
to handle during an HTTP request window
• Synchronous processing is overused and not necessary
most of the time

Asynchronous Processing Patterns
• Pseudo-Asynchronous Processing
• Flow
• Preprocessing data / operations in advance
• Request data or operation
• Responding synchronously with preprocessed
result
• Sometimes not possible (Dynamic content,
etc...)

Asynchronous Processing Patterns
• True Asynchronous Processing
• Flow
• Request data or operation
• Acknowledge
• Ex. A REST endpoint that returns a “202 Accepted” HTTP
status code
• Do the processing at your own convenience
• Allow the user to check progress
• Optionally notify when processing is complete
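The accept, process later, and check progress flow can be sketched without any web framework. In the hypothetical `JobService` below, `submit` plays the role of the endpoint that would answer "202 Accepted" with a job id, and `status` plays the role of the progress check. The names and the status strings are assumptions for this example.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class JobService {
    private final ExecutorService workers;
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    JobService(ExecutorService workers) {
        this.workers = workers;
    }

    // Acknowledge immediately: queue the work and hand back an id the client can poll with.
    String submit(Callable<String> task) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, workers.submit(task));
        return id;
    }

    // Progress check. "DONE" also covers jobs that failed or were cancelled.
    String status(String id) {
        Future<String> job = jobs.get(id);
        if (job == null) {
            return "UNKNOWN";
        }
        return job.isDone() ? "DONE" : "PENDING";
    }

    // Blocks until the job finishes and returns its result.
    String await(String id) {
        try {
            return jobs.get(id).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

In a multi-node cluster the in-memory map and executor would be replaced by a shared job queue and store, so that any node can answer the status request.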

Techniques
• Job/Work/Task Queues
• JMS
• AMQP (RabbitMQ, ActiveMQ, Etc...)
• AWS SQS
• Redis Lists
• Etc...
• Task Scheduling
• Jobs triggered periodically (Cron, Quartz, Etc...)
• Batch Processing

Content Delivery
Network (CDN)

CDN
• Static Content
• Binary (Video, Audio, Etc...)
• Web objects (HTML, JavaScript, CSS, Etc...)
• Do not serve through your application server
• Use a CDN
• “A large distributed system of servers deployed in
multiple data centers across the internet”
• Akamai
• AWS CloudFront

CDN Gotchas
• Versioning and caching
• Assume that you have a script file named
script.js deployed on a CDN
• Copies of the file script.js will be
replicated across all edge nodes
• Clients will cache copies of the script file
script.js as well in their local cache

CDN Gotchas
• When script.js is updated while keeping the same URI
as the old version
• The new content is NOT propagated across
the edge nodes
• New clients end up being served with the
old version, now dirty state
• Old clients continue to use their local cache
containing the old version, now dirty state

CDN Gotchas
• What to do?
• Simply append version numbers to file
names
• script-v1.js, script-v2.js, Etc...
• Force invalidation of the file on edge nodes
• Set HTTP caching headers properly
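Appending version numbers to file names is easy to automate at build time. The helper below is a hypothetical sketch of that renaming step; it inserts the version before the file extension so that every release gets a new URI.

```java
class AssetNames {
    // Inserts "-v<version>" before the extension, or at the end if there is no extension.
    static String versioned(String fileName, int version) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            return fileName + "-v" + version;
        }
        return fileName.substring(0, dot) + "-v" + version + fileName.substring(dot);
    }
}
```

For example, `AssetNames.versioned("script.js", 2)` yields `script-v2.js`. Because the URI changes with every release, neither edge nodes nor browsers can serve a stale copy under the new name.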

DNS
• DNS
• Do not rely on the free DNS service of your
domain name registrar
• Use a scalable DNS solution
• AWS Route 53
• DynECT
• UltraDNS
• Etc...

Quantifying Scalability

Quantifying Scalability
• Instrumentation
• Bake it into the code early
• Monitoring
• Application health
• Cluster
• Individual node
• System resources
• JVM
• Track Key Performance Indicators (KPIs)
• Number of requests handled
• Throughput
• Latency
• Apdex Index
• Etc ...
• Logs
• Testing
• Load/Stress testing
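Of the KPIs listed, Apdex is the least self-explanatory. For a target response time T, it counts requests at or below T as satisfied, requests between T and 4T as tolerating, and scores (satisfied + tolerating / 2) / total. The sketch below computes that score from raw response times; the class name and the use of milliseconds are assumptions for this example.

```java
class Apdex {
    // Returns a score between 0.0 (all users frustrated) and 1.0 (all users satisfied).
    static double score(long[] responseMillis, long thresholdMillis) {
        if (responseMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int satisfied = 0;
        int tolerating = 0;
        for (long ms : responseMillis) {
            if (ms <= thresholdMillis) {
                satisfied++;
            } else if (ms <= 4 * thresholdMillis) {
                tolerating++;
            }
            // Anything slower than 4T counts as frustrated and adds nothing to the score.
        }
        return (satisfied + tolerating / 2.0) / responseMillis.length;
    }
}
```

With T = 500 ms and samples of 100, 200, 600, and 3000 ms, two requests are satisfied, one is tolerating, and one is frustrated, giving a score of 0.625.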

Disaster Recovery

When disaster hits...
• Goal:
• Fault tolerant system
• In case of disaster, recover and restore service ASAP
• Be proactive
• Develop a Disaster Recovery Plan (DRP)
• Test DRP in failure drills

Scaling Teams

Scaling Teams
• Hiring
• Always hire top talent
• You are as strong as your weakest link
• Develop a process to bring people in
• Turnkey Hardware/Software Set up (Tools like Vagrant, etc...)
• Arrange for proper access/accounts
• Develop a knowledge base (Architecture documentation, FAQs, etc...)
• Development Process
• Be Agile
• Refine in the spirit of Six Sigma

Scaling Teams
• Teams
• Form small ad-hoc teams from pools of Agile breeds
• Product Owners
• Team Members
• Team Lead (Scrum Master)
• Engineers
• QAs
• Architecture Owners
• Keep them small
• Give them ownership of their DevOps

The Take-home Message
• The early bird gets the worm
• Design to scale from day one
• Plan for capacity early
• Your needs determine how scalable is scalable
• Do not over-engineer
• Do not bite off more than you can chew
• Building a scalable system is a process
• Commit to a road map around bottlenecks
• Guided by planned business features
• Learn from others’ experiences (Twitter, Netflix, etc...)

Take it slow...You’ll get there...
• Work smarter not harder

Thank You
http://speakerscore.com/jaxconf-scalability
@PolymathicCoder
