Understanding-1. Scaling
What is scalable?
Scalable means expandable. It refers to hardware, software, or an
application that can handle a large increase in users, workload, or
transactions without undue strain. Scalability is a system's ability
to handle a growing workload.
What is scalability not?
Raw Speed / Performance
HA / BCP
Technology X
Protocol Y
Objectives of scalability
Traffic growth
Dataset growth
Maintainability
Two kinds of scalability:
Vertical (get bigger)
Horizontal (get more)
Understanding-2. Architecture
App servers scale in two ways:
Really well
Quite badly
Sessions (state):
Local sessions == bad
When they move == quite bad
Centralized sessions == good
No sessions at all == awesome!
Super-slim sessions:
If you need more than the cookie (login status, user id,
username), then pull their account row from the DB
Or from the account cache
None of the drawbacks of sessions
Avoids the overhead of a query per page
Great for high-volume pages which need little
personalization
Turns out you can stick quite a lot in a cookie too
Pack with base64 and it’s easy to delimit fields
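A minimal sketch of packing a super-slim session into one cookie value, using only Python's standard library. Each field is base64-encoded with the URL-safe alphabet, which never contains `.`, so `.` works as a field delimiter. The HMAC signature and the `SECRET_KEY` name are assumptions added here, not from the slide. Without a signature, a client could simply edit its own user ID.

```python
import base64
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load it from config.
SECRET_KEY = b"change-me"


def _b64(raw: bytes) -> str:
    # The URL-safe base64 alphabet never contains '.', so '.' can delimit fields.
    return base64.urlsafe_b64encode(raw).decode("ascii")


def _sign(body: str) -> bytes:
    return hmac.new(SECRET_KEY, body.encode("utf-8"), hashlib.sha256).digest()


def pack_session(user_id: int, username: str, logged_in: bool) -> str:
    """Pack login status, user id and username into one cookie value."""
    fields = [str(user_id), username, "1" if logged_in else "0"]
    body = ".".join(_b64(f.encode("utf-8")) for f in fields)
    return body + "." + _b64(_sign(body))


def unpack_session(cookie: str):
    """Return (user_id, username, logged_in), or None if the cookie was altered."""
    body, _, sig_b64 = cookie.rpartition(".")
    try:
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    if not hmac.compare_digest(sig, _sign(body)):
        return None
    user_id, username, flag = (
        base64.urlsafe_b64decode(part).decode("utf-8") for part in body.split(".")
    )
    return int(user_id), username, flag == "1"
```

Pages that need only these fields can render from `unpack_session` alone. Anything more falls back to the account cache or the DB, as described above.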
Scaling the web app server part is easy
The rest is the trickier part
Database
Serving static content
Storing static content
Other services scale similarly to web apps
That is, horizontally
The canonical examples:
Image conversion
Audio transcoding
Video transcoding
Web crawling
Compute
Understanding-3. Load Balancing
Hardware load balancing:
A hardware appliance
Often a pair with heartbeats for HA
Expensive!
But offers high performance
Many brands
Alteon, Cisco, NetScaler, Foundry, etc.
Layer 7 (L7): web switches, content switches, etc.
Software load balancing:
Lots of options
Pound
Perlbal
Apache with mod_proxy
Wackamole with mod_backhand
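The balancer can be a hardware appliance or software such as Pound or Perlbal. Either way, the core decision can be as simple as round-robin over the backends that are currently healthy. A minimal sketch follows. It assumes backends are plain name strings and that some external heartbeat calls `mark_down`/`mark_up`. These class and method names are hypothetical, not taken from any of the tools listed above.

```python
import itertools


class RoundRobinBalancer:
    """Cycle through backends, skipping any that failed their heartbeat."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        # Called by a (hypothetical) heartbeat checker when a backend stops answering.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        self.healthy.add(backend)

    def pick(self):
        # Make at most one full pass over the pool before giving up.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")
```

Real balancers add weighting, connection counting, and sticky routing (e.g. by cookie), but they share the same "skip what the heartbeat marked dead" loop.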
Understanding-4. Queuing
Parallelizable:
If we can transcode/crawl in parallel, it’s easy
But think about queuing
And asynchronous systems
The web ain’t built for slow things
But still, a simple problem
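A minimal sketch of the queuing idea using Python's standard `queue` and `threading` modules. Jobs are enqueued, and a pool of workers does the slow part in parallel. In a web app, the request handler would only do the enqueue step and return at once. `transcode` is a hypothetical stand-in for the real work. A production system would use a persistent queue and workers on separate machines, which this in-process demo does not model.

```python
import queue
import threading


def transcode(job_id):
    # Hypothetical placeholder for slow work such as video transcoding.
    return f"job-{job_id}-done"


def run_batch(job_ids, n_workers=4):
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            job_id = jobs.get()
            if job_id is None:  # sentinel: this worker should exit
                return
            output = transcode(job_id)
            with lock:
                results[job_id] = output

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    # In a web app, this enqueue is all the request handler would do before returning.
    for job_id in job_ids:
        jobs.put(job_id)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker, queued after all real jobs (FIFO)
    for t in threads:
        t.join()
    return results
```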
Understanding-5. Relational Data
Databases:
Unless we’re doing a lot of file serving, the database is the
toughest part to scale
If we can, best to avoid the issue altogether and just buy bigger
hardware
Dual Opteron/Intel64 systems with 16+GB of RAM can get you
a long way
Read Power of Apps:
Web apps typically have a read/write ratio of somewhere
between 80/20 and 90/10
If we can scale read capacity, we can solve a lot of situations
Database replication!
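With replication in place, the application has to send writes to the master and spread reads across the replicas. A minimal sketch follows. It assumes connections are represented by plain names and treats any statement starting with `SELECT` as a read. A real router must also cope with replication lag: a user who has just written may not yet see the change on a replica.

```python
import random


class ReplicatedRouter:
    """Route writes to the master and reads to a randomly chosen replica."""

    def __init__(self, master, replicas, rng=None):
        self.master = master
        self.replicas = list(replicas)
        self.rng = rng or random.Random()

    def route(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            return self.rng.choice(self.replicas)
        # Writes, and reads when no replica exists, go to the master.
        return self.master
```

Adding replicas grows only read capacity. Every replica still has to apply every write, so the master's write capacity stays the ceiling.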
Understanding-6. Caching
Caching avoids needing to scale!
Or makes it cheaper
Simple stuff
mod_perl / shared memory
Invalidation is hard
Database query cache
Bad performance (in most cases)
Getting more complicated…
Write-through cache
Write-back cache
Sideline cache
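A minimal sketch of the three strategies, with plain dicts standing in for the database and the cache (in practice, e.g., memcached). A sideline cache, often called cache-aside, is filled by the application on a miss, and writes simply drop the stale entry. Write-through updates the cache and the database together. Write-back updates only the cache and flushes dirty keys later. That is faster, but writes are lost if the cache dies before flushing. The class and method names are hypothetical.

```python
class CachedStore:
    def __init__(self, db):
        self.db = db        # stands in for the database
        self.cache = {}     # stands in for memcached or similar
        self.dirty = set()  # keys written to the cache but not yet to the db

    def read(self, key):
        # Sideline / cache-aside read: try the cache, fall back to the db, then fill the cache.
        if key in self.cache:
            return self.cache[key]
        value = self.db[key]
        self.cache[key] = value
        return value

    def write_sideline(self, key, value):
        # Cache-aside write: update the db and drop the now-stale cache entry.
        self.db[key] = value
        self.cache.pop(key, None)
        self.dirty.discard(key)

    def write_through(self, key, value):
        # Update the db and the cache in the same operation.
        self.db[key] = value
        self.cache[key] = value
        self.dirty.discard(key)

    def write_back(self, key, value):
        # Update only the cache now; the db catches up on flush().
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```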
Understanding-7. HA Data
The key to HA is avoiding SPOFs (single points of failure)
Identify
Eliminate
Some stuff is hard to solve
Fix it further down the tree
Dual DCs solve the router/switch SPOF
Master-Master:
Either hot/warm or hot/hot
Writes can go to either
But avoid collisions
No auto-inc columns for hot/hot
Bad for hot/warm too
Unless your DB supports interleaved (offset) auto-increment values
But you can’t rely on the ordering!
Design schema/access to avoid collisions
Hashing users to servers
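Here are two ways to avoid key collisions in a hot/hot pair, sketched in Python. The first mimics interleaved auto-increment. MySQL, for example, exposes this through the `auto_increment_increment` and `auto_increment_offset` system variables. Each of N masters hands out IDs from its own residue class, so IDs never collide but are not globally ordered. The second hashes each user to one master, so that user's writes always go to the same place. Using CRC32 as the hash is an assumption. A stable checksum is needed because Python's built-in `hash()` of a string is salted per process.

```python
import itertools
import zlib


def id_generator(offset, increment):
    """Yield offset, offset+increment, ...: interleaved auto-increment for one master."""
    return itertools.count(offset, increment)


def master_for_user(username, masters):
    # Stable checksum, so the same user maps to the same master across restarts.
    return masters[zlib.crc32(username.encode("utf-8")) % len(masters)]
```

With two masters, one uses offset 1 and the other offset 2, both with increment 2. The first produces odd IDs and the second even ones.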
Ring:
Master-master is just a small ring
With 2 nodes
Bigger rings are possible
But not a mesh!
Each slave may only have a single master
Unless you build some kind of manual replication
Dual Tree:
Master-master is good for HA
But we can’t scale out the reads (or writes!)
We often need to combine the read scaling with HA
We can simply combine the two models
Understanding-8. Federation
Data Federation:
At some point, you need more writes
This is tough
Each cluster of servers has limited write capacity
Just add more clusters
Split up large tables, organized by some primary object
Usually users
Put all of a user’s data on one ‘cluster’
Or shard, or cell
Have one central cluster for lookups
Need more capacity?
Just add shards!
Don’t assign to shards based on userid (e.g. userid mod N)!
For resource leveling as time goes on, we want to be able to move objects
between shards
Maybe – not everyone does this
‘Lockable’ objects
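A minimal sketch of the central lookup cluster, with an in-memory dict standing in for the lookup table. Each user is placed on a shard when created, and the mapping is stored rather than computed from the user ID. That way a user can later be moved to level out load. The lock set illustrates one reading of "lockable objects": writes are refused while a user's data is being copied. All class, method, and shard names are hypothetical.

```python
class ShardDirectory:
    """Central lookup: user -> shard, stored rather than computed."""

    def __init__(self, shards):
        self.shards = list(shards)
        self.location = {}   # user_id -> shard name (the lookup table)
        self.locked = set()  # users whose data is mid-move

    def assign(self, user_id):
        # Place a new user on the currently least-populated shard.
        counts = {s: 0 for s in self.shards}
        for shard in self.location.values():
            counts[shard] += 1
        shard = min(self.shards, key=lambda s: counts[s])
        self.location[user_id] = shard
        return shard

    def shard_for_write(self, user_id):
        if user_id in self.locked:
            raise RuntimeError(f"user {user_id} is being moved; retry later")
        return self.location[user_id]

    def move(self, user_id, new_shard, copy_data):
        # Lock, copy the user's rows (copy_data is a caller-supplied callback), then repoint.
        self.locked.add(user_id)
        try:
            copy_data(user_id, self.location[user_id], new_shard)
            self.location[user_id] = new_shard
        finally:
            self.locked.discard(user_id)

    def add_shard(self, shard):
        # "Just add shards!": the empty shard attracts new users until it catches up.
        self.shards.append(shard)
```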
Heterogeneous hardware is fine
Just give a larger/smaller proportion of objects depending on hardware
Bigger/faster hardware for paying users
A common approach
Can also allocate faster app servers via magic cookies at the LB
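Giving bigger machines a larger proportion of objects can be as simple as a weighted random choice when a user is first placed. A minimal sketch follows. The shard names and weights are hypothetical: for example, a newer box with twice the capacity gets weight 2.

```python
import random


def pick_weighted_shard(weights, rng=random):
    """weights: dict of shard -> relative capacity; returns a shard chosen proportionally."""
    shards = list(weights)
    return rng.choices(shards, weights=[weights[s] for s in shards], k=1)[0]
```

The choice is recorded in the central lookup table. Changing the weights later therefore only affects newly placed users, and existing users stay put until they are explicitly moved.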
Simple things first:
Vertical partitioning
Divide tables into sets that never get joined
Split these sets onto different server clusters
Logical limits
When you run out of non-joining groups
When a single table grows too large
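Once tables are divided into sets that never get joined, the application needs a map from table to cluster. Any query that needs tables from two clusters signals that the split has hit its logical limit. A minimal sketch with hypothetical table and cluster names:

```python
# Hypothetical table groups: tables within a group may be joined; groups never are.
TABLE_CLUSTERS = {
    "users": "cluster-accounts",
    "user_prefs": "cluster-accounts",
    "photos": "cluster-media",
    "photo_tags": "cluster-media",
    "audit_log": "cluster-logs",
}


def cluster_for_query(tables):
    """Return the single cluster holding every table in a query, or raise if it spans clusters."""
    clusters = {TABLE_CLUSTERS[t] for t in tables}
    if len(clusters) != 1:
        raise ValueError(f"query spans clusters: {sorted(clusters)}")
    return clusters.pop()
```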
Understanding-9. Multi-site HA
Multiple datacenters:
Having multiple datacenters is hard
Not just with one DB
Hot/warm with DB slaved setup
But manual (reconfig on failure)
Hot/hot with master-master
But dangerous (each site has a SPOF)
Hot/hot with sync/async manual replication
But tough (big engineering task)
Understanding-10. File Serving
Serving lots of files is not too tough
Just buy lots of machines and load balance!
We’re IO bound – need more spindles!
But keeping many copies of data in sync is hard
And sometimes we have other per-request overhead (like
auth)
Reverse proxy:
Serving out of memory is fast!
And our caching proxies can have disks too
Fast or otherwise
More spindles is better
We stay in sync automatically
We can parallelize it!
50 cache servers gives us 50 times the serving rate of the origin server
Assuming the working set is small enough to fit in memory in the cache
cluster
Invalidation:
Dealing with invalidation is tricky
We can prod the cache servers directly to clear stuff out
Scales badly – need to clear asset from every server –
doesn’t work well for 100 caches
We can change the URLs of modified resources
And let the old ones drop out of the cache naturally
Or prod them out, for sensitive data
Good approach!
Avoids browser cache staleness
Hello Akamai (and other CDNs)
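A common way to change the URL of a modified resource is to put a content hash into the filename. A new version then gets a new URL that every cache and browser fetches fresh, while the old URL ages out on its own. A minimal sketch follows. The example path and the 8-character SHA-256 prefix are assumptions.

```python
import hashlib
import posixpath


def versioned_url(path, content):
    """Embed a short content hash in the filename: /static/app.css -> /static/app.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    root, ext = posixpath.splitext(path)
    return f"{root}.{digest}{ext}"
```

The URL changes only when the bytes change. Unchanged assets therefore keep their cached copies, and no purge request has to be sent to each cache server.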
Perlbal backhanding:
Perlbal can do redirection magic
Client sends request to Perlbal
Perlbal plugin verifies user credentials
token, cookies, whatever
tokens avoid data-store access
Perlbal goes to pick up the file from elsewhere
Transparent to user
Permission URLs:
If we bake the auth into the URL then it saves the auth step
We can do the auth on the web app servers when creating HTML
Just need some magic to translate to paths
We don’t want paths to be guessable
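One way to bake auth into the URL: the web app server signs the file path plus an expiry time with a secret shared with the file servers. The file server checks the signature without touching the data store, and the HMAC makes paths unguessable. A minimal sketch follows. It assumes a shared `SECRET` and hypothetical `expires`/`sig` query parameters.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical secret shared by the web app servers and the file servers.
SECRET = b"shared-secret"


def _sign(path, expires):
    msg = f"{path}:{expires}".encode("utf-8")
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_permission_url(path, ttl_seconds=300, now=None):
    # Runs on the web app server while it builds the HTML.
    expires = int((now if now is not None else time.time()) + ttl_seconds)
    return f"{path}?{urlencode({'expires': expires, 'sig': _sign(path, expires)})}"


def check_permission_url(url, now=None):
    # Runs on the file server: no DB lookup, just recompute the signature.
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if (now if now is not None else time.time()) > expires:
        return False
    return hmac.compare_digest(sig.encode("utf-8"), _sign(parts.path, expires).encode("utf-8"))
```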
Understanding-11. Storing Files
Storing files is easy!
Get a big disk
Get a bigger disk
Horizontal scaling is the key
Again
Connecting to storage:
NFS
Stateful == Sucks
Hard mounts vs Soft mounts, INTR
SMB / CIFS / Samba
Turn off MSRPC & WINS (NetBIOS NS)
Stateful but degrades gracefully
HTTP
Stateless == Yay!
Just use Apache
Multiple volumes:
Volumes are limited in total size
Except (in theory) under ZFS & others
Sometimes we need multiple volumes for performance reasons
When using RAID with single/dual parity
At some point, we need multiple volumes
HA Storage:
HA is important for assets too
We can back stuff up
But we tend to want hot redundancy
RAID is good
RAID 5 is cheap, RAID 10 is fast
But whole machines can fail
So we stick assets on multiple machines
In this case, we can ignore RAID
In failure case, we serve from alternative source
But need to weigh up the rebuild time and effort against the risk
Store more than 2 copies.
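Storing each asset on several machines and serving from an alternative copy on failure can be sketched as follows. This is a minimal sketch under several assumptions. Each storage host is modelled as a dict, and a host that is down is represented by `None`. Three copies follows the advice above to store more than two. Choosing hosts by a CRC32 hash of the key is an illustrative choice.

```python
import zlib

COPIES = 3  # "store more than 2 copies"


def replica_indexes(key, hosts, copies=COPIES):
    # Pick `copies` consecutive hosts, starting at a stable hash of the key.
    start = zlib.crc32(key.encode("utf-8")) % len(hosts)
    return [(start + i) % len(hosts) for i in range(min(copies, len(hosts)))]


def store(key, data, hosts):
    for i in replica_indexes(key, hosts):
        if hosts[i] is not None:
            hosts[i][key] = data


def fetch(key, hosts):
    # Serve from the first replica that is up and has the file.
    for i in replica_indexes(key, hosts):
        host = hosts[i]
        if host is not None and key in host:
            return host[key]
    raise FileNotFoundError(key)
```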
Application Base Architecture
Contents:
1. Why n-tier?
2. Layers.
3. Monolithic or 1-tier architecture.
4. 2-tier architecture.
5. 3-tier architecture.
6. Need of MVC.
7. MVC architecture.
8. MVC components.
9. Comparison between MVC and 3-tier.
1. Why n-tier:
Need for e-commerce solutions, driven by growth in users and merchant sites all over the
world.
Applications should be scalable, user-friendly, secure, and easy to
maintain.
2. Layers:
A layer is a logical separation.
A tier is a physical separation.
Generally there are three layers:
Presentation
Business
Data access layer
Layers:
Presentation layer:
handles the interaction between the client and the application, and provides a user-friendly interface for
the clients.
Business layer:
contains the application code, i.e. the core functionality of the application: what the application
does.
Data access layer:
maintains the database and handles the storage of data.
3. Monolithic or 1-tier:
1. The presentation layer, business layer and data access layer are tightly coupled. Because the layers
depend on each other, a code change in any one layer affects the others, and their code has to be changed
too.
2. Traditional applications are based on this type of architecture. A typical implementation of a 1-tier
architecture is a C program running on a single computer.
4. 2-tier:
1. In this type of architecture the presentation layer and the business logic layer are separated from the
data access layer.
2. The advantage is that the code of the data access layer can be changed at any time without affecting
the code of the other layers, i.e. the whole database and its layer can be replaced at any time.
3. The database (i.e. the data access layer) can be located anywhere, but the other two layers must stay
together (tightly coupled).
4. Because the presentation and business logic are still coupled, both must run on the client side; since the
work is concentrated on the client, this type of client is called a thick client.
5. The problem with this architecture is that every client must get an updated copy of the application
whenever the application code changes.
6. The application developer may not want to hand the code to third parties, even if it is
precompiled.
5. 3-tier:
1. In this type of architecture the presentation layer, the business logic layer and the data access layer are
separated from each other and run on three different tiers, so they are loosely coupled.
2. The main advantage is that a code change in one layer does not affect the other layers, and each layer's
platform can also be changed independently.
3. The web designer can concentrate on designing the user interface (the presentation logic), the
application developer on developing the application (the business logic), and the database administrator
can manage the database independently.
4. Many of today's applications are based on 3-tier architecture, which is scalable and easy to maintain,
has reusable components, and allows faster development.
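A minimal sketch of the three layers as separate Python classes, each talking only to its neighbour: presentation calls business, and business calls data access. In a real 3-tier deployment, each layer would run on its own tier and the calls would cross the network (for example over HTTP). Here they are plain method calls. The class names, the tax rule, and the in-memory "database" are hypothetical.

```python
class DataAccessLayer:
    """Tier 3: owns storage. An in-memory dict stands in for the database."""

    def __init__(self):
        self._products = {1: {"name": "Book", "price": 12.5}}

    def get_product(self, product_id):
        return self._products.get(product_id)


class BusinessLayer:
    """Tier 2: application rules; never formats output, never touches storage directly."""

    def __init__(self, data):
        self.data = data

    def price_with_tax(self, product_id, tax_rate=0.2):
        product = self.data.get_product(product_id)
        if product is None:
            raise KeyError(product_id)
        return round(product["price"] * (1 + tax_rate), 2)


class PresentationLayer:
    """Tier 1: formats results for the user; talks only to the business layer."""

    def __init__(self, business):
        self.business = business

    def render_price(self, product_id):
        try:
            return f"Price: {self.business.price_with_tax(product_id):.2f}"
        except KeyError:
            return "Product not found"
```

Swapping `DataAccessLayer` for one backed by a real database leaves the other two classes unchanged, which is the loose coupling described above.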
6. Need for MVC architecture:
Need to access the data from different contexts such as mobile, touch screen, desktop, etc.
Need to show the data in different contexts such as mobile, touch screen, desktop, etc.
Need to show multiple views of the same data, such as thumbnails, a list or details.
Need to change the design without changing the core business logic.
7. MVC Solution:
Separate the core business logic from the presentation logic.
Separate views for the same data.
8. MVC components:
Model: contains the core functionality and business logic of the application. It answers state queries
from the view and the controller, and provides updated information to the view component.
View: responsible for the presentation logic and for the user's interaction with the application. The model
provides different information to different users, which can be represented in different ways; the main job
of the view component is to present the suitable information to the user.
Controller: accepts user input through the view component and processes it; if any changes are required,
it makes them in the model and then responds to the client.
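A minimal sketch of the three MVC components in Python. It uses the observer pattern so the model can push updates to every registered view. It also shows the "multiple views of the same data" need: two views render the same model differently. The class names, the photo example, and the empty-title rule are hypothetical.

```python
class PhotoModel:
    """Model: holds the data and business rules, notifies views on change."""

    def __init__(self):
        self.photos = []
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def add_photo(self, title):
        if not title:
            raise ValueError("title must not be empty")  # a business rule
        self.photos.append(title)
        for view in self._views:
            view.update(self)  # push the new state to every view


class ListView:
    """View: one presentation of the model's data."""

    def update(self, model):
        self.output = "\n".join(f"- {t}" for t in model.photos)


class CountView:
    """View: a different presentation of the same data."""

    def update(self, model):
        self.output = f"{len(model.photos)} photo(s)"


class PhotoController:
    """Controller: turns user input into model changes and responds."""

    def __init__(self, model):
        self.model = model

    def handle_upload_form(self, form):
        title = form.get("title", "").strip()
        try:
            self.model.add_photo(title)
            return "ok"
        except ValueError as err:
            return f"error: {err}"
```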
9. MVC vs. 3-tier:
1. In MVC the components communicate directly with each other to maintain a coherent user interaction,
whereas in 3-tier the presentation layer (front end) communicates with the data access layer (back end)
only through the business layer (middleware).
2. In 3-tier the layers run on three different tiers (machines), whereas in MVC the components usually
run on a single tier (machine).