2. Abstract
Adam Brodziak
An overview of modern web-based application
architecture: from hardware infrastructure,
through PHP/SQL code, to HTML/CSS markup
distribution. All of this spiced up with caching,
load balancing and CDN.
3. Who is this guy?
Lead developer at Global Sports Media
GSM collects and processes sports data
GSM owns soccerway.com portal
Linux user
Interested in frameworks, design patterns
Semantic Web enthusiast
Football (soccer) fan
6. Raw numbers
7 million visits / month
52 million pageviews / month
1 billion requests / month
6TB of traffic / month
300k users at peak time
Quite a few clients using the same hardware
7. Not so much, but...
700 leagues
Livescores
Game events
Match statistics
Rankings
Editorials
9. The Challenge
Loads of data to process
Scores
Events
Stats
In real-time (livescores)
Growing number of visitors
13K hits/sec at peak time
17. Replication caveats
Writes only on master
Reads from slaves
Data consistency
Replication lag
Don't do this:
$master->query('UPDATE session SET logged = 1');
$slave->query('SELECT logged FROM session');
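One way around the replication lag shown above is to read from the master for a short time after every write. The sketch below is only an illustration: the ReplicatedDb class, its method names and the 2-second window are assumptions, not part of the original slides, and the connection objects only need a query() method.

```php
<?php
// Hypothetical wrapper (not from the slides): after a write, reads go to the
// master for a short window so a lagging slave cannot return stale data.
final class ReplicatedDb
{
    private $master;
    private $slave;
    private $stickySeconds;
    private $lastWriteAt = 0.0;

    // $stickySeconds is an assumed upper bound on replication lag.
    public function __construct($master, $slave, float $stickySeconds = 2.0)
    {
        $this->master = $master;
        $this->slave = $slave;
        $this->stickySeconds = $stickySeconds;
    }

    // Writes always go to the master; remember when the last one happened.
    public function write(string $sql)
    {
        $this->lastWriteAt = microtime(true);
        return $this->master->query($sql);
    }

    // Reads go to the slave unless a write happened within the sticky window.
    public function read(string $sql)
    {
        $recentWrite = (microtime(true) - $this->lastWriteAt) < $this->stickySeconds;
        $connection = $recentWrite ? $this->master : $this->slave;
        return $connection->query($sql);
    }
}
```

With this in place the session example becomes $db->write('UPDATE session SET logged = 1') followed by $db->read('SELECT logged FROM session'), and the read lands on the master. The window only covers the current PHP process; a real setup would probably track it per user session.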
20. PHP is slow!
Yes, but it does not matter!
Database access is slower
Cache over network is slower
Disk access is slower
HTTP requests are slower
Webservice calls are slower
Discover bottlenecks before blaming PHP
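A rough way to find out where the time goes is to wrap each stage of a request in a timer. This is a minimal sketch; the timed() helper and the stand-in workloads are made up for illustration, and a real profiler such as Xdebug gives far more detail.

```php
<?php
// Minimal timing helper (hypothetical): runs a callable and accumulates
// the elapsed wall-clock time under the given label.
function timed(string $label, callable $fn, array &$timings)
{
    $start = microtime(true);
    $result = $fn();
    $timings[$label] = ($timings[$label] ?? 0.0) + (microtime(true) - $start);
    return $result;
}

$timings = [];

// Stand-ins for real work: replace with the actual query, cache call, etc.
$rows = timed('db', function () {
    usleep(20000); // pretend the database needs about 20 ms
    return [1, 2, 3];
}, $timings);

$html = timed('render', function () use ($rows) {
    return implode(',', $rows);
}, $timings);

arsort($timings); // slowest stage first
foreach ($timings as $label => $seconds) {
    printf("%-8s %.1f ms\n", $label, $seconds * 1000);
}
```

In this toy run the simulated database call dominates and the PHP rendering step barely registers, which is the point of the slide.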
21. It's about architecture
Heavy tasks in background
CRON, Gearman
Pregenerate stuff
Move some code to SQL
Calculations in queries
Stored procedures
Triggers
C/C++ or Java for heavy computation
Use PHP to glue it together
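Pregeneration can be as simple as a CRON job that computes a result once and writes it to a file the web tier only reads. The sketch below assumes a JSON file in the temp directory; the function names, the file name and the team data are invented for illustration.

```php
<?php
// Hypothetical pregeneration job, meant to be run from CRON.
function pregenerate(string $path, callable $compute): void
{
    // Write to a temporary file first, then move it into place.
    $tmp = $path . '.' . uniqid('tmp', true);
    file_put_contents($tmp, json_encode($compute()));
    // On POSIX systems rename() within one filesystem is atomic,
    // so readers never see a half-written file.
    rename($tmp, $path);
}

// The web request only reads the pregenerated file.
function readPregenerated(string $path): array
{
    return json_decode(file_get_contents($path), true);
}

// Stand-in for an expensive ranking calculation (made-up data).
$computeRanking = function (): array {
    $points = ['Team A' => 61, 'Team B' => 64, 'Team C' => 58];
    arsort($points); // highest points first
    return array_keys($points);
};

$path = sys_get_temp_dir() . '/ranking.json';
pregenerate($path, $computeRanking);
$ranking = readPregenerated($path);
```

The heavy calculation runs once per CRON interval instead of once per pageview.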
23. Framework? Think again!
Raw performance matters
Support for master-slave replication
Multiple layers of cache
Working with accelerators (HipHop!)
Beware of bottlenecks
e.g. a core part of the framework is slow
Designed to scale
27. Memory is cheap
Pre-generate stuff
Store results in memory
APC, memcached
App config in memory
APC with stat=off
Increase RAM for MySQL
Disk is the new tape
28. Memcached to the rescue!
Dead simple
Key-value
Distributed storage pool
Automatic invalidation after X sec
No garbage collection invoked
Store arrays, objects, simple values
Easy integration
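The usual integration pattern is cache-aside: ask the cache first, fall back to the slow source on a miss, then store the result with a TTL. The sketch below runs against a small in-memory stand-in so it works without a server. ArrayCache, cached() and the key name are assumptions made for this example; the stand-in mirrors the get() and set() calls of PHP's Memcached class, where get() returns false on a miss.

```php
<?php
// In-memory stand-in (hypothetical) exposing get() and set() in the same
// shape as the Memcached extension: get() returns false on a miss.
final class ArrayCache
{
    private $items = [];

    public function get(string $key)
    {
        if (!isset($this->items[$key]) || $this->items[$key]['expires'] <= time()) {
            return false; // missing or expired
        }
        return $this->items[$key]['value'];
    }

    public function set(string $key, $value, int $ttl): bool
    {
        $this->items[$key] = ['value' => $value, 'expires' => time() + $ttl];
        return true;
    }
}

// Cache-aside: try the cache, fall back to the loader, store the result.
function cached($cache, string $key, int $ttl, callable $loader)
{
    $value = $cache->get($key);
    if ($value === false) {
        $value = $loader();
        $cache->set($key, $value, $ttl);
    }
    return $value;
}

$cache = new ArrayCache();
$calls = 0;
// Stand-in for a slow database query; counts how often it really runs.
$loadScore = function () use (&$calls) {
    $calls++;
    return ['home' => 2, 'away' => 1];
};

$first = cached($cache, 'livescore:42', 30, $loadScore);
$second = cached($cache, 'livescore:42', 30, $loadScore);
```

The second call is served from the cache, so the loader runs only once. Swapping in a real Memcached instance should only need a different $cache object. Note that a stored value of false is indistinguishable from a miss in this simple form.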
30. Reverse-proxy
First line of cache
Returns content if resource is up-to-date
Works on HTTP level
Can be integrated into existing infrastructure
Can do load balancing
In-memory cache storage
Squid, Nginx, Varnish
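For a reverse proxy to do its job, the application has to say how long a response may be cached and how to check whether it is still up to date. A minimal sketch using standard HTTP headers (Cache-Control with s-maxage, ETag, If-None-Match) follows. The respond() function and the array-based request and response shapes are assumptions for illustration, and how a given proxy treats these headers depends on its configuration.

```php
<?php
// Builds a response a shared cache can store and revalidate (hypothetical helper).
// $requestHeaders is assumed to use lower-case header names.
function respond(string $body, array $requestHeaders, int $maxAge): array
{
    $etag = '"' . md5($body) . '"';
    $headers = [
        // s-maxage applies to shared caches such as reverse proxies and CDNs.
        'Cache-Control' => 'public, s-maxage=' . $maxAge,
        'ETag' => $etag,
    ];
    if (($requestHeaders['if-none-match'] ?? null) === $etag) {
        // Resource is unchanged: send no body, the proxy keeps serving its copy.
        return [304, $headers, ''];
    }
    return [200, $headers, $body];
}

// First request: full response, cacheable for 10 seconds.
[$status, $headers, $body] = respond('<h1>Livescores</h1>', [], 10);

// Revalidation: the proxy sends back the ETag it stored.
[$again] = respond('<h1>Livescores</h1>', ['if-none-match' => $headers['ETag']], 10);
```

The first call returns a full 200 response; the revalidation returns 304 with an empty body, so only headers travel between proxy and origin.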
31. Content Delivery Network
Network of servers
Worldwide
Automatic load balancing
Fast access (low ping time)
Data redundancy for free
Ideal for static resources
But not only
Must-have for worldwide websites
32. CDN as reverse-proxy
HTTP request / response chain
Embraces REST architecture
Requests are distributed
Reduces latency
Lowers traffic volume
Increases availability
e.g. Akamai Edge Suite
33. CDN at soccerway.com
All of the content is served via CDN
Images, CSS, JS
Generated HTML
JSON for Ajax
90% of traffic via CDN
Origin requests only from Europe
Site online even if servers are down
Can't live without it ;)