2. Goals
Common Sense in the Cloud
same as outside the cloud
1. Tune performance
2. Investigate issues
3. Visualize architecture
3. Nick Gerner
www.nickgerner.com
@gerner
• Formerly senior engineer at SEOmoz
• Linkscape: index of the web for SEO
• Lead data services
• Developer
• Back-end ops guy
4. SEOmoz
• Seattle-based Startup (~7 engineers)
• SEO Blog and Community
• Toolset and Platform
OpenSiteExplorer.org
• 300 TB/month processing pipeline
• 5 million API requests/day
5. SEOmoz Engineering
• 50 < nodes < 500
• AWS based since 2008
  – EC2: Linux root access to bare VM
  – S3: networked disk
  – EBS: local disk I/O
  – ELB: load balancing as a service
6. SEOmoz Architecture
[Architecture diagram. Labels: The Raw Web, Crawlers, Storage, Processing (Process, Prepare), Data Pipeline]
9. Great
...but not the focus of this talk
[Diagram labels: Latency, Conversion Rate, DNS, Time to On-load, Web Object Count]
10. Performance Indicators
[Diagram. System Characteristics: CPU, Mem, Disk, Net. App Stack: Front-End, Middleware, Caching, Back-end, Database, WS-API. Connector labels: Drives, Competes For.]
http://www.flickr.com/photos/dnisbet/3118888630/
14. Load Average
• Combines a few things
• Good place to start
• Explains nothing
http://www.flickr.com/photos/maple03/4176389418/
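As a rough illustration of why load average is a starting point rather than an explanation: on Linux the figure counts tasks that are runnable together with tasks in uninterruptible (usually I/O) sleep, so one number mixes CPU and I/O pressure. The sketch below assumes a Linux host exposing /proc/loadavg; the helper names and the sample line are made up for the example.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def load_per_core(load, cores):
    """Normalize a load average by core count so nodes of different sizes compare."""
    return load / max(cores, 1)


# Hypothetical sample line; on a real node read it with open("/proc/loadavg").read()
sample = "3.20 1.10 0.45 2/345 12345"
one_minute, five_minute, fifteen_minute = parse_loadavg(sample)
print(load_per_core(one_minute, 4))
```

A per-core value near or above 1.0 only says the node is busy; the CPU and disk breakdowns are what show why.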
15. CPU
• Break out by process
• Break out user vs system
• User, System, I/O wait, Idle
http://www.flickr.com/photos/pacdog/213442876/
16. Why watch it?
• Who's doing work
• Is CPU maxed?
• Blocked on I/O?
• Compare to Load Average
http://www.flickr.com/photos/pacdog/213442876/
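The user / system / I/O wait / idle split can be sketched by differencing two samples of the aggregate "cpu" line in /proc/stat. This is a minimal sketch that assumes a Linux host; the function names and sample counter values are illustrative, and collectd's own CPU plugin may compute things differently.

```python
# Field order of the aggregate "cpu" line in /proc/stat on Linux.
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Extract the cumulative tick counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(CPU_FIELDS)]]
            return dict(zip(CPU_FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Percentage of time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Hypothetical samples taken a few seconds apart.
first = parse_cpu_line("cpu  100 0 50 800 50 0 0 0")
second = parse_cpu_line("cpu  160 0 70 880 90 0 0 0")
print(cpu_percentages(first, second))
```

High iowait with low user time points at disk rather than CPU, which is the comparison against load average the slide suggests.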
17. Memory
• Break out by Process
• Free, cached, used
http://www.flickr.com/photos/williamhook/3118248600/
18. Why watch it?
• Cached + Free = Available
• Do you have spare memory?
  – App uses
  – Memcache
  – DB cache
http://www.flickr.com/photos/williamhook/3118248600/
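The "Cached + Free = Available" rule of thumb can be worked through against /proc/meminfo. The sketch below assumes a Linux host and follows the slide's simplified formula; the function names and sample numbers are invented for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Free, cached, used, and the slide's 'available' estimate (cached + free)."""
    free = info["MemFree"]
    cached = info.get("Cached", 0)
    total = info["MemTotal"]
    return {
        "free_kb": free,
        "cached_kb": cached,
        "used_kb": total - free - cached,
        "available_kb": free + cached,
    }


# Hypothetical sample; on a real node read open("/proc/meminfo").read()
sample = "MemTotal: 8000000 kB\nMemFree: 1000000 kB\nCached: 3000000 kB\n"
print(memory_summary(parse_meminfo(sample)))
```

If the available figure stays large over time, that memory could go to the app, Memcache, or the DB cache instead.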
19. Disk
• Read bytes/sec
• Write bytes/sec
• Disk utilization
http://www.flickr.com/photos/robfon/2174992215/
20. Why watch it?
• Is disk busy?
• When?
• Who's using it?
http://www.flickr.com/photos/robfon/2174992215/
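Read bytes/sec, write bytes/sec and utilization can all be derived from two samples of /proc/diskstats. This sketch assumes a Linux host, where that file counts sectors in 512-byte units; the function names and the sample lines are illustrative only.

```python
SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: cumulative counters} from /proc/diskstats content."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue
        stats[parts[2]] = {
            "sectors_read": int(parts[5]),
            "sectors_written": int(parts[9]),
            "io_ms": int(parts[12]),  # milliseconds spent doing I/O
        }
    return stats


def disk_rates(before, after, interval_s):
    """Per-device read/write throughput and utilization between two samples."""
    rates = {}
    for name, old in before.items():
        if name not in after:
            continue
        new = after[name]
        rates[name] = {
            "read_bytes_per_s": (new["sectors_read"] - old["sectors_read"]) * SECTOR_BYTES / interval_s,
            "write_bytes_per_s": (new["sectors_written"] - old["sectors_written"]) * SECTOR_BYTES / interval_s,
            "utilization_pct": 100.0 * (new["io_ms"] - old["io_ms"]) / (interval_s * 1000.0),
        }
    return rates


# Hypothetical samples taken 2 seconds apart.
first = parse_diskstats("8 0 sda 100 0 2000 50 200 0 4000 80 0 500 130")
second = parse_diskstats("8 0 sda 150 0 6000 70 260 0 8000 120 0 1500 190")
print(disk_rates(first, second, 2.0))
```

Answering "who's using it?" needs per-process I/O counters as well; this sketch only covers the per-device view.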
22. Why watch it?
• Max connections (~1024 is magic)
• Bandwidth is $$$
• When are you busy?
• SOA considerations
http://www.flickr.com/photos/ahkitj/20853609/
23. Perf Monitoring Solution
FREE, in Apt
1. data collection (collectd)
2. data storage (rrdtool)
3. dashboard management (drraw)
25. Perf Monitoring Architecture
Cluster: collectd agents
• new nodes get generic config
• node names follow convention according to role
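A role-based naming convention lets a dashboard group nodes without per-node configuration. The slides do not show the actual convention, so the "<role>-<number>" pattern, the host names, and the function names below are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical convention: "<role>-<number>", e.g. "crawler-03".
NODE_NAME = re.compile(r"^(?P<role>[a-z]+)-(?P<index>\d+)$")


def role_of(hostname):
    """Return the role encoded in a node name, or None if it does not match."""
    match = NODE_NAME.match(hostname)
    return match.group("role") if match else None


def group_by_role(hostnames):
    """Group node names by role so new nodes land in the right graph."""
    groups = defaultdict(list)
    for hostname in hostnames:
        groups[role_of(hostname) or "unknown"].append(hostname)
    return dict(groups)


print(group_by_role(["crawler-01", "crawler-02", "api-01", "misc"]))
```

With a scheme like this, a new node that reports under a conforming name shows up in its role's summary without any dashboard edits.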
26. Perf Monitoring Architecture
Perf Monitoring Server (on its own server):
• collectd server
• Web server
• drraw.cgi
• allows connections from new nodes
• perf data backed up daily
[Diagram: clusters connected to the Perf Monitoring Server]
27. Perf Monitoring Architecture
Happy Sysadmin
• Visibility into system
• history of performance
[Diagram: Clusters, Perf Monitoring Server, Happy Sysadmin]
28. Perf Dashboard Features
1. Summarize nodes/systems
2. Visualize data over time
3. Stack measurements
  – Per-process
  – Per-node
4. Handle new nodes
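Stacking measurements means drawing each series on top of the running total of the ones before it, so the top edge of the graph is the sum. A minimal sketch of that arithmetic follows; the function name and the process names are invented, and this is not how drraw or rrdtool implement stacking internally.

```python
def stack_series(series):
    """Turn {name: [values]} into [(name, lower_edge, upper_edge)] bands.

    Each band starts where the previous one ended, so the last upper edge
    is the total across all series at each point in time.
    """
    baseline = None
    bands = []
    for name, values in series.items():
        if baseline is None:
            baseline = [0.0] * len(values)
        if len(values) != len(baseline):
            raise ValueError("all series must have the same number of points")
        upper = [low + value for low, value in zip(baseline, values)]
        bands.append((name, baseline, upper))
        baseline = upper
    return bands


# Hypothetical per-process CPU percentages at two points in time.
for band in stack_series({"nginx": [10, 20], "mysql": [30, 5]}):
    print(band)
```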
39. Graph Summary
• cpu, mem, disk, net
• over time
• per node
• per process
• Throw in relevant app measures
e.g. per request stats:
• req/sec
• median latency/req