LOPSA SD 2014.03.27 Presentation on Linux Performance Analysis
An introduction to the USE method, showing how several common tools fit into its resource evaluations.
2. Me
• Systems Architect
• Sony Network Entertainment
• 18 years running stuff
• Majority of the last 14 years: medium-large Internet services
3. Read this book…
And look here:
http://www.brendangregg.com/
http://www.brendangregg.com/methodology.html
http://www.brendangregg.com/Slides/LISA2012_methodologies.pdf
http://www.amazon.com/Systems-Performance-Enterprise-Brendan-Gregg/dp/0133390098
4. The website is down!!!
It's just too slow!
The DB is too slow!
The disk is too slow!
SLOW!!!
http://farm4.staticflickr.com/3190/2976755407_6a6a574596_o.jpg
5. SLOW!!!!
• What does slow mean anyway?
• Is it not transferring fast enough?
• Is it handling too many requests (or not enough)?
http://commons.wikimedia.org/wiki/File:United_States_sign_-_Slow_Traffic_Ahead.svg
6. Slow can mean…
• Latency: How long one thing takes
• ms, s, request time, etc.
• Throughput: How much gets done per unit of time
• bandwidth, IOPS, rps, tps, etc.
http://upload.wikimedia.org/wikipedia/commons/2/2e/Miniature_DNF_Dictionary_055_ubt.JPG
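The two meanings of slow can be told apart from the same request log. The following is a minimal sketch, assuming each completed request is recorded as a hypothetical (start_ms, end_ms) pair; the function name and the sample data are illustrative only and are not from the talk.

```python
from statistics import median


def summarize(requests_ms):
    """Summarize latency and throughput for (start_ms, end_ms) pairs."""
    latencies = sorted(end - start for start, end in requests_ms)
    # Observation window: first start to last end, in milliseconds.
    window_ms = max(end for _, end in requests_ms) - min(
        start for start, _ in requests_ms
    )
    # Nearest-rank 95th percentile: ceil(0.95 * n) as a 1-based rank.
    p95_rank = -(-95 * len(latencies) // 100)
    return {
        "median_latency_ms": median(latencies),
        "p95_latency_ms": latencies[p95_rank - 1],
        "throughput_rps": len(requests_ms) / (window_ms / 1000),
    }


# Hypothetical sample: five requests over a two-second window.
sample = [(0, 100), (200, 300), (500, 1500), (1000, 1200), (1800, 2000)]
print(summarize(sample))
```

In this sample one slow request dominates the tail latency while throughput still looks reasonable, which is why "slow" needs to be pinned down before measuring.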
7. Slowness comes from…
• Full utilization of a resource
• Waiting in a saturated queue
• Generated errors!
• The USE Method
http://farm6.staticflickr.com/5181/5614813544_a30d693a50_o.jpg
8. Utilization
• You have fully used up what's been allocated
• aka 5 lb bag
http://farm3.staticflickr.com/2524/4000641774_3331fe06fb_o.jpg
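For CPUs on Linux, utilization can be estimated from two samples of the aggregate "cpu" line in /proc/stat. This is a rough sketch rather than a reference implementation: it assumes iowait counts as idle time (tools differ on this), and the two sample lines are made up for illustration.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into (busy, total) ticks."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Columns: user nice system idle iowait irq softirq steal.
    # The guest columns that follow are already included in user/nice.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + values[4]  # assumption: iowait is treated as idle
    total = sum(values)
    return total - idle, total


def cpu_utilization(before, after):
    """Fraction of CPU time spent busy between two /proc/stat samples."""
    busy0, total0 = cpu_times(before)
    busy1, total1 = cpu_times(after)
    elapsed = total1 - total0
    return (busy1 - busy0) / elapsed if elapsed else 0.0


# Hypothetical samples taken a short interval apart.
before = "cpu  1000 0 500 8000 500 0 0 0 0 0"
after = "cpu  1600 0 700 8100 600 0 0 0 0 0"
print(cpu_utilization(before, after))
```

On a real host the two lines would come from reading /proc/stat twice with a sleep in between.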
9. Saturation
• Waiting for someone else to get done so you can do yours
• Typically because a resource is fully utilized, but not necessarily directly
http://www.fotocommunity.com/pc/pc/display/30396619
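One rough way to spot CPU saturation on Linux is to compare the number of runnable tasks with the number of CPUs. The sketch below assumes the text of /proc/loadavg as input, whose fourth field is "runnable/total" scheduling entities; the helper name and the sample line are hypothetical, and the runnable count includes the process doing the reading.

```python
def run_queue_excess(loadavg_text, cpu_count):
    """Return how many runnable tasks exceed the available CPUs (0 if none)."""
    fields = loadavg_text.split()
    # Fourth field looks like "9/412": runnable entities / total entities.
    runnable = int(fields[3].split("/")[0])
    return max(0, runnable - cpu_count)


# Hypothetical /proc/loadavg contents on a 4-CPU host.
sample_loadavg = "6.10 5.80 4.20 9/412 11206"
print(run_queue_excess(sample_loadavg, cpu_count=4))
```

A result that stays above zero across several samples suggests work is waiting for a CPU rather than running on one.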
10. Errors
• Dropped packets
• Incorrect responses
• Deadlocks
• Timeouts
• Not all failures fail fast
http://farm8.staticflickr.com/7001/6509400855_aaaf915871_b.jpg
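Dropped packets are one of the easier errors to count on Linux, because /proc/net/dev keeps per-interface error and drop counters. The following is a small sketch that parses that file's text; the sample content and interface names are invented for illustration.

```python
def interface_errors(net_dev_text):
    """Map interface name to its rx/tx error and drop counters."""
    counters = {}
    for line in net_dev_text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        f = [int(x) for x in data.split()]
        # Receive columns 0-7, transmit columns 8-15.
        counters[name.strip()] = {
            "rx_errs": f[2],
            "rx_drop": f[3],
            "tx_errs": f[10],
            "tx_drop": f[11],
        }
    return counters


# Hypothetical /proc/net/dev contents.
sample_net_dev = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 50000 400 3 17 0 0 0 0 42000 380 1 0 0 0 0 0
"""
print(interface_errors(sample_net_dev)["eth0"])
```

The counters are cumulative since boot, so comparing two readings over time says more than a single snapshot.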
11. How do we determine?
• Different types of tools for different examinations
• Depends on what you're looking for (which can be a problem in and of itself)
http://farm5.staticflickr.com/4083/5086955738_61f6455ace_b.jpg
12. Resource vs Transaction
• Do you care if…
• a CPU is maxed out?
• processes are blocked?
• packets are lost?
• or if…
• a user's request fails?
• a user gives up on waiting for a response?
13. Maturity
• Using tracing tools, especially in production, requires a level of maturity
• I'm not that mature… ;)
• No, really: just focusing on the basics first
http://upload.wikimedia.org/wikipedia/commons/b/bd/OFLC_large_R18%2B.svg
47. Running out of Apache Threads
• Lots of incoming requests
• Apache hits ServerLimit of threads (Utilization!)
• Requests start to get stuck in TCP backlog (Saturation!)
• Apache endpoints are removed from load balancers (Error!)
• Fail!
http://upload.wikimedia.org/wikipedia/commons/9/96/Colorful_Threads_(3965274345).jpg
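The utilization, saturation, error progression above can be reproduced with a toy model. This is only an illustrative sketch, not how Apache or the kernel actually behave: it assumes a fixed worker pool, a fixed service time per request, and a bounded backlog that drops whatever does not fit. All names and numbers are hypothetical.

```python
def simulate(arrivals_per_tick, workers, service_ticks, backlog_limit, ticks):
    """Toy fixed-pool server: report peak utilization, backlog, and drops."""
    busy = []  # remaining ticks for each in-flight request
    backlog = 0
    peak_busy = peak_backlog = dropped = 0
    for _ in range(ticks):
        # Requests with one tick left finish; the rest move one tick closer.
        busy = [t - 1 for t in busy if t > 1]
        backlog += arrivals_per_tick
        # Free workers pull waiting requests off the backlog.
        started = min(workers - len(busy), backlog)
        busy.extend([service_ticks] * started)
        backlog -= started
        # Anything beyond the backlog limit is dropped (the "error").
        if backlog > backlog_limit:
            dropped += backlog - backlog_limit
            backlog = backlog_limit
        peak_busy = max(peak_busy, len(busy))
        peak_backlog = max(peak_backlog, backlog)
    return {
        "peak_utilization": peak_busy / workers,
        "peak_backlog": peak_backlog,
        "dropped": dropped,
    }


# Arrivals equal to capacity (4 workers / 2 ticks each = 2 per tick).
print(simulate(2, workers=4, service_ticks=2, backlog_limit=5, ticks=10))
# Arrivals just above capacity: the backlog fills, then requests drop.
print(simulate(3, workers=4, service_ticks=2, backlog_limit=5, ticks=10))
```

In the first run the pool is fully utilized but nothing queues; in the second, one extra request per tick is enough to fill the backlog and start dropping requests.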
48. Cold DB Start
• DBs like to be in memory, but can't start that way
• All data requests go to disk (which is SAN backed)
• SAN controller CPU gets maxed out (Utilization!)
• HBA queues get deep (Saturation!)
• Requests time out (Error!)
• Fail!
50. Methods > Tools
• Don't let tools get in the way of solutions
• It's easy to think that all you're missing is a tool.
• But are you actually following a method to your performance madness?
http://upload.wikimedia.org/wikipedia/commons/6/6d/Three_Card_Monte.jpg
51. Anti-Methods
• Blame Someone Else
• Streetlight
• Drunk Man
• Random Change
• Passive Benchmark
• Don't do these…
http://www.brendangregg.com/methodology.html http://upload.wikimedia.org/wikipedia/commons/a/af/Villainc.svg
52. Methods
• Ad Hoc Checklist
• Problem Statement
• Scientific
• Workload Characterization
• Drill-down Analysis
• By-layer
• Latency Analysis
• Tools
• Stack Profile
• Off-CPU Analysis
• Thread State Analysis
• Active Benchmark
http://www.brendangregg.com/methodology.html http://memegenerator.net/instance/9192015