1. Java black box profiling
Alexey Ragozin
alexey.ragozin@gmail.com
2. The Problem
• There is an application
• It doesn’t meet its SLA
• Users/clients/stakeholders are unhappy
You are the one to fix it!
3. What to do?
• Do not panic!
• Write out all moving parts in the system
• Elaborate acceptance criteria
• Make sure you understand the KPIs and SLA
• Is the problem in the Java application?
Yes - You are ready to start profiling
5. Understanding targets
• Business transaction ≠ Page
• Business transaction ≠ SQL transaction
• Page ≠ HTTP request
• HTTP request ≠ SQL transaction
Make sure you know how your KPIs
translate into milliseconds
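As a rough illustration of translating a KPI into milliseconds, the sketch below splits a page-level time budget into per-request and per-SQL-transaction budgets. The class name, the numbers, and the assumption that requests and SQL transactions run sequentially are all hypothetical; a real system with parallel requests needs a different model.

```java
final class LatencyBudget {
    /** Budget per HTTP request, assuming the requests of one page run one after another. */
    static double perRequestMillis(double pageBudgetMillis, int requestsPerPage) {
        if (requestsPerPage <= 0) {
            throw new IllegalArgumentException("requestsPerPage must be positive");
        }
        return pageBudgetMillis / requestsPerPage;
    }

    /** Budget per SQL transaction, given the share of request time allowed for the database. */
    static double perSqlMillis(double requestBudgetMillis, double sqlShare, int sqlPerRequest) {
        if (sqlPerRequest <= 0 || sqlShare < 0 || sqlShare > 1) {
            throw new IllegalArgumentException("invalid share or transaction count");
        }
        return requestBudgetMillis * sqlShare / sqlPerRequest;
    }

    public static void main(String[] args) {
        // Hypothetical KPI: a page must render within 2000 ms and issues 8 HTTP requests.
        double request = perRequestMillis(2000, 8);
        // Hypothetical split: 40% of each request may be spent in 5 SQL transactions.
        double sql = perSqlMillis(request, 0.4, 5);
        System.out.printf("request budget: %.1f ms, SQL budget: %.1f ms%n", request, sql);
    }
}
```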
6. Just before you start profiling
Take a deep breath
Make sure the problem is in the Java part
Make sure you understand the end goal
These points are trivial, so it is very easy to forget
them under stress.
Do not make that mistake!
7. A few more rules before you start
Every system is a black box
Do you think you know what is inside?
You don’t!
There are 3 kinds of data produced by profiling
Lies
Darn lies
Statistics
Data incorrectly interpreted
Data incorrectly collected
9. Reproducing performance issue
Stationary process
• Uniform load reasonably simulating the real one
• Goal is to optimize average performance indicators
Transition process
• Goal is to optimize a prolonged process
• Load profile changes systematically over time
Profiling single operation
• Goal is to optimize a particular operation
11. Stack trace sampling
• Works best with a stationary process
• Uniformly impacts system under test
• Does not require “calibrating”
• Results from multiple runs can be combined
• You deal with probabilities, not hard numbers
• Measures wall clock time, not CPU time
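A minimal in-process sketch of stack trace sampling, assuming a hypothetical `StackSampler` class: it periodically snapshots all threads with `Thread.getAllStackTraces()` and counts how often each top frame appears. Sleeping and blocked threads are sampled too, which is why the result reflects wall clock time rather than CPU time. Real tools sample from outside the process and keep full traces.

```java
import java.util.HashMap;
import java.util.Map;

final class StackSampler {
    /** Takes the given number of samples and counts occurrences of each top frame. */
    static Map<String, Integer> sampleTopFrames(int samples, long intervalMillis) {
        Map<String, Integer> histogram = new HashMap<>();
        Thread self = Thread.currentThread();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] trace = e.getValue();
                // Skip the sampler itself and threads that expose no Java frames.
                if (e.getKey() == self || trace.length == 0) {
                    continue;
                }
                String frame = trace[0].getClassName() + "." + trace[0].getMethodName();
                histogram.merge(frame, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ex) {
                self.interrupt(); // preserve the interrupt flag and stop sampling
                break;
            }
        }
        return histogram;
    }

    public static void main(String[] args) {
        // Counts are relative frequencies, not exact timings.
        sampleTopFrames(50, 20).forEach((frame, count) -> System.out.println(count + "\t" + frame));
    }
}
```

Because every sample is independent, histograms from several runs can simply be added together.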
12. Byte code instrumentation
• Useful for all types of experiments
• Produces some absolute numbers
Number of calls is accurate
Time is predictably skewed
• Significantly skews performance of the system under test
• Requires careful targeting
• Can measure CPU time
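The trade-off above can be illustrated without a real byte code agent. The sketch below uses a JDK dynamic proxy as a stand-in for instrumentation (a swapped-in technique, not what profilers actually do): the call count it records is exact, while the recorded time includes the overhead of the wrapper itself. `CallCounter` and its methods are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class CallCounter {
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> nanos = new ConcurrentHashMap<>();

    /** Wraps target so that every call through iface is counted and timed. */
    @SuppressWarnings("unchecked")
    <T> T wrap(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the original exception
            } finally {
                String name = method.getName();
                calls.computeIfAbsent(name, k -> new LongAdder()).increment();
                // Includes reflection overhead, so the time is skewed upwards.
                nanos.computeIfAbsent(name, k -> new LongAdder()).add(System.nanoTime() - start);
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    long callCount(String method) {
        LongAdder adder = calls.get(method);
        return adder == null ? 0 : adder.sum();
    }

    long totalNanos(String method) {
        LongAdder adder = nanos.get(method);
        return adder == null ? 0 : adder.sum();
    }
}
```

Wrapping only the few methods under suspicion keeps the skew small, which is the point of careful targeting.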
13. Threading MBean
Standard JVM Threading MBean
CPU usage per thread (user/sys)
Memory allocation per thread
Blocking / waiting statistics
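A short sketch of reading these numbers through `java.lang.management.ThreadMXBean`. Per-thread allocation is not part of the standard interface; the code assumes a HotSpot-style JVM that exposes `com.sun.management.ThreadMXBean` and reports -1 otherwise. The `ThreadStats` class is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadStats {
    /** Returns a one-line summary of CPU, allocation and blocking data for one thread. */
    static String describe(long threadId) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info = mx.getThreadInfo(threadId);
        if (info == null) {
            return "thread " + threadId + " is not alive";
        }
        long cpu = -1;
        long user = -1;
        if (mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled()) {
            cpu = mx.getThreadCpuTime(threadId);   // user + system, nanoseconds
            user = mx.getThreadUserTime(threadId); // user mode only, nanoseconds
        }
        long allocated = -1;
        if (mx instanceof com.sun.management.ThreadMXBean) {
            // JVM-specific extension; availability is an assumption.
            com.sun.management.ThreadMXBean ext = (com.sun.management.ThreadMXBean) mx;
            if (ext.isThreadAllocatedMemorySupported() && ext.isThreadAllocatedMemoryEnabled()) {
                allocated = ext.getThreadAllocatedBytes(threadId);
            }
        }
        return String.format("%s cpuNs=%d userNs=%d allocatedBytes=%d blocked=%d waited=%d",
                info.getThreadName(), cpu, user, allocated,
                info.getBlockedCount(), info.getWaitedCount());
    }

    public static void main(String[] args) {
        for (long id : ManagementFactory.getThreadMXBean().getAllThreadIds()) {
            System.out.println(describe(id));
        }
    }
}
```

System time can be estimated as the difference between the total CPU time and the user time.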
15. Profiling strategy
Classify your bottleneck
CPU
• Application CPU usage
• Excessive memory allocation / GC
Disk IO
Network latency
Network bandwidth
Inter thread contention
16. Recipes: CPU hogs
1. Use sampling to identify suspects
• Frame frequency histogram + meditation
• Call tree + deep meditation
• Iterative classification
2. Instrument methods to get hard proof
• relative number of calls between methods
3. Iterate to pinpoint the root cause
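One way to picture iterative classification of sampled traces: an ordered list of rules assigns each stack trace to the first category whose class prefix appears anywhere in the trace, and everything unmatched lands in "other". After each pass you look at "other", add a rule, and repeat. The `TraceClassifier` class and the prefixes used are hypothetical; this is a sketch of the idea, not how any particular tool implements it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TraceClassifier {
    // Insertion order defines rule precedence.
    private final Map<String, String> prefixByCategory = new LinkedHashMap<>();

    TraceClassifier rule(String category, String classPrefix) {
        prefixByCategory.put(category, classPrefix);
        return this;
    }

    /** Returns the first category whose prefix matches any frame, or "other". */
    String classify(StackTraceElement[] trace) {
        for (Map.Entry<String, String> rule : prefixByCategory.entrySet()) {
            for (StackTraceElement frame : trace) {
                if (frame.getClassName().startsWith(rule.getValue())) {
                    return rule.getKey();
                }
            }
        }
        return "other";
    }

    /** Counts sampled traces per category. */
    Map<String, Integer> histogram(List<StackTraceElement[]> traces) {
        Map<String, Integer> counts = new TreeMap<>();
        for (StackTraceElement[] trace : traces) {
            counts.merge(classify(trace), 1, Integer::sum);
        }
        return counts;
    }
}
```

A large "other" bucket means the rules do not yet explain where time goes, so the next iteration should split it further.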
19. Recipes: Memory issues
1. Identify threads producing garbage
• SJK: ttop command
2. Classify garbage
• Class histogram of dead objects in the heap
• SJK: hh --dead
3. Investigate suspect classes
• Heap dump
• Debugging / Instrumentation
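The first step, finding the threads that produce garbage, can be approximated in-process by taking two snapshots of per-thread allocated bytes and subtracting them. This is a rough sketch of that idea and not how SJK is implemented; it assumes a JVM exposing `com.sun.management.ThreadMXBean` and returns an empty map otherwise. `AllocationTop` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.HashMap;
import java.util.Map;

final class AllocationTop {
    /** Returns bytes allocated per thread name during the observation window. */
    static Map<String, Long> allocatedBytesPerThread(long windowMillis) {
        Map<String, Long> result = new HashMap<>();
        java.lang.management.ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!(mx instanceof com.sun.management.ThreadMXBean)) {
            return result; // extension not available on this JVM
        }
        com.sun.management.ThreadMXBean ext = (com.sun.management.ThreadMXBean) mx;
        if (!ext.isThreadAllocatedMemorySupported()) {
            return result;
        }
        ext.setThreadAllocatedMemoryEnabled(true);
        long[] ids = ext.getAllThreadIds();
        long[] before = ext.getThreadAllocatedBytes(ids);
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long[] after = ext.getThreadAllocatedBytes(ids);
        ThreadInfo[] infos = ext.getThreadInfo(ids);
        for (int i = 0; i < ids.length; i++) {
            // Threads that died during the window report null info or -1 bytes.
            if (infos[i] == null || before[i] < 0 || after[i] < 0) {
                continue;
            }
            result.merge(infos[i].getThreadName(), after[i] - before[i], Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        allocatedBytesPerThread(1000)
                .forEach((name, bytes) -> System.out.println(bytes + " bytes/s\t" + name));
    }
}
```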
20. Recipes: contention
• Stack trace sampling
Good for “bad” contention cases
• Thread MBean
Contention statistics can be enabled
• BTrace
Analyze lock access patterns using your
codebase knowledge
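For the Thread MBean option, a minimal sketch of switching contention statistics on and reading them back. Blocked counts are always maintained; blocked time is reported as -1 until contention monitoring is enabled. `ContentionStats` is a hypothetical helper name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ContentionStats {
    /** Enables contention timing if the JVM supports it; returns whether it is now on. */
    static boolean enable() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadContentionMonitoringSupported()) {
            return false;
        }
        mx.setThreadContentionMonitoringEnabled(true);
        return mx.isThreadContentionMonitoringEnabled();
    }

    /**
     * Returns {blockedCount, blockedMillis} for a live thread, or null if it is gone.
     * blockedMillis is -1 when contention monitoring is disabled.
     */
    static long[] blockedCountAndMillis(long threadId) {
        ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(threadId);
        if (info == null) {
            return null;
        }
        return new long[] {info.getBlockedCount(), info.getBlockedTime()};
    }

    public static void main(String[] args) {
        System.out.println("contention monitoring enabled: " + enable());
        long[] stats = blockedCountAndMillis(Thread.currentThread().getId());
        System.out.println("blocked " + stats[0] + " times for " + stats[1] + " ms");
    }
}
```

Comparing these numbers between two points in time shows which threads spend their time waiting for monitors, but not which lock they wait for; that is where sampling or tracing comes in.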
21. Sub millisecond profiling
Low latency profiling requires
indirect measurement.
You should not measure sub-millisecond times directly!
• Make a hypothesis
• Arrange an experiment to prove / disprove it
• Collect aggregated metrics
• Compare, analyze and iterate
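One way to collect aggregated metrics instead of timing individual sub-millisecond calls: time whole batches and compare the per-call averages of two variants. The sketch below is a hand-rolled illustration under that assumption, with the hypothetical name `BatchTimer`; it ignores JIT effects such as dead code elimination, so a proper benchmark harness is preferable for serious comparisons.

```java
import java.util.Arrays;

final class BatchTimer {
    /** Times several batches of calls and returns the median average cost per call in ns. */
    static double medianNanosPerCall(Runnable op, int batches, int callsPerBatch) {
        if (batches <= 0 || callsPerBatch <= 0) {
            throw new IllegalArgumentException("batches and callsPerBatch must be positive");
        }
        double[] perCall = new double[batches];
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int i = 0; i < callsPerBatch; i++) {
                op.run();
            }
            // One timer read per batch keeps measurement overhead out of each call.
            perCall[b] = (System.nanoTime() - start) / (double) callsPerBatch;
        }
        Arrays.sort(perCall);
        return perCall[batches / 2];
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        // Hypothesis under test: variant B is cheaper than variant A.
        double a = medianNanosPerCall(() -> sink.append(String.valueOf(42)).setLength(0), 20, 100_000);
        double b = medianNanosPerCall(() -> sink.append(42).setLength(0), 20, 100_000);
        System.out.printf("A: %.1f ns/call, B: %.1f ns/call%n", a, b);
    }
}
```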
22. Flight Recorder / Mission Control
Tight integration with JVM
• Low overhead
• Access to intimate JVM areas
Well balanced set of metrics
• Unobtrusive
• Indicative
Good user interface
Maybe one day I will abandon my SJK
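On JDK 11 or later, Flight Recorder can also be driven from code through the `jdk.jfr` API. The sketch below records execution samples and monitor contention around a workload and dumps the result to a temporary file that can be opened in Mission Control. The class name, the chosen events, and the thresholds are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;

final class FlightRecordingSketch {
    /** Runs the workload under a flight recording and returns the dumped .jfr file. */
    static Path record(Runnable workload) {
        try (Recording recording = new Recording()) {
            // Periodic stack samples of running Java threads.
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(20));
            // Monitor contention lasting longer than the threshold.
            recording.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(10));
            recording.start();
            workload.run();
            recording.stop();
            Path file = Files.createTempFile("profile", ".jfr");
            recording.dump(file);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = record(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i % 7;
            }
            System.out.println("workload result: " + sum);
        });
        System.out.println("recording written to " + file);
    }
}
```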