Production Profiling: What, Why and How
You want to understand what an application is doing in production, but this information is often invisible. Profilers tell you what code your application is running, but few developers profile, and those who do mostly profile in their development environments. Production profiling is now a practical reality that can help avoid performance problems.
5. Opsian
Amazon: 100ms of latency costs 1% of sales
Google: an extra 500ms in search page generation time drops traffic by 20%
Responsive Applications make more Money
Virtualised environments can perform very differently
“Two frequently used system calls are ~77% slower on AWS EC2”
https://blog.packagecloud.io/eng/2017/03/08/system-calls-are-much-slower-on-ec2/
Coarse Grained Instrumentation
Measures time within some instrumented section of the code
Time spent inside the controller layer of your web-app or performing SQL queries
More detailed and actionable though expensive
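As an illustration of coarse grained instrumentation, here is a minimal sketch in Java that wraps a section of code and accumulates the wall-clock time spent in it. The SectionTimer class and the section names "controller" and "sql" are hypothetical, not taken from any particular library; real instrumentation frameworks offer far more than this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical helper: accumulates wall-clock time per named section. */
final class SectionTimer {
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();

    /** Runs the work and records its duration under the given section name. */
    public <T> T time(String section, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            // Recorded even if the work throws, so failures are still timed.
            long elapsed = System.nanoTime() - start;
            totalNanos.computeIfAbsent(section, k -> new LongAdder()).add(elapsed);
            calls.computeIfAbsent(section, k -> new LongAdder()).increment();
        }
    }

    /** Number of times the section has been executed. */
    public long callCount(String section) {
        LongAdder adder = calls.get(section);
        return adder == null ? 0 : adder.sum();
    }

    /** Total nanoseconds spent inside the section. */
    public long totalNanos(String section) {
        LongAdder adder = totalNanos.get(section);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        SectionTimer timer = new SectionTimer();
        // Stand-ins for a controller method that performs one SQL query.
        String page = timer.time("controller", () ->
                timer.time("sql", () -> "row") + " rendered");
        System.out.println(page);
        System.out.println("sql calls: " + timer.callCount("sql"));
        System.out.println("controller nanos: " + timer.totalNanos("controller"));
    }
}
```

Every section you care about has to be wrapped by hand, and each wrapped call pays for two clock reads and two map updates, which is where the cost of this approach comes from.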
Logging
Records arbitrary events emitted by the system being monitored
log4j/slf4j/logback
Logs of GC events
Often manual, aids system understanding, expensive
Production Profiling
Measures what part of your code is eating up some resource
What methods take up CPU or wallclock time?
What lines of code allocate the most objects?
Where are your CPU Cache misses coming from?
Automatic, can be cheap but often isn’t
Normalisation of Deviance
“Some of the tests always fail, so we just ignore them.”
“Some of the alerts get triggered regularly, so we just ignore them.”
Alert false positives have a cost
Surely a better way?
Diagnostics not Diagnosis
Are there alternatives to the instrumentation dance?
Can’t just rely on metrics - you need actionable insights for your app
What about profiling?
Advanced Statistical Profiling in Java
OS Signals to interrupt threads on resource consumption threshold
JVM’s signal handler-safe AsyncGetCallTrace to walk the stack
Open Source Java Profilers
High Overhead
VisualVM
hprof
Twitter’s CPUProfile
Anything GetAllStackTraces based
Low Overhead
Async Profiler
Honest Profiler
Java Mission Control
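To make the "GetAllStackTraces based" category concrete, here is a rough sketch of a naive sampler built on the standard Thread.getAllStackTraces() call. The NaiveSampler class is hypothetical and is not how any of the tools listed above are implemented. Collecting every thread's stack this way requires the JVM to bring threads to a safepoint, which is the usual explanation for the overhead and bias of this style of profiler.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical naive sampler built on Thread.getAllStackTraces(). */
final class NaiveSampler {
    private final Map<String, Integer> hits = new HashMap<>();

    /** Takes one sample: counts the top frame of every RUNNABLE thread. */
    public void sampleOnce() {
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = entry.getValue();
            if (entry.getKey().getState() != Thread.State.RUNNABLE || stack.length == 0) {
                continue; // skip blocked, waiting, or frameless threads
            }
            String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
            hits.merge(frame, 1, Integer::sum);
        }
    }

    /** Sample counts keyed by "class.method" of the top frame. */
    public Map<String, Integer> hits() {
        return hits;
    }

    public static void main(String[] args) throws InterruptedException {
        NaiveSampler sampler = new NaiveSampler();
        for (int i = 0; i < 20; i++) {
            sampler.sampleOnce();
            Thread.sleep(10); // assumed 10ms sampling interval
        }
        sampler.hits().forEach((frame, count) -> System.out.println(count + " " + frame));
    }
}
```

The sampling thread itself shows up in the results, and every sample stops all application threads, so the cost grows with the number of threads and the sampling rate.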
Profiling Support in the Linux Kernel
perf and eBPF
perf-map-agent for the JVM
Hardware events (L1/L2/L3 cache misses, branch mispredictions, etc.)
Barriers to Ad-Hoc Production Profiling
Low overhead unbiased tools are relatively new
Generally requires access to production
Process involves manual work - hard to automate
Putting Samples in Context
Application version
Environment parameters (machine type, CPU, location, etc.)
Desktop profilers don’t do this
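One way to picture a sample carrying its context is a small data holder like the sketch below. The ContextualSample class, the "app.version" system property, and the choice of environment keys are all assumptions made for illustration; a real production profiler would decide for itself what context to attach and where to read it from.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sample: a top stack frame tagged with deployment context. */
final class ContextualSample {
    final String frame;
    final String appVersion;
    final Map<String, String> environment;

    ContextualSample(String frame, String appVersion, Map<String, String> environment) {
        this.frame = frame;
        this.appVersion = appVersion;
        this.environment = environment;
    }

    /** Gathers a few environment parameters the JVM can report about itself. */
    static Map<String, String> detectEnvironment() {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("os.name", System.getProperty("os.name"));
        env.put("os.arch", System.getProperty("os.arch"));
        env.put("cpus", Integer.toString(Runtime.getRuntime().availableProcessors()));
        return env;
    }

    public static void main(String[] args) {
        // "app.version" is an assumed property name, e.g. set with -Dapp.version=1.4.2
        String version = System.getProperty("app.version", "unknown");
        ContextualSample sample = new ContextualSample(
                "com.example.Orders.load", version, detectEnvironment());
        System.out.println(sample.appVersion + " " + sample.environment + " " + sample.frame);
    }
}
```

With the version and environment attached to every sample, profiles can be grouped and compared across releases or machine types rather than viewed as one undifferentiated pile.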
Summary
Production Profiling is possible - all the time with low overhead
Production Profiling is desirable - can give deep insights
Problems can be overcome