Hotspot Garbage Collection
Tuning Guide
http://www.jclarity.com
1Thursday, 2 May 13
Who are we?
• Martijn Verburg (@karianna)
– CTO of jClarity
– aka "The Diabolical Developer"
– co-leader of the LJC
• Dr. John Oliver (@johno_oliver)
– Research Mentat at jClarity
• Strange title? Yes we're a start-up
– Can read raw GC log files
• "Empirical Science Wins"
What we're going to cover
• Part I - Shining a light into the Darkness
– Retrospective from Talk I
– Collector Flags Ahoy
– Tooling and Basic Data
• Part II - Setting the stage
– When to tune GC
– Pause times vs Throughput vs Heap Size
– Application Lifecycle
• Part III - Real World Scenarios
– Possible Memory Leak(s), Long Pause Times
– Premature Promotion, System GCs, Low Throughput
– Healthy Application, Maxed Allocation Rate
What we're not covering
• G1 Collector
– It's supported in production now
– But we doubt any of you are using it yet
• Non Hotspot JVMs
– Again, most of you are using OpenJDK/Oracle.
– Azul's Zing VM is a specialist VM you can look at
Part I - Shining a light into the dark
• Retrospective
• Collector Flags ahoy
• Reading CMS Log records
• Tooling and basic data
Java Heap Layout
Copyright - Oracle Corporation
Weak Generational Hypothesis
Copyright - Oracle Corporation
Copy Collectors
• aka "stop-and-copy"
– Some literature will discuss "Cheney's algorithm"
• Used in many managed runtimes
– Including Hotspot
• GC thread(s) trace from root(s) to find live objects
• Typically involves copying live objects
– From one space to another space in memory
– The result typically looks like a move as opposed to a copy
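The tracing-and-copying idea above can be sketched with a toy semispace collector. This is only an illustration under simplifying assumptions: `Node`, `collect` and the list standing in for to-space are hypothetical names, and nothing here reflects how Hotspot lays out memory.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy semispace copy collector (illustrative only, not Hotspot internals). */
final class CopyCollectorSketch {

    /** A heap object with a name and outgoing references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /**
     * Copies everything reachable from the roots into a fresh to-space,
     * breadth-first: the to-space list doubles as the work queue.
     */
    static List<Node> collect(List<Node> roots) {
        List<Node> toSpace = new ArrayList<>();
        Map<Node, Node> forwarding = new IdentityHashMap<>(); // old object -> its copy

        // Roots are rewritten so they point at the to-space copies.
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), toSpace, forwarding));
        }
        // The scan index chases the end of to-space until nothing is left to fix up.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Node copied = toSpace.get(scan);
            for (int i = 0; i < copied.refs.size(); i++) {
                copied.refs.set(i, copy(copied.refs.get(i), toSpace, forwarding));
            }
        }
        return toSpace; // the old space can now be discarded wholesale
    }

    private static Node copy(Node old, List<Node> toSpace, Map<Node, Node> forwarding) {
        Node existing = forwarding.get(old);
        if (existing != null) {
            return existing; // already copied, reuse the forwarding entry
        }
        Node fresh = new Node(old.name);
        fresh.refs.addAll(old.refs); // still old-space references; fixed during the scan
        forwarding.put(old, fresh);
        toSpace.add(fresh);
        return fresh;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), garbage = new Node("garbage");
        a.refs.add(b);
        b.refs.add(a);       // the cycle is handled by the forwarding map
        garbage.refs.add(a); // unreachable from the roots, so never copied
        List<Node> roots = new ArrayList<>();
        roots.add(a);
        System.out.println(collect(roots).size() + " live objects copied");
    }
}
```

Dead objects are never visited, which is why the cost of a copy collection tracks the amount of live data rather than the size of the space.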
Mark and Sweep Collectors
• Used by many modern collectors
– Including Hotspot, usually for old generational collection
• Typically 2 mandatory and 1 optional step(s)
1. Find live objects (mark)
2. 'Delete' dead objects (sweep)
3. Tidy up - optional (compact)
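The three steps can be sketched on a toy heap modelled as an array of slots. `Obj`, `mark`, `sweep` and `compact` are hypothetical names chosen for this illustration; a real collector works on raw memory and is far more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy mark / sweep / compact over an array of slots (illustrative only). */
final class MarkSweepSketch {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Step 1: mark everything reachable from the roots. */
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) {
                continue; // already visited, also protects against cycles
            }
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Step 2: free unmarked slots and clear marks for the next cycle. */
    static int sweep(Obj[] heap) {
        int freed = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) {
                continue;
            }
            if (heap[i].marked) {
                heap[i].marked = false;
            } else {
                heap[i] = null; // leaves a hole, i.e. fragmentation
                freed++;
            }
        }
        return freed;
    }

    /** Step 3 (optional): slide live objects together; returns the first free index. */
    static int compact(Obj[] heap) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                Obj live = heap[i];
                heap[i] = null;
                heap[next++] = live;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), dead = new Obj("dead"), b = new Obj("b");
        a.refs.add(b);
        dead.refs.add(b);
        Obj[] heap = {a, dead, b};
        List<Obj> roots = new ArrayList<>();
        roots.add(a);

        mark(roots);
        int freed = sweep(heap);
        int live = compact(heap);
        System.out.println(freed + " swept, " + live + " live after compaction");
    }
}
```

Skipping `compact` is what makes the third step optional: the heap still works, but the holes left by `sweep` stay where they are.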
More Flags than your Deity
Copyright Frank Pavageau
'Mandatory' Flags
• -Xloggc:<pathtofile>
– Path to the log output, make sure you've got disk space!
• -XX:+PrintGCDetails
– Minimum information for tools to help
– Replace -verbose:gc with this
• -XX:+PrintTenuringDistribution
– Premature promotion information
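One way to confirm a running JVM was actually started with these flags is to inspect its input arguments. The sketch below uses the standard `RuntimeMXBean.getInputArguments()` call; the class and method names (`GcLoggingFlagCheck`, `missingFlags`) are made up for illustration, and the flag spellings are taken verbatim from the slide.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Reports which of the slide's logging flags are absent from a JVM's arguments. */
final class GcLoggingFlagCheck {

    static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<>();
        // -Xloggc carries a path, so match on the prefix only.
        if (jvmArgs.stream().noneMatch(arg -> arg.startsWith("-Xloggc:"))) {
            missing.add("-Xloggc:<pathtofile>");
        }
        if (!jvmArgs.contains("-XX:+PrintGCDetails")) {
            missing.add("-XX:+PrintGCDetails");
        }
        if (!jvmArgs.contains("-XX:+PrintTenuringDistribution")) {
            missing.add("-XX:+PrintTenuringDistribution");
        }
        return missing;
    }

    public static void main(String[] args) {
        // Arguments the current JVM was launched with (excludes main-class arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Missing GC logging flags: " + missingFlags(jvmArgs));
    }
}
```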
Basic Heap Sizing Flags
• -Xms<size>
– Set the initial (minimum) size reserved for the heap
• -Xmx<size>
– Set the maximum size reserved for the heap
• -XX:MaxPermSize=<size>
– Set the maximum size of your perm gen
– Good for Spring apps and App servers
• We'll cover other flags in a tuning context
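To see what the sizing flags resolved to inside a running JVM, the standard `Runtime` methods give a rough view. This is a minimal sketch: `HeapSizeReport` and `toMiB` are illustrative names, and `maxMemory()` only approximates -Xmx (it can come out slightly lower depending on the collector).

```java
/** Prints the heap sizes the running JVM reports (a rough view of -Xms / -Xmx). */
final class HeapSizeReport {

    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to, roughly -Xmx.
        System.out.println("max heap:       " + toMiB(rt.maxMemory()) + " MiB");
        // Heap currently committed; starts near -Xms and may be resized by the JVM.
        System.out.println("committed heap: " + toMiB(rt.totalMemory()) + " MiB");
        // Unused portion of the committed heap.
        System.out.println("free heap:      " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```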
Beware of Magic Happening
• When you touch GC Flags a Puppy dies
• Your Tenuring Threshold jumps to 15
• -XX:MaxTenuringThreshold=n
– To reset this to what you really want
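Rather than guessing what the threshold ended up as, the effective value can be read back from a running Hotspot JVM. The sketch below relies on `com.sun.management.HotSpotDiagnosticMXBean`, which is Hotspot-specific, so it is an assumption that the code runs on a Hotspot-based JVM; `TenuringThresholdCheck` and `effectiveValue` are illustrative names.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Reads the effective value of a Hotspot VM option from the running JVM. */
final class TenuringThresholdCheck {

    static String effectiveValue(String flagName) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the JVM does not know the option.
        return bean.getVMOption(flagName).getValue();
    }

    public static void main(String[] args) {
        System.out.println("MaxTenuringThreshold = " + effectiveValue("MaxTenuringThreshold"));
    }
}
```

Checking this after every change to the GC flags shows whether the threshold moved without being asked to.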
Tooling
• HPJMeter (Google it)
– Solid, but no longer supported / enhanced
• GCViewer (http://www.tagtraum.com/gcviewer.html)
– Has rudimentary G1 support
• GarbageCat (http://code.google.com/a/eclipselabs.org/p/garbagecat/)
– Best name
• IBM GCMV (http://www.ibm.com/developerworks/java/jdk/tools/gcmv/)
– J9 support
• jClarity Censum (http://www.jclarity.com/products/censum)
– The prettiest and most useful, but we're biased!
Don't listen to the vendors ;-)
• Single log with consistent format?
– You can probably grep for stuff
– This doesn't scale
• Existing free tools are adequate*
– *For older JVMs especially
– Most are no longer actively maintained
• Latest tooling does more for you
– Supports Latest JVMs & Collectors
– Has more meaningful visualisations
– Starts to do some of the Human analysis for you
– Correlates and performs historical analysis
– Parses certain data out that the others don't
Summary Data
Heap Usage After GC
Recovered Heap
Allocation Rates
Pause Times
Perm Space
Tenuring Threshold
Part II - Setting the stage
• When to Tune
• Latency / Throughput / Footprint
– aka Performance goals
• Application Lifecycle
• Know your Hardware
When to tune GC
• As part of a performance diagnostic process
– After looking at machine metrics
– Before reaching for an execution profiler
• It's cheap to switch on GC flags
– It's cheap to rule GC out or pin the issue on it
– It's not cheap to set up execution profilers
• Result is either "GC is OK" or "GC is not OK"
– Tune the GC and/or
– Bring out the memory profiler
Latency vs Throughput vs Footprint
• aka performance goals:
– e.g. "Max Pause Times / 95th% Pause Times" vs
– "Object Allocation Rate" vs
– "Heap Size"
– Throughput ~= % of time doing application work
• Tuning tradeoff
– Latency x Throughput x Footprint = Z
– You can typically tune for 2/3 of these
– To increase Z you need to
• increase allocated hardware OR
• Rewrite your app
• Decide what characteristics you want!
– Before tuning
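A latency goal such as "max pause / 95th percentile pause" can be checked against pause durations pulled from a GC log. The sketch below uses the nearest-rank percentile definition, which is one choice among several; `PauseStats` and `percentile` are hypothetical names and the pause values are assumed to be in milliseconds.

```java
import java.util.Arrays;

/** Nearest-rank percentile over GC pause durations (illustrative helper). */
final class PauseStats {

    /**
     * Returns the smallest pause such that at least pct percent of all
     * pauses are less than or equal to it. pct must be in (0, 100].
     */
    static double percentile(double[] pausesMs, double pct) {
        if (pausesMs.length == 0) {
            throw new IllegalArgumentException("no pauses recorded");
        }
        if (pct <= 0 || pct > 100) {
            throw new IllegalArgumentException("pct must be in (0, 100]");
        }
        double[] sorted = pausesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] pausesMs = {12, 9, 15, 11, 480, 10, 13, 14, 12, 11};
        System.out.println("95th percentile pause (ms): " + percentile(pausesMs, 95));
        System.out.println("max pause (ms):             " + percentile(pausesMs, 100));
    }
}
```

Writing the goal down as a number first makes it possible to say whether a tuning change met it.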
Latency vs Throughput vs Footprint
• Better Throughput
– Usually means worse Latency and Footprint
• Better Latency
– Usually means worse Throughput
• Better footprint
– Usually means worse Throughput
Application Lifecycle
• Very little point in tuning based off limited information
– Have you gathered enough data?
– Has your application gone through its typical lifecycle?
– This is why we don't run 'Live Demos'
• Very little point in tuning off incorrect information
– Application start-up, shutdown and batch jobs are all outliers
• You can infer amazing things from GC logs
– When Richard went to lunch
– When John stopped playing Minecraft
– When Ben kicked off the weekly customer report
– .....
Know your Hardware
• Number of CPU cores matters
– Allocate X threads to do GC work with a concurrent collector
– How many is 'safe'?
– How does that affect throughput?
• Memory Bandwidth matters
– How quickly can your hardware allocate?
– See your manufacturer
– Object Allocation Rates != Memory Bandwidth != Real Metric
• Use Hawkshaw to explore your hardware
– Produces GC behaviour according to statistical models
– http://www.github.com/jclarity/hawkshaw
Part III - Tuning Scenarios
• Tuning can make it worse!
• Grain of Salt
• Scenarios
– Possible Memory Leak(s)
– Long Pause Times
– Premature Promotion
– System GCs
– Low Throughput
– Healthy Application
– Maxed Allocation Rate
Tuning can make it worse*
• Performance Tuning is an iterative process
– Sometimes solving one problem uncovers a 2nd, worse problem
– e.g. Fix the app, then the database gets hammered
• Overall performance goes down
• Only fix one aspect of GC at a time
– Measure the next cycle with fresh eyes
– Have you met your goals or made them worse?
• GC tuning still needs human interaction
– Azul's Zing can/will claim otherwise.
Grain of Salt
"Nothing that we say should be held as performance tuning tips for *your* application"
"There is *always* more than one way to tune in order to meet your goal"
"Don't just use our numbers!"
A Likely Memory Leak
• Memory leaks can't truly be ascertained by a GC log
– It could just be an undersized heap!
– Needs Human domain knowledge of app (periodicity)
• First rule of thumb is to increase your heap
– Rule out having an undersized heap
• Second rule of thumb is to fire up the Memory profiler
– Visual VM will do in most cases
A Likely Memory Leak
• Only 1000 seconds of log: look at the number of Full GC's, highly indicative. Note the trend along the bottom.
A Possible Memory Leak - I
• Note: trend along the bottom, slow leak possible. Look for cycles in the log, e.g. a full day in an application's life.
A Possible Memory Leak - II
• Note: Trend along the bottom, slow leak possible. Again, look for cycles in the log.
Using a Memory Profiler
• Visual VM
– Memory profiler - invasive and slow on large apps
– Look at object ages (aka Generations)
• Look for objects with a high number of generations
– They're leak candidates
– Make sure you switch on recording of allocation stack traces
• Use allocation stack trace to find root cause
– Track back from core JRE classes to your code
– Yes, it's always your code that's the problem!
• Can also try jmap -histo
Visual VM - Memory Profiler
• Note: Objects in many generations! Indicative they're leaking
Visual VM - Stack Trace
• NThreadedManagedCache$ProduceKey.run() root cause
Long Pause Times
• The #1 complaint relating to GC
– Lots of ways to mitigate
– From small tuning tweaks --> off Heap solutions
• User reports paused/locked application!
– e.g. Web pages taking ages to load
– e.g. Progress bars stalling
• Tech Support want to uninstall Java!
Long Pause Time Example
• User has set heap to: -Xms5G -Xmx5G
• NOTE: Resident Set Size ~1GB
Long Pause Time Example
• ~125ms young gen pauses & ~500ms Full GC pauses
– OK for a web app, but this is a new prototype low latency trading app or Media Streaming app or Advertising service, oh dear!
Long Pause Time partial fix
• Reduce heap size to -Xmx1500M: more frequent but shorter pauses
Long Pause Time partial fix
• ~20ms young gen pauses & ~250ms Full GC pauses, Better!
Long Pause Time 'fixed'
• Move to a CMS collector, hopefully shorter pauses
• No Full GC's! Therefore minimal Tenured pauses
Long Pause Time 'fixed!'
• ~10ms young gen pauses, ~2ms tenured pauses, Better!
• BUT: Throughput decreased from 69% down to 49% :-(
Other Long Pause Time Solutions
• Increase number of threads performing GC
– -XX:ParallelGCThreads=N
– Rule of thumb is to use 3/4 the available physical cores
– Can reduce application throughput - can be bad
– Can increase context switching - bad
• Try an alternative collector
– ParNew/CMS vs PSScavenge/ParOld vs iCMS vs G1 etc
– Match the collector to your application and hardware
• Special note on G1
– You can set pause time goals
– BUT: We haven't reliably succeeded for <100ms pause times
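The 3/4 rule of thumb for -XX:ParallelGCThreads is simple arithmetic, sketched below. `GcThreadRuleOfThumb` and `suggestedThreads` are illustrative names. Note that `Runtime.availableProcessors()` reports logical processors, which can be higher than the physical core count the rule refers to, so treat the printed value as a starting point only.

```java
/** Applies the "3/4 of the available cores" rule of thumb for GC threads. */
final class GcThreadRuleOfThumb {

    static int suggestedThreads(int cores) {
        // Integer arithmetic rounds down; never suggest fewer than one thread.
        return Math.max(1, cores * 3 / 4);
    }

    public static void main(String[] args) {
        // Logical processors visible to the JVM (may exceed physical cores).
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("-XX:ParallelGCThreads=" + suggestedThreads(cores));
    }
}
```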
Extreme Long Pause Time Solutions
• Azul's Zing JVM
– This has proven low pause time goal settings
– JCK/TCK compliant
– Typically needs a very large heap (15GB+)
• Take memory off heap
– Good for caches in particular
• GC in offline mode
– Cluster the app and take nodes offline in order to run GC on them
Premature Promotion
• User reports more pauses and/or longer pauses
• Tech support reports there are more full GC's
• Objects are promoted to Tenured too early
– Recall the Weak Generational Hypothesis!
– This causes more Old Gen collections
• Which can lead to more Full GCs
Premature Promotion Example
Customer had set:
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+PrintGCDetails
-Xloggc:gc.log
-Xmx1024m
-XX:+PrintTenuringDistribution
-XX:NewRatio=2
-XX:MaxTenuringThreshold=4
NewRatio=2 means young gen gets ~1/3 of the total heap
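The NewRatio arithmetic works out as follows: the flag is the ratio of old to young, so the young generation gets 1 / (NewRatio + 1) of the heap. The sketch ignores alignment and rounding that the JVM applies in practice, and `NewRatioMath` / `youngGenMiB` are names invented for this example.

```java
/** Young generation size implied by -XX:NewRatio (old:young = newRatio:1). */
final class NewRatioMath {

    static long youngGenMiB(long heapMiB, int newRatio) {
        if (newRatio < 1) {
            throw new IllegalArgumentException("newRatio must be >= 1");
        }
        return heapMiB / (newRatio + 1);
    }

    public static void main(String[] args) {
        long heapMiB = 1024; // matches -Xmx1024m from the customer's flags
        System.out.println("NewRatio=2: young ~" + youngGenMiB(heapMiB, 2) + " MiB");
        System.out.println("NewRatio=1: young ~" + youngGenMiB(heapMiB, 1) + " MiB");
    }
}
```

With a 1024 MiB heap that is roughly 341 MiB of young gen at NewRatio=2 and 512 MiB at NewRatio=1.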
Premature Promotion Example
• Note: ~26% of objects promoted at age 1
Premature Promotion 'Fixed'
• We dropped NewRatio to 1, Premature Promotion fell to ~4%
– Weak Generational Hypothesis is a better fit
– This gives the Young Gen ~1/2 the heap
System GC's
• User reports frequent pauses
– System GC's are Full GCs!
• Tech support reports there are more full GC's
– With this funny System wording in the log
• System GCs often interfere with the GC subsystem
– JVM no longer resizes heap based on runtime info
• Caused by System.gc() in code or an RMI call
– Very occasionally used to solve a problem
– System.gc() is almost always honoured
– You can disable it with -XX:+DisableExplicitGC
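A quick way to see whether explicit GC requests are being honoured is to compare collection counts before and after a `System.gc()` call. The sketch uses the standard `GarbageCollectorMXBean` API; `ExplicitGcProbe` and `totalCollections` are illustrative names. With -XX:+DisableExplicitGC set, the count would be expected to stay the same unless an ordinary collection happens to run in between.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Compares total GC counts before and after an explicit System.gc() request. */
final class ExplicitGcProbe {

    /** Sums collection counts across all collectors the JVM exposes. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when undefined for a collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc(); // a request only; the JVM may ignore it
        long after = totalCollections();
        System.out.println("collections before=" + before + ", after=" + after);
    }
}
```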
System GC example
• NOTE: 34,000 system GC's, every 1/2 second
– Throughput 51% - Unhappy Minecraft players!
System GC calls 'Fixed'
• -XX:+DisableExplicitGC
• Throughput went to 99.8% - Happier Minecraft players
Low Throughput
• User reports slow application
– e.g. Batch job fails to complete on time
• Tech support reports there are lots of GC's
• Lots of small GC's can also be bad!
– Your application threads aren't able to allocate objects
– i.e. Low Throughput
• Throughput increases when system is quiet
– Be careful to analyse the right period of activity
Low Throughput example 1/4
• 61 seconds in total pause time, log is only 170 seconds long
• Throughput is 64% --> Rule of thumb, should be 95%+
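The throughput figure is the share of wall-clock time not spent paused for GC. A minimal sketch of that arithmetic, using the numbers quoted in this scenario (`GcThroughput` and `throughputPercent` are names made up for the example):

```java
/** Throughput as the percentage of elapsed time not spent in GC pauses. */
final class GcThroughput {

    static double throughputPercent(double totalPauseSeconds, double elapsedSeconds) {
        if (elapsedSeconds <= 0 || totalPauseSeconds < 0 || totalPauseSeconds > elapsedSeconds) {
            throw new IllegalArgumentException("need 0 <= pause <= elapsed and elapsed > 0");
        }
        return 100.0 * (elapsedSeconds - totalPauseSeconds) / elapsedSeconds;
    }

    public static void main(String[] args) {
        // Total pause seconds reported in this scenario, each over a 170 second log.
        double[] pauseTotals = {61, 33, 9};
        for (double pause : pauseTotals) {
            System.out.println(pause + "s paused of 170s -> throughput % = "
                    + throughputPercent(pause, 170));
        }
    }
}
```

For 61 seconds of pause in a 170 second log this gives (170 - 61) / 170, roughly 64%.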
Low Throughput example 2/4
• Lots of small pauses from various collectors, which ones?
Low Throughput example 3/4
• ~25% of time spent in young GC & ~5-10% in Full GCs (concurrent mode failures, CMFs)
Low Throughput example 4/4
• Object allocation hitting max heap size
– Able to recover memory, so no leak, needs a bigger heap!
Low Throughput 'Fixed' 1/4
• Increased footprint to -Xmx1024M
Low Throughput 'Fixed' 2/4
• Far fewer pauses from Full GCs (CMFs) - just looks nicer!
– Still lots of young gen pauses
Low Throughput 'Fixed' 3/4
• ~15% time spent in young GC & ~0% in Full GCs
Low Throughput 'Fixed' 4/4
• Note: 33 seconds out of 170, ~81% Throughput, Better!
Low Throughput 'Really Fixed' 1/2
• Switched to PSYoungGen collector (from ParNew)
– Worth trying as young gen collections are dominant
Low Throughput 'Really Fixed' 2/2
• Note: 9 seconds out of 170, ~95% Throughput, Best!
Healthy Application
• What is healthy? It depends!
• Throughput
– Typically a 95%+ throughput is good
• Pause times
– < 1sec is good for generic web apps
• Footprint
– Smaller == Fewer live objects to track == Better?
Healthy Application
• Saw tooth pattern
• Bottom of troughs trend line is flat
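Whether the bottom of the troughs is flat can be estimated by fitting a straight line through the heap-used-after-GC samples. The sketch below uses an ordinary least-squares slope; this is one possible check, not the method used by any particular tool, and `HeapFloorTrend` and `slopeMbPerSecond` are hypothetical names. A slope near zero matches the healthy picture, while a steadily positive slope matches the possible-leak graphs from the memory leak scenario.

```java
/** Least-squares slope of heap-used-after-GC samples against time. */
final class HeapFloorTrend {

    static double slopeMbPerSecond(double[] timesSec, double[] heapAfterGcMb) {
        int n = timesSec.length;
        if (n < 2 || heapAfterGcMb.length != n) {
            throw new IllegalArgumentException("need two or more paired samples");
        }
        double sumT = 0, sumH = 0;
        for (int i = 0; i < n; i++) {
            sumT += timesSec[i];
            sumH += heapAfterGcMb[i];
        }
        double meanT = sumT / n, meanH = sumH / n;

        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            num += (timesSec[i] - meanT) * (heapAfterGcMb[i] - meanH);
            den += (timesSec[i] - meanT) * (timesSec[i] - meanT);
        }
        if (den == 0) {
            throw new IllegalArgumentException("all samples share the same timestamp");
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] t = {0, 10, 20, 30};
        System.out.println("flat floor:   " + slopeMbPerSecond(t, new double[] {100, 100, 100, 100}));
        System.out.println("rising floor: " + slopeMbPerSecond(t, new double[] {100, 110, 120, 130}));
    }
}
```

The samples need to cover a full application cycle, otherwise a normal daily ramp can look like a leak.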
Healthy Minecraft Client!
• Note: JVM resizing itself, you let IT do the work!
Maxed Allocation Rate
• User reports slow application behaviour
• Tech support has no idea why!
– Normally you'd do a full performance diagnostic
– But we can look at GC cheaply
• GC logs can help with non GC problems!
– Memory Bandwidth limits are being hit
– Not a GC problem!
• More common in virtualised environments
– What else on the hardware is using bandwidth?
Not Maxed Allocation Rate Example
Max Allocation Rate Example
• 8GB/sec - could be getting close to real memory bandwidth
Max Allocation Rate Example
Hard limit at ~8GB/sec (8e+06 on graph)
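An allocation rate like the one plotted here is commonly derived from two consecutive young collections: the young gen occupancy just before a collection, minus the occupancy left after the previous one, divided by the time between them. A minimal sketch under that assumption follows; the parameter names and the sample figures are invented for illustration and are not taken from the graphs.

```java
/** Allocation rate estimated from two consecutive young generation collections. */
final class AllocationRate {

    /**
     * @param youngAfterPreviousGcKb young gen occupancy after the previous collection
     * @param youngBeforeThisGcKb    young gen occupancy just before this collection
     * @param secondsBetween         wall-clock time between the two collections
     * @return allocation rate in MB per second
     */
    static double mbPerSecond(long youngAfterPreviousGcKb, long youngBeforeThisGcKb,
                              double secondsBetween) {
        if (secondsBetween <= 0) {
            throw new IllegalArgumentException("secondsBetween must be positive");
        }
        long allocatedKb = youngBeforeThisGcKb - youngAfterPreviousGcKb;
        return (allocatedKb / 1024.0) / secondsBetween;
    }

    public static void main(String[] args) {
        // Hypothetical figures: 10 MB survived the last GC, 110 MB in use 0.5s later.
        System.out.println("allocation rate (MB/s): " + mbPerSecond(10_240, 112_640, 0.5));
    }
}
```

A rate that plateaus regardless of load is the signature of the hard limit shown on the graph.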
Max Allocation Rate 'Fixes'
• Lots you can do!
• Stop allocating so much!
– Get out your Memory profiler
– Alter the application's object allocation behaviour
• Get better hardware!
– CPU
– Faster Bus
– Faster RAM
• Don't virtualise/share
– Have your application be the only thing on that hardware
Summary
• You need to understand some basic GC theory
– Work with the Weak Generational Hypothesis
– See http://www.insightfullogic.com for blog posts
• Turn on GC logging!
– It has low overhead*
– Reading raw log files is hard
– Use tooling!
• Tradeoff: Pause Times vs Throughput vs Heap Size
– Use tools to help you tweak
– "Empirical Science Wins!"
Join our performance community
http://www.jclarity.com
Martijn Verburg (@karianna)
Dr. John Oliver (@johno_oliver)

More Related Content

What's hot

Lightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst PracticesLightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst PracticesPGConf APAC
 
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Brian Brazil
 
Run MongoDB with Confidence Using MongoDB Management Service (MMS)
Run MongoDB with Confidence Using MongoDB Management Service (MMS)Run MongoDB with Confidence Using MongoDB Management Service (MMS)
Run MongoDB with Confidence Using MongoDB Management Service (MMS)MongoDB
 
Run MongoDB with Confidence: Backing up and Monitoring with MMS
Run MongoDB with Confidence: Backing up and Monitoring with MMSRun MongoDB with Confidence: Backing up and Monitoring with MMS
Run MongoDB with Confidence: Backing up and Monitoring with MMSMongoDB
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy Docker, Inc.
 
Scalabe MySQL Infrastructure
Scalabe MySQL InfrastructureScalabe MySQL Infrastructure
Scalabe MySQL InfrastructureBalazs Pocze
 
No C-QL (Or how I learned to stop worrying, and love eventual consistency) (N...
No C-QL (Or how I learned to stop worrying, and love eventual consistency) (N...No C-QL (Or how I learned to stop worrying, and love eventual consistency) (N...
No C-QL (Or how I learned to stop worrying, and love eventual consistency) (N...Brian Brazil
 
Bosun Monitoring Talk at LISA14
Bosun Monitoring Talk at LISA14Bosun Monitoring Talk at LISA14
Bosun Monitoring Talk at LISA14Kyle Brandt
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the Worldjhugg
 
Overview of Postgres Utility Processes
Overview of Postgres Utility ProcessesOverview of Postgres Utility Processes
Overview of Postgres Utility ProcessesEDB
 
Optimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareOptimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareIndicThreads
 
Spark Autotuning Talk - Strata New York
Spark Autotuning Talk - Strata New YorkSpark Autotuning Talk - Strata New York
Spark Autotuning Talk - Strata New YorkHolden Karau
 
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...PostgreSQL-Consulting
 
Lessons PostgreSQL learned from commercial databases, and didn’t
Lessons PostgreSQL learned from commercial databases, and didn’tLessons PostgreSQL learned from commercial databases, and didn’t
Lessons PostgreSQL learned from commercial databases, and didn’tPGConf APAC
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDenish Patel
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Demi Ben-Ari
 
How a Small Team Scales Instagram
How a Small Team Scales InstagramHow a Small Team Scales Instagram
How a Small Team Scales InstagramC4Media
 
Tulsa tech fest 2010 - web speed and scalability
Tulsa tech fest 2010  - web speed and scalabilityTulsa tech fest 2010  - web speed and scalability
Tulsa tech fest 2010 - web speed and scalabilityJason Ragsdale
 
Евгений Хыст "Application performance database related problems"
Евгений Хыст "Application performance database related problems"Евгений Хыст "Application performance database related problems"
Евгений Хыст "Application performance database related problems"Anna Shymchenko
 

What's hot (20)

Lightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst PracticesLightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst Practices
 
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
 
Run MongoDB with Confidence Using MongoDB Management Service (MMS)
Run MongoDB with Confidence Using MongoDB Management Service (MMS)Run MongoDB with Confidence Using MongoDB Management Service (MMS)
Run MongoDB with Confidence Using MongoDB Management Service (MMS)
 
Run MongoDB with Confidence: Backing up and Monitoring with MMS
Run MongoDB with Confidence: Backing up and Monitoring with MMSRun MongoDB with Confidence: Backing up and Monitoring with MMS
Run MongoDB with Confidence: Backing up and Monitoring with MMS
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy
 
Scalabe MySQL Infrastructure
Scalabe MySQL InfrastructureScalabe MySQL Infrastructure
Scalabe MySQL Infrastructure
 
No C-QL (Or how I learned to stop worrying, and love eventual consistency) (N...
No C-QL (Or how I learned to stop worrying, and love eventual consistency) (N...No C-QL (Or how I learned to stop worrying, and love eventual consistency) (N...
No C-QL (Or how I learned to stop worrying, and love eventual consistency) (N...
 
Bosun Monitoring Talk at LISA14
Bosun Monitoring Talk at LISA14Bosun Monitoring Talk at LISA14
Bosun Monitoring Talk at LISA14
 
Big Data for QAs
Big Data for QAsBig Data for QAs
Big Data for QAs
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the World
 
Overview of Postgres Utility Processes
Overview of Postgres Utility ProcessesOverview of Postgres Utility Processes
Overview of Postgres Utility Processes
 
Optimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareOptimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardware
 
Spark Autotuning Talk - Strata New York
Spark Autotuning Talk - Strata New YorkSpark Autotuning Talk - Strata New York
Spark Autotuning Talk - Strata New York
 
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
 
Lessons PostgreSQL learned from commercial databases, and didn’t
Lessons PostgreSQL learned from commercial databases, and didn’tLessons PostgreSQL learned from commercial databases, and didn’t
Lessons PostgreSQL learned from commercial databases, and didn’t
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQL
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"
 
How a Small Team Scales Instagram
How a Small Team Scales InstagramHow a Small Team Scales Instagram
How a Small Team Scales Instagram
 
Tulsa tech fest 2010 - web speed and scalability
Tulsa tech fest 2010  - web speed and scalabilityTulsa tech fest 2010  - web speed and scalability
Tulsa tech fest 2010 - web speed and scalability
 
Евгений Хыст "Application performance database related problems"
Евгений Хыст "Application performance database related problems"Евгений Хыст "Application performance database related problems"
Евгений Хыст "Application performance database related problems"
 

Similar to Hotspot Garbage Collection - Tuning Guide

Lab: JVM Production Debugging 101
Lab: JVM Production Debugging 101Lab: JVM Production Debugging 101
Lab: JVM Production Debugging 101Tomer Gabel
 
From 1000/day to 1000/sec: The Evolution of Incapsula's BIG DATA System [Surg...
From 1000/day to 1000/sec: The Evolution of Incapsula's BIG DATA System [Surg...From 1000/day to 1000/sec: The Evolution of Incapsula's BIG DATA System [Surg...
From 1000/day to 1000/sec: The Evolution of Incapsula's BIG DATA System [Surg...Imperva Incapsula
 
Garbage Collection: the Useful Parts - Martijn Verburg & Dr John Oliver (jCla...
Garbage Collection: the Useful Parts - Martijn Verburg & Dr John Oliver (jCla...Garbage Collection: the Useful Parts - Martijn Verburg & Dr John Oliver (jCla...
Garbage Collection: the Useful Parts - Martijn Verburg & Dr John Oliver (jCla...jaxLondonConference
 
Java at Scale, Dallas JUG, October 2013
Java at Scale, Dallas JUG, October 2013Java at Scale, Dallas JUG, October 2013
Java at Scale, Dallas JUG, October 2013Azul Systems Inc.
 
Micrometrics to forecast performance tsunamis
Micrometrics to forecast performance tsunamisMicrometrics to forecast performance tsunamis
Micrometrics to forecast performance tsunamisTier1app
 
Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2PgTraining
 
MIS: Computers, Dr. Ashish K. Gupta
MIS: Computers, Dr. Ashish K. GuptaMIS: Computers, Dr. Ashish K. Gupta
MIS: Computers, Dr. Ashish K. GuptaAshish K Gupta
 
Diagnosing Problems in Production: Cassandra Summit 2014
Diagnosing Problems in Production: Cassandra Summit 2014Diagnosing Problems in Production: Cassandra Summit 2014
Diagnosing Problems in Production: Cassandra Summit 2014Jon Haddad
 
SharkFest16_Palm_Online
SharkFest16_Palm_OnlineSharkFest16_Palm_Online
SharkFest16_Palm_OnlineBrad Palm
 
GemStone/S Update
GemStone/S UpdateGemStone/S Update
GemStone/S UpdateESUG
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Stop Feeding IBM i Performance Hogs - Robot
Stop Feeding IBM i Performance Hogs - RobotStop Feeding IBM i Performance Hogs - Robot
Stop Feeding IBM i Performance Hogs - RobotHelpSystems
 
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...srisatish ambati
 
this-is-garbage-talk-2022.pptx
this-is-garbage-talk-2022.pptxthis-is-garbage-talk-2022.pptx
this-is-garbage-talk-2022.pptxTier1 app
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionDataStax Academy
 
Cassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionCassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionDataStax Academy
 
Cassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionCassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionDataStax Academy
 
@SIMUL8 Virtual User Group, September: Brian Harrington, Less is More
@SIMUL8 Virtual User Group, September: Brian Harrington, Less is More@SIMUL8 Virtual User Group, September: Brian Harrington, Less is More
@SIMUL8 Virtual User Group, September: Brian Harrington, Less is MoreSIMUL8 Corporation
 
Death of Disk Panel Session - HEC-FSIO Workshop
Death of Disk Panel Session - HEC-FSIO WorkshopDeath of Disk Panel Session - HEC-FSIO Workshop
Death of Disk Panel Session - HEC-FSIO WorkshopErik Riedel
 

Similar to Hotspot Garbage Collection - Tuning Guide (20)

Lab: JVM Production Debugging 101
Lab: JVM Production Debugging 101Lab: JVM Production Debugging 101
Lab: JVM Production Debugging 101
 
From 1000/day to 1000/sec: The Evolution of Incapsula's BIG DATA System [Surg...
From 1000/day to 1000/sec: The Evolution of Incapsula's BIG DATA System [Surg...From 1000/day to 1000/sec: The Evolution of Incapsula's BIG DATA System [Surg...
From 1000/day to 1000/sec: The Evolution of Incapsula's BIG DATA System [Surg...
 
Garbage Collection: the Useful Parts - Martijn Verburg & Dr John Oliver (jCla...
Garbage Collection: the Useful Parts - Martijn Verburg & Dr John Oliver (jCla...Garbage Collection: the Useful Parts - Martijn Verburg & Dr John Oliver (jCla...
Garbage Collection: the Useful Parts - Martijn Verburg & Dr John Oliver (jCla...
 
Java at Scale, Dallas JUG, October 2013
Java at Scale, Dallas JUG, October 2013Java at Scale, Dallas JUG, October 2013
Java at Scale, Dallas JUG, October 2013
 
Micrometrics to forecast performance tsunamis
Micrometrics to forecast performance tsunamisMicrometrics to forecast performance tsunamis
Micrometrics to forecast performance tsunamis
 
Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2
 
MIS: Computers, Dr. Ashish K. Gupta



Hotspot Garbage Collection - Tuning Guide

  • 1. Hotspot Garbage Collection Tuning Guide http://www.jclarity.com 1Thursday, 2 May 13
  • 2. Who are we? • Martijn Verburg (@karianna) – CTO of jClarity – aka "The Diabolical Developer" – co-leader of the LJC • Dr. John Oliver (@johno_oliver) – Research Mentat at jClarity • Strange title? Yes we're a start-up – Can read raw GC log files • "Empirical Science Wins"
  • 3. What we're going to cover • Part I - Shining a light into the Darkness – Retrospective from Talk I – Collector Flags Ahoy – Tooling and Basic Data • Part II - Setting the stage – When to tune GC – Pause times vs Throughput vs Heap Size – Application Lifecycle • Part III - Real World Scenarios – Possible Memory Leak(s), Long Pause Times – Premature Promotion, System GCs, Low Throughput – Healthy Application, Maxed Allocation Rate
  • 4. What we're not covering • G1 Collector – It's supported in production now – But we doubt any of you are using it yet • Non Hotspot JVMs – Again, most of you are using OpenJDK/Oracle. – Azul's Zing VM is a specialist VM you can look at
  • 5. Part I - Shining a light into the dark • Retrospective • Collector Flags ahoy • Reading CMS Log records • Tooling and basic data
  • 6. Java Heap Layout Copyright - Oracle Corporation
  • 7. Weak Generational Hypothesis Copyright - Oracle Corporation
  • 8. Copy Collectors • aka "stop-and-copy" – Some literature will discuss "Cheney's algorithm" • Used in many managed runtimes – Including Hotspot • GC thread(s) trace from root(s) to find live objects • Typically involves copying live objects – From one space to another space in memory – The result typically looks like a move as opposed to a copy
  • 9. Mark and Sweep Collectors • Used by many modern collectors – Including Hotspot, usually for old generational collection • Typically 2 mandatory and 1 optional step(s) 1. Find live objects (mark) 2. 'Delete' dead objects (sweep) 3. Tidy up - optional (compact)
  • 10. More Flags than your Deity Copyright Frank Pavageau
  • 11. 'Mandatory' Flags • -Xloggc:<pathtofile> – Path to the log output, make sure you've got disk space! • -XX:+PrintGCDetails – Minimum information for tools to help – Replace -verbose:gc with this • -XX:+PrintTenuringDistribution – Premature promotion information
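A minimal sketch of how a running application could check that the three logging flags from the 'Mandatory' Flags slide were actually passed to it. The class name `GcLoggingFlagCheck` and the method `missingFlags` are hypothetical names chosen for illustration; `-Xloggc:` is matched by prefix because it carries a path. The flags are those named in the talk, which predates the unified `-Xlog` logging of later JDKs.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcLoggingFlagCheck {
    // Flags taken from the slide; -Xloggc: is a prefix because a file path follows it.
    static final String[] RECOMMENDED = {
        "-Xloggc:", "-XX:+PrintGCDetails", "-XX:+PrintTenuringDistribution"
    };

    // Returns the recommended flags that do not appear in the given JVM arguments.
    static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<>();
        for (String flag : RECOMMENDED) {
            boolean found = false;
            for (String arg : jvmArgs) {
                if (arg.startsWith(flag)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (excludes the main class and its args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Missing GC logging flags: " + missingFlags(jvmArgs));
    }
}
```

Running this at start-up is one cheap way to catch a deployment that silently dropped its GC logging configuration.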
  • 12. Basic Heap Sizing Flags • -Xms<size> – Set the minimum size reserved for the heap • -Xmx<size> – Set the maximum size reserved for the heap • -XX:MaxPermSize=<size> – Set the maximum size of your perm gen – Good for Spring apps and App servers • We'll cover other flags in a tuning context
  • 13. Beware of Magic Happening • When you touch GC Flags a Puppy dies • Your Tenuring Threshold jumps to 15 • -XX:MaxTenuringThreshold=n – To reset this to what you really want
  • 14. Tooling • HPJMeter (Google it) – Solid, but no longer supported / enhanced • GCViewer (http://www.tagtraum.com/gcviewer.html) – Has rudimentary G1 support • GarbageCat (http://code.google.com/a/eclipselabs.org/p/garbagecat/) – Best name • IBM GCMV (http://www.ibm.com/developerworks/java/jdk/tools/gcmv/) – J9 support • jClarity Censum (http://www.jclarity.com/products/censum) – The prettiest and most useful, but we're biased!
  • 15. Don't listen to the vendors ;-) • Single log with consistent format? – You can probably grep for stuff – This doesn't scale • Existing free tools are adequate* – *For older JVMs especially – Most are no longer actively maintained • Latest tooling does more for you – Supports Latest JVMs & Collectors – Has more meaningful visualisations – Starts to do some of the Human analysis for you – Correlates and performs historical analysis – Parses certain data out that the others don't
  • 17. Heap Usage After GC
  • 23. Part II - Setting the stage • When to Tune • Latency / Throughput / Footprint – aka Performance goals • Application Lifecycle • Know your Hardware
  • 24. When to tune GC • As part of a performance diagnostic process – After looking at machine metrics – Before reaching for an execution profiler • It's cheap to switch on GC flags – It's cheap to eliminate or pin an issue on GC – It's not cheap to set up execution profilers • Result is either "GC is OK" or "GC is not OK" – Tune the GC and/or – Bring out the memory profiler
  • 25. Latency vs Throughput vs Footprint • aka performance goals: – e.g. "Max Pause Times / 95th% Pause Times" vs – "Object Allocation Rate" vs – "Heap Size" – Throughput ~= % of time doing application work • Tuning tradeoff – Latency x Throughput x Footprint = Z – You can typically tune for 2 of these 3 – To increase Z you need to • increase allocated hardware OR • Rewrite your app • Decide what characteristics you want! – Before tuning
  • 26. Latency vs Throughput vs Footprint • Better Throughput – Usually means worse Latency and Footprint • Better Latency – Usually means worse Throughput • Better footprint – Usually means worse Throughput
  • 27. Application Lifecycle • Very little point in tuning based off limited information – Have you gathered enough data? – Has your application gone through its typical lifecycle? – This is why we don't run 'Live Demos' • Very little point in tuning off incorrect information – Application start-up, shutdown and batch jobs are all outliers • You can infer amazing things from GC logs – When Richard went to lunch – When John stopped playing Minecraft – When Ben kicked off the weekly customer report – .....
  • 28. Know your Hardware • Number of CPU cores matters – Allocate X threads to do GC work with a concurrent collector – How many is 'safe'? – How does that affect throughput? • Memory bandwidth matters – How quickly can your hardware allocate? – See your manufacturer – Object Allocation Rates != Memory Bandwidth != Real Metric • Use Hawkshaw to explore your hardware – Produces GC behaviour according to statistical models – http://www.github.com/jclarity/hawkshaw
  • 29. Part III - Tuning Scenarios • Tuning can make it worse! • Grain of Salt • Scenarios – Possible Memory Leak(s) – Long Pause Times – Premature Promotion – System GCs – Low Throughput – Healthy Application – Maxed Allocation Rate
  • 30. Tuning can make it worse* • Performance Tuning is an iterative process – Sometimes solving one problem uncovers a 2nd worse problem – e.g. Fix the app, then the database gets hammered • Overall performance goes down • Only fix one aspect of GC at a time – Measure the next cycle with fresh eyes – Have you met your goals or made them worse? • GC tuning still needs human interaction – Azul's Zing can/will claim otherwise.
  • 31. Grain of Salt "Nothing that we say should be held as performance tuning tips for *your* application" "There is *always* more than one way to tune in order to meet your goal" "Don't just use our numbers!"
  • 32. A Likely Memory Leak • Memory leaks can't truly be ascertained by a GC log – It could just be an undersized heap! – Needs Human domain knowledge of app (periodicity) • First rule of thumb is to increase your heap – Rule out having an undersized heap • Second rule of thumb is to fire up the Memory profiler – Visual VM will do in most cases
  • 33. A Likely Memory Leak • Only 1000 seconds; look at the number of Full GCs, highly indicative. Note the trend along the bottom.
  • 34. A Possible Memory Leak - I • Note: trend along the bottom, slow leak possible. Look for cycles in the log, e.g. a full day in an application's life.
  • 35. A Possible Memory Leak - II • Note: Trend along the bottom, slow leak possible. Again, look for cycles in the log.
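The "trend along the bottom" in the leak slides can be approximated numerically. The sketch below is one possible way to do it, not the method used by any of the tools named in the talk: it fits a least-squares line through heap occupancy sampled after each Full GC and reports the slope. The class `HeapTrend`, the method `slope`, and the sample numbers are all hypothetical. A persistently positive slope over several application cycles would only suggest a possible leak; as the slides stress, an undersized heap can look the same.

```java
class HeapTrend {
    // Least-squares slope of heap occupancy after GC (e.g. MB) against time (e.g. seconds).
    static double slope(double[] timeSeconds, double[] heapAfterGcMb) {
        int n = timeSeconds.length;
        if (n < 2 || n != heapAfterGcMb.length) {
            throw new IllegalArgumentException("need at least two paired samples");
        }
        double meanT = 0.0;
        double meanH = 0.0;
        for (int i = 0; i < n; i++) {
            meanT += timeSeconds[i];
            meanH += heapAfterGcMb[i];
        }
        meanT /= n;
        meanH /= n;

        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < n; i++) {
            double dt = timeSeconds[i] - meanT;
            numerator += dt * (heapAfterGcMb[i] - meanH);
            denominator += dt * dt;
        }
        if (denominator == 0.0) {
            throw new IllegalArgumentException("all samples share the same timestamp");
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Hypothetical samples: occupancy after each Full GC, creeping upwards.
        double[] t = {0, 100, 200, 300, 400};
        double[] heap = {200, 215, 228, 246, 260};
        System.out.println("MB retained per second: " + slope(t, heap));
    }
}
```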
  • 36. Using a Memory Profiler • Visual VM – Memory profiler - invasive and slow on large apps – Look at object ages (aka Generations) • Look for high number of generations – They're a candidate – Make sure you switch on 'record allocation stack traces' • Use allocation stack trace to find root cause – Track back from core JRE classes to your code – Yes, it's always your code that's the problem! • Can also try jmap -histo
  • 37. Visual VM - Memory Profiler • Note: Objects in many generations! Indicative they're leaking
  • 38. Visual VM - Stack Trace • NThreadedManagedCache$ProduceKey.run() root cause
  • 39. Long Pause Times • The #1 complaint relating to GC – Lots of ways to mitigate – From small tuning tweaks --> off Heap solutions • User reports paused/locked application! – e.g. Web pages taking ages to load – e.g. Progress bars stalling • Tech Support want to uninstall Java!
  • 40. Long Pause Time Example • User has set heap to: -Xms5G -Xmx5G • NOTE: Resident Set Size ~1GB
  • 41. Long Pause Time Example • ~125ms young gen pauses & ~500ms Full GC pauses – OK for web app, but this is a new prototype low latency trading app or Media Streaming app or Advertising service, oh dear!
  • 42. Long Pause Time partial fix • Reduce heap size -Xmx1500M, more frequent, shorter pauses
  • 43. Long Pause Time partial fix • ~20ms young gen pauses & ~250ms Full GC pauses, Better!
  • 44. Long Pause Time 'fixed' • Move to a CMS collector, hopefully shorter pauses • No Full GCs! Therefore minimal Tenured pauses
  • 45. Long Pause Time 'fixed!' • ~10ms young gen pauses, ~2ms tenured pauses, Better! • BUT: Throughput decreased from 69% down to 49% :-(
  • 46. Other Long Pause Time Solutions • Increase number of threads performing GC – -XX:ParallelGCThreads=N – Rule of thumb is to use 3/4 the available physical cores – Can reduce application throughput - can be bad – Can increase context switching - bad • Try an alternative collector – ParNew/CMS vs PSScavenge/ParOld vs iCMS vs G1 etc – Match the collector to your application and hardware • Special note on G1 – You can set pause time goals – BUT: We haven't reliably succeeded for <100ms pause times
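The 3/4 rule of thumb for `-XX:ParallelGCThreads` is simple arithmetic, sketched below under stated assumptions. `GcThreadRuleOfThumb` and `suggestedGcThreads` are hypothetical names. Note that `Runtime.availableProcessors()` reports logical processors, which can exceed the physical core count on hyper-threaded hardware, so the value it feeds in here is only a stand-in for the physical core count the slide refers to.

```java
class GcThreadRuleOfThumb {
    // Applies the slide's rule of thumb: 3/4 of the physical cores, never fewer than one thread.
    static int suggestedGcThreads(int physicalCores) {
        if (physicalCores < 1) {
            throw new IllegalArgumentException("core count must be positive");
        }
        return Math.max(1, (physicalCores * 3) / 4);
    }

    public static void main(String[] args) {
        // Assumption: logical processors are used as a rough proxy for physical cores.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("-XX:ParallelGCThreads=" + suggestedGcThreads(cores));
    }
}
```

Treat the result as a starting point to measure against, since the same slide warns that more GC threads can cost application throughput and add context switching.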
  • 47. Extreme Long Pause Time Solutions • Azul's Zing JVM – This has proven low pause time goal settings – JCK/TCK compliant – Typically needs a very large heap (15GB+) • Take memory off heap – Good for caches in particular • GC in offline mode – Cluster app and offline nodes in order to run GC on them
  • 48. Premature Promotion • User reports more pauses and/or longer pauses • Tech support reports there are more Full GCs • Objects are promoted to Tenured too early – Recall the Weak Generational Hypothesis! – This causes more Old Gen collections • Which can lead to more Full GCs
  • 49. Premature Promotion Example Customer had set: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+PrintGCDetails -Xloggc:gc.log -Xmx1024m -XX:+PrintTenuringDistribution -XX:NewRatio=2 -XX:MaxTenuringThreshold=4 NewRatio=2 means young gen gets ~1/3 of the total heap
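The relationship between `NewRatio` and the young generation share can be worked through as follows. `NewRatio` is the ratio of old generation to young generation, so the young generation receives roughly heap / (NewRatio + 1). This is a sketch with hypothetical names (`NewRatioSizing`, `youngGenMb`); the real JVM aligns and rounds region sizes, so actual values will differ slightly.

```java
class NewRatioSizing {
    // NewRatio = old gen : young gen, so young gen is approximately heap / (NewRatio + 1).
    static long youngGenMb(long heapMb, int newRatio) {
        if (heapMb <= 0 || newRatio < 1) {
            throw new IllegalArgumentException("heap and NewRatio must be positive");
        }
        return heapMb / (newRatio + 1);
    }

    public static void main(String[] args) {
        // The customer's settings from the slide: -Xmx1024m -XX:NewRatio=2
        System.out.println("NewRatio=2: ~" + youngGenMb(1024, 2) + " MB young gen"); // ~341 MB, about 1/3
        // For comparison, NewRatio=1 splits the heap evenly.
        System.out.println("NewRatio=1: ~" + youngGenMb(1024, 1) + " MB young gen"); // ~512 MB, about 1/2
    }
}
```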
  • 50. Premature Promotion Example • Note: ~26% of objects promoted at age 1
  • 51. Premature Promotion 'Fixed' • We dropped NewRatio to 1, Premature Promotion fell to ~4% – The Weak Generational Hypothesis is a better fit – This gives the Young Gen ~1/2 the heap
  • 52. System GCs • User reports frequent pauses – System GCs are Full GCs! • Tech support reports there are more Full GCs – With this funny System wording in the log • System GCs often interfere with the GC subsystem – JVM no longer resizes heap based on runtime info • Caused by System.gc() in code or an RMI call – Very occasionally used to solve a problem – System.gc() is almost always honoured – You can disable it with -XX:+DisableExplicitGC
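To see whether an explicit `System.gc()` call is being honoured on a given JVM, one can compare collection counts before and after the call. The sketch below is an illustration with hypothetical names (`ExplicitGcProbe`, `totalCollections`); it relies only on the standard `GarbageCollectorMXBean` API. No output is shown because the counts depend on the collector in use and on whether `-XX:+DisableExplicitGC` is set.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class ExplicitGcProbe {
    // Sums the collection counts of every collector the JVM exposes.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report a count
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc(); // a request only; ignored when -XX:+DisableExplicitGC is set
        long after = totalCollections();
        System.out.println("Collections triggered by System.gc(): " + (after - before));
    }
}
```

Running it once with and once without `-XX:+DisableExplicitGC` is a quick way to confirm the flag has the effect the slide describes.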
  • 53. System GC example • NOTE: 34,000 System GCs, one every 1/2 second – Throughput 51% - Unhappy Minecraft players!
  • 54. System GC calls 'Fixed' • -XX:+DisableExplicitGC • Throughput went to 99.8% - Happier Minecraft players
  • 55. Low Throughput • User reports slow application – e.g. Batch job fails to complete on time • Tech support reports there are lots of GCs • Lots of small GCs can also be bad! – Your application threads aren't able to allocate objects – i.e. Low Throughput • Throughput increases when system is quiet – Be careful in analysing the right period of activity
  • 56. Low Throughput example 1/4 • 61 seconds in total pause time, log is only 170 seconds long • Throughput is 64% --> Rule of thumb, should be 95%+
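The throughput figures in the Low Throughput slides follow from the definition given earlier in the talk (throughput is roughly the percentage of time spent doing application work). A minimal sketch, with the hypothetical names `GcThroughput` and `throughputPercent`, reproduces the arithmetic for the three pause totals quoted across these slides (61, 33 and 9 seconds out of a 170 second log):

```java
import java.util.Locale;

class GcThroughput {
    // Throughput = share of elapsed time NOT spent paused in GC, as a percentage.
    static double throughputPercent(double pauseSeconds, double elapsedSeconds) {
        if (elapsedSeconds <= 0 || pauseSeconds < 0 || pauseSeconds > elapsedSeconds) {
            throw new IllegalArgumentException("pause must lie within the elapsed time");
        }
        return 100.0 * (elapsedSeconds - pauseSeconds) / elapsedSeconds;
    }

    public static void main(String[] args) {
        double logLength = 170.0; // seconds, from the slides
        double[] pauseTotals = {61.0, 33.0, 9.0}; // original, bigger heap, PSYoungGen
        for (double pause : pauseTotals) {
            System.out.println(String.format(Locale.ROOT, "%.0fs paused -> %.1f%% throughput",
                    pause, throughputPercent(pause, logLength)));
        }
        // prints 64.1%, 80.6% and 94.7%, matching the ~64%, ~81% and ~95% on the slides
    }
}
```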
  • 57. Low Throughput example 2/4 • Lots of small pauses from various collectors, which ones?
  • 58. Low Throughput example 3/4 • ~25% time spent in young GC & ~5-10% in Full GCs (CMFs)
  • 59. Low Throughput example 4/4 • Object allocation hitting max heap size – Able to recover memory, so no leak, needs a bigger heap!
  • 60. Low Throughput 'Fixed' 1/4 • Increased footprint to -Xmx1024M
  • 61. Low Throughput 'Fixed' 2/4 • Far fewer pauses from Full GCs (CMFs) - just looks nicer! – Still lots of young gen pauses
  • 62. Low Throughput 'Fixed' 3/4 • ~15% time spent in young GC & ~0% in Full GCs
  • 63. Low Throughput 'Fixed' 4/4 • Note: 33 seconds out of 170, ~81% Throughput, Better!
  • 64. Low Throughput 'Really Fixed' 1/2 • Switched to PSYoungGen collector (from ParNew) – Worth trying as young gen collections are dominant
  • 65. Low Throughput 'Really Fixed' 2/2 • Note: 9 seconds out of 170, ~95% Throughput, Best!
  • 66. Healthy Application • What is healthy? It depends! • Throughput – Typically a 95%+ throughput is good • Pause times – < 1sec is good for generic web apps • Footprint – Smaller == Less live objects to track == Better?
  • 67. Healthy Application • Saw tooth pattern • Bottom of troughs trend line is flat
  • 68. Healthy Minecraft Client! • Note: JVM resizing itself, you let IT do the work!
  • 69. Maxed Allocation Rate • User reports slow application behaviour • Tech support has no idea why! – Normally you'd do a full performance diagnostic – But we can look at GC cheaply • GC logs can help with non GC problems! – Memory Bandwidth limits are being hit – Not a GC problem! • More common in virtualised environments – What else on the hardware is using bandwidth?
  • 70. Not Maxed Allocation Rate Example
  • 71. Max Allocation Rate Example • 8GB/sec - could be getting close to real memory bandwidth
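An allocation rate like the one on this slide is commonly estimated from two consecutive young collections: the young gen occupancy just before the current collection, minus the occupancy left just after the previous one, divided by the time between them. The sketch below assumes that approach; it is not taken from the talk or from any particular tool, and `AllocationRate`, `mbPerSecond` and the sample numbers are hypothetical.

```java
class AllocationRate {
    // Bytes allocated between two young collections, divided by the time between them.
    static double mbPerSecond(double youngAfterPreviousGcMb,
                              double youngBeforeThisGcMb,
                              double secondsBetweenGcs) {
        if (secondsBetweenGcs <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return (youngBeforeThisGcMb - youngAfterPreviousGcMb) / secondsBetweenGcs;
    }

    public static void main(String[] args) {
        // Hypothetical log excerpt: 96 MB survived the last GC, young gen refilled
        // to 4096 MB half a second later.
        double rate = mbPerSecond(96.0, 4096.0, 0.5);
        System.out.println("Allocation rate: " + rate + " MB/sec"); // 8000.0 MB/sec, roughly 8 GB/sec
    }
}
```

A rate that plateaus at the same ceiling regardless of load is the pattern the slide associates with hitting memory bandwidth rather than a GC problem.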
  • 72. Max Allocation Rate Example Hard limit at ~8GB (8e+06 on graph)
  • 73. Max Allocation Rate 'Fixes' • Lots you can do! • Stop allocating so much! – Get out your Memory profiler – Alter the application's object allocation behaviour • Get better hardware! – CPU – Faster Bus – Faster RAM • Don't virtualise/share – Have your application be the only thing on that hardware
  • 74. Summary • You need to understand some basic GC theory – Work with the Weak Generational Hypothesis – See http://www.insightfullogic.com for blog posts • Turn on GC logging! – It has low overhead* – Reading raw log files is hard – Use tooling! • Tradeoff: Pause Times vs Throughput vs Heap Size – Use tools to help you tweak – "Empirical Science Wins!"
  • 75. Join our performance community http://www.jclarity.com Martijn Verburg (@karianna) Dr. John Oliver (@johno_oliver)