Top 10 Issues for
Java in Production
SriSatish Ambati
Cliff Click Jr.
Azul Systems, Inc
A Decade of
Java in Production
• A lot of hard-earned wisdom
• A lot of victories (quickly forgotten)
• A lot of endless pain points
• Usually the Pain Point is really
A Systems Issue
• It's Not Just the JVM (nor network, nor ...)
Tools of the Trade
• What the JVM is doing:
– dtrace, hprof, introscope, jconsole, visualvm,
yourkit, azul zvision
• Invasive JVM observation tools:
– bci, jvmti, jvmdi/pi agents, logging
• What the OS is doing:
– dtrace, oprofile, vtune
• What the network/disk is doing:
– ganglia, iostat, lsof, nagios, netstat
10 - Instrumentation is Not Cheap
• Symptom
– Production monitoring can be very expensive
– Staging environment does not repro issues
– Instrumented code changes cache profile
– MBeans are not cheap either!
• Solutions
– Pick the right axe for the problem!
– Avoid expensive heap walks
– Finish task then increment perf counters
– Asynchronous logging, jconsole, azul zvision
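A minimal sketch of "finish task then increment perf counters", assuming a hypothetical handle() method standing in for the real work. The class and counter names are illustrative and not from the deck; LongAdder is used here because it tends to contend less than a single shared atomic under many threads.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical request handler that touches shared counters only after the work is done. */
final class CheapCounters {
    // LongAdder spreads updates over several cells, so a hot counter contends less.
    static final LongAdder requests = new LongAdder();
    static final LongAdder nanosSpent = new LongAdder();

    static int handle(int input) {
        long start = System.nanoTime();
        int result = input * 2;                      // stand-in for the real task
        // The task is finished: only now update the shared counters.
        nanosSpent.add(System.nanoTime() - start);
        requests.increment();
        return result;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handle(i);
        }
        System.out.println("requests=" + requests.sum());
    }
}
```

Reading the counters with sum() from a monitoring thread is cheap compared with a heap walk or a snapshot.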
9 - Leaks
• Symptom
– App consumes all the memory you have
– Live Heap trend is a ramping sawtooth
– Then slows, then throws OutOfMemoryError
• Tools
– yourkit, hprof, eclipse mat, jconsole,
jhat, jps, visualvm, azul zvision
• Theory
– Allocated vs Live Objects, vm memory, Perm Gen
– Finalizers, ClassLoaders, ThreadLocal
Leaks: jconsole
• Tomcat + ActiveMQ
– 1 week in production
– after 9hrs in test
– Leaks 100MB/hr
Leaks: Visual VM
14741 classes loaded
ClassLoader leak,
PermGen full
9 – Leaks: Bloat
• Cost of an 8-char String?
– String object: 8b hdr + 12b fields + 4b ptr + 4b pad
– char[] array: 8b hdr + 4b len + 16b data
– A: 56 bytes, or a 7x blowup
• Cost of 100-entry TreeMap<Double,Double>?
– TreeMap: 48b
– 100 x TreeMap$Entry: 40b each
– 100 x Double key + 100 x Double value: 16b each
– A: 7248 bytes or a ~5x blowup
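The arithmetic behind the two answers can be reproduced as a small sketch. The per-object sizes are the ones quoted on the slide; the payload sizes used for the ratios (8 bytes for the 8 characters, 16 bytes per key/value pair of doubles) are an assumption about how the blowup was computed.

```java
/** Recomputes the footprint figures quoted on the slide; sizes are in bytes. */
final class BloatMath {
    static int stringBytes() {
        int stringObject = 8 + 12 + 4 + 4;   // header + fields + char[] pointer + padding
        int charArray = 8 + 4 + 16;          // header + length + 8 chars at 2 bytes each
        return stringObject + charArray;
    }

    static int treeMapBytes(int entries) {
        int treeMap = 48;                    // the TreeMap object itself
        int entry = 40;                      // one TreeMap$Entry
        int boxedDouble = 16;                // one java.lang.Double
        return treeMap + entries * (entry + 2 * boxedDouble);
    }

    public static void main(String[] args) {
        int s = stringBytes();
        int t = treeMapBytes(100);
        // Assumed payloads: 8 one-byte characters, and 100 pairs of 8-byte doubles.
        System.out.println("String: " + s + " bytes, blowup " + (s / 8.0) + "x");
        System.out.println("TreeMap: " + t + " bytes, blowup " + (t / 1600.0) + "x");
    }
}
```

Under those assumptions the TreeMap ratio comes out at about 4.5x, which the slide rounds to ~5x.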
Apache Tomcat 6.0: Allocated
Class name                          Size (B)       Count       Avg (B)
Total                               21,580,592     228,805     94.3
char[]                              4,215,784      48,574      86.8
byte[]                              3,683,984      5,024       733.3
Built-in VM methodKlass             2,493,064      16,355      152.4
Built-in VM constMethodKlass        1,955,696      16,355      119.6
Built-in VM constantPoolKlass       1,437,240      1,284       1,119.30
Built-in VM instanceKlass           1,078,664      1,284       840.1
java.lang.Class[]                   922,808        45,354      20.3
Built-in VM constantPoolCacheKlass  903,360        1,132       798
java.lang.String                    753,936        31,414      24
java.lang.Object[]                  702,264        8,118       86.5
java.lang.reflect.Method            310,752        2,158       144
short[]                             261,112        3,507       74.5
java.lang.Class                     255,904        1,454       176
int[][]                             184,680        2,032       90.9
java.lang.String[]                  173,176        1,746       99.2
java.util.zip.ZipEntry              172,080        2,390       72
JBoss 5.1: Allocated
Class name                          Size (B)       Count       Avg (B)
Total                               1,410,764,512  19,830,135  71.1
char[]                              423,372,528    4,770,424   88.7
byte[]                              347,332,152    1,971,692   176.2
int[]                               85,509,280     1,380,642   61.9
java.lang.String                    73,623,024     3,067,626   24
java.lang.Object[]                  64,788,840     565,693     114.5
java.util.regex.Matcher             51,448,320     643,104     80
java.lang.reflect.Method            43,374,528     301,212     144
java.util.HashMap$Entry[]           27,876,848     140,898     197.9
java.util.TreeMap$Entry             22,116,136     394,931     56
java.util.HashMap$Entry             19,806,440     495,161     40
java.nio.HeapByteBuffer             17,582,928     366,311     48
java.nio.HeapCharBuffer             17,575,296     366,152     48
java.lang.StringBuilder             15,322,128     638,422     24
java.util.TreeMap$EntryIterator     15,056,784     313,683     48
java.util.ArrayList                 11,577,480     289,437     40
java.util.HashMap                   7,829,056      122,329     64
java.util.TreeMap                   7,754,688      107,704     72
Objects (millions)   allocated   live
JBoss 5.1            20          4
Apache Tomcat 6.0    0.25        0.1
JEE is not cheap!
JBoss & Apache startup
- 20M objects before starting the app
example: yourkit
memory profiling
Know footprint: use memory profiling!
(snapshots are still expensive)
Got Leaks?
8 – I/O: Serialization
• Symptom
– Multi-node scale-out does not scale linearly
– Time in both CPU and I/O (serialization costs)
• Tools
– CPU profiling, I/O profiling
• Solution
– Not all serialization libraries are equal!
– Pick a high performance serialization library or roll-your-own
– Avro, kryo, protocol-buffers, thrift
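To make the serialization cost concrete, here is a small sketch comparing built-in Java serialization with a hand-rolled DataOutputStream encoding of the same object. The Quote type and its fields are hypothetical and exist only for this illustration; it measures encoded size only, not CPU time.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSize {
    /** Hypothetical payload type used only for this comparison. */
    static final class Quote implements Serializable {
        private static final long serialVersionUID = 1L;
        final String symbol;
        final double price;
        final long timestamp;

        Quote(String symbol, double price, long timestamp) {
            this.symbol = symbol;
            this.price = price;
            this.timestamp = timestamp;
        }
    }

    /** Built-in serialization: writes a stream header and class descriptor plus the fields. */
    static byte[] javaSerialize(Quote q) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(q);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Roll-your-own: writes only the field values in a fixed order. */
    static byte[] handRolled(Quote q) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(q.symbol);      // 2-byte length prefix + encoded characters
            out.writeDouble(q.price);    // 8 bytes
            out.writeLong(q.timestamp);  // 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Quote q = new Quote("AZUL", 12.5, 1285000000000L);
        System.out.println("java serialization: " + javaSerialize(q).length + " bytes");
        System.out.println("hand-rolled:        " + handRolled(q).length + " bytes");
    }
}
```

The hand-rolled form gives up versioning and schema evolution, which is part of what the libraries listed on the slide provide.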
8 – I/O: Limits, Tuning
• Symptom
– Application hangs or remote call fails after a while
– “Too many open File Descriptors”, “Cursors”
– Inconsistent response times
• Tools
– nagios, pkg, rpm info, ulimit, yum
• Solutions
– Check for “new” OS patches, user & process limits, network & semaphore configurations
– Close all I/O streams
– Maybe you are I/O bound!
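One common way to "close all I/O streams" is try-with-resources (Java 7 and later), which closes the stream on both the normal and the exception path. The sketch below uses a hypothetical TrackedStream that counts open handles in place of a real file or socket, so the effect is visible without touching the file system.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

final class CloseStreams {
    /** Number of streams opened and not yet closed (stand-in for file descriptors). */
    static final AtomicInteger openHandles = new AtomicInteger();

    /** Hypothetical stream standing in for a file or socket stream. */
    static final class TrackedStream extends ByteArrayInputStream {
        TrackedStream(byte[] data) {
            super(data);
            openHandles.incrementAndGet();
        }

        @Override
        public void close() throws IOException {
            openHandles.decrementAndGet();
            super.close();
        }
    }

    /** Returns the first byte; the stream is closed even when this method throws. */
    static int firstByte(byte[] data) {
        try (InputStream in = new TrackedStream(data)) {
            int b = in.read();
            if (b < 0) {
                throw new IllegalStateException("empty input");
            }
            return b;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("first byte: " + firstByte(new byte[] {7}));
        try {
            firstByte(new byte[0]);
        } catch (IllegalStateException expected) {
            // The failure path must not leak the handle either.
        }
        System.out.println("handles still open: " + openHandles.get());
    }
}
```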
8 – I/O: Sockets, Files, DB
• Symptoms
– Socket.create/close takes too long
– JRMP timeouts, long JDBC calls
– Running out of file descriptors, cursors, disk
• Tools
– dbms tools, du, iostat, gmon, lsof, netstat
• Workaround
– Check all O/S patches, sysctl flags, run ping/telnet test
– Check & set SO_LINGER, TCP_LINGER2
8 - I/O
7 – Locks & synchronized
• Symptoms
– Adding users / threads / CPUs causes app slow down (less throughput, worse response)
– High lock acquire times & contention
– Race conditions, deadlock, I/O under lock
• Tools
– dtrace, lockstat, azul zvision
• Solution
– Use non-blocking Collections
– Striping locks, reducing hold times, no I/O
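Lock striping can be sketched as follows: instead of one global lock, each key hashes to one of several locks, so threads working on different stripes do not contend. The class name, the stripe count of 16, and the hammer() helper are assumptions made for illustration, not something taken from the deck.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical counter map guarded by one lock per stripe instead of one global lock. */
final class StripedCounters {
    private static final int STRIPES = 16;   // assumption: size to the expected thread count
    private final Object[] locks = new Object[STRIPES];
    private final Map<String, Long>[] maps;

    @SuppressWarnings("unchecked")
    StripedCounters() {
        maps = new Map[STRIPES];
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
            maps[i] = new HashMap<>();
        }
    }

    private static int stripeOf(String key) {
        return (key.hashCode() & 0x7fffffff) % STRIPES;
    }

    void increment(String key) {
        int s = stripeOf(key);
        synchronized (locks[s]) {             // short hold time, no I/O under the lock
            maps[s].merge(key, 1L, Long::sum);
        }
    }

    long get(String key) {
        int s = stripeOf(key);
        synchronized (locks[s]) {
            return maps[s].getOrDefault(key, 0L);
        }
    }

    /** Demo helper: runs the given number of threads and returns the total count seen. */
    static long hammer(int threads, int perThread) {
        StripedCounters counters = new StripedCounters();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counters.increment("key-" + (i % 64));
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        long total = 0;
        for (int k = 0; k < 64; k++) {
            total += counters.get("key-" + k);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("total increments: " + hammer(4, 10_000));
    }
}
```

In practice java.util.concurrent.ConcurrentHashMap, or a non-blocking map such as the one in the High Scale lib listed in the references, would usually be preferred over hand-written striping.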
Example: IBM Visual Analyzer
(j.u.c view in eclipse)
Zillion threads acquiring same lock
A j.u.c lock (e.g. ReentrantLock) is still a lock!
Need a non-blocking collection
(or stripe lock or lower hold times, etc)
Example: zvision
Hot lock usually has 10x to 100x
more acquire time than the next lock.
Look for rapidly growing acquire times!
Example: zvision
Hot Lock Backtrace
6 – Endless Compilation
• Symptom
– Time “compiling”
– Time in the Interpreter
• Tools
– -XX:+PrintCompilation, cpu profiler
– Find endlessly-recompiling method
• Workaround
– Exclude using .hotspot_compiler file
• Root cause: It's a JVM Bug! File a bug report!
5 – Endless Exceptions
• Symptom
– Application spends time in j.l.T.fillInStackTrace()
• Tools
– CPU profiler, azul zvision
– Thread dumps (repeated kill -3, zvision)
– Track caller/callee to find the thrower
• Not all exceptions appear in log files
• Solution
– Don't Throw, alternate return value (e.g. null)
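A minimal sketch of the "don't throw, return an alternate value" solution, using a hypothetical port parser. Expected bad input produces null instead of an exception, so no stack trace is filled in on the hot path.

```java
/** Hypothetical parser that reports bad input with null instead of an exception. */
final class NoThrowParse {
    /** Returns the port number, or null when the text is not a valid port. Never throws. */
    static Integer parsePort(String text) {
        if (text == null || text.isEmpty() || text.length() > 5) {
            return null;
        }
        int value = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                return null;                 // no exception, so no fillInStackTrace()
            }
            value = value * 10 + (c - '0');
        }
        return value <= 65535 ? value : null;
    }

    public static void main(String[] args) {
        System.out.println(parsePort("8080"));
        System.out.println(parsePort("80x"));
    }
}
```

Callers must then check for null, which is the trade-off of this style.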
5 – Endless Exceptions
• Related
– Exception paths are typically failure paths
– JVMs do not optimize them much
– Often found when a server collapses
4 - Fragmentation
• Symptom
– Performance degrades over time
– Inducing a “Full GC” makes problem go away
– Lots of free memory but in tiny fragments
• Tools
– GC logging flags, e.g. for CMS
-XX:PrintFLSStatistics=2
-XX:+PrintCMSInitiationStatistics
4 - Fragmentation
• Tools
– “Fragger” www.azulsystems.com/resources
    • Tiny cpu cost, low memory cost
    • Fragments the heap in 60 sec like an hour in production
    • Get FullGC cycles at dev's desk
• Solution
– Upgrade to latest CMS (CR:6631166)
– Azul Zing & Gen Pauseless GC
– Pooling similar sized/aged objects
    • (really hard to get right!)
3 – GC Tuning
• Symptom
– Entropy(gc) == number_of_gc_flags
    • Too many free parameters
    • 64-bit/large heap size is not a solution
– Constant 40-60% CPU utilization by GC
– Scheduled reboot before full-GC
– Full time Engineer working GC flags
• Workarounds
– Ask JVM Vendor to give 1 flag solution
– G1 GC, Azul’s Zing GPGC
3 – GC Tuning
Oracle WebLogic GC Flags
-server -Xloggc:gc.log -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:MaxPermSize=128m
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:MaxNewSize=64m -XX:NewSize=64m
-Xms1536m -Xmx1536m -XX:SurvivorRatio=128
-XX:MaxTenuringThreshold=0
-XX:CMSInitiatingOccupancyFraction=60
-Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFFE
-Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFFE
2 - Spikes
• Symptoms
– Rush hour traffic, tax day, Black Friday
– Outages under spikes, power law of networks
• Solution
– Measure.
– Test with realistic load & realistic multi-node setup
– Build redundancy & elasticity into infrastructure
– Don’t ignore Exceptions & retries under load
Busiest online day is...
1 – Versionitis
When ears wage class wars with jars
• Symptom
– Different nodes have different configurations, different stack components, versions
– classpath has dist/*, -verbose:class
– subtle, hard-to-reproduce issues
• Solution
– Method. Version Control.
– Good ol’ fashioned rigor
“It can only be attributable to human error” - HAL
0 – Collapse Under Load (pick any 3 above!)
• Runs fine as load Ramps Up
– At peak load, system is unstable
– Slightly above peak: Collapse!
• Heavy load triggers exception (e.g. timeout)
• Exception path is slow already (e.g. logging)
• Transaction retried (so more work yet again)
• So NEXT transaction times out
• Soon all time spent throwing & logging exceptions
• No forward progress
example:
Driving into San Francisco
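The retry spiral described for collapse under load is often damped by capping the number of attempts and backing off between them. The sketch below is one such approach and is not taken from the deck; the class name, the exponential backoff policy, and the parameters are assumptions.

```java
import java.util.function.Supplier;

/** Hypothetical helper: bounded retries with exponential backoff instead of endless retry. */
final class BoundedRetry {
    /** Runs op up to maxAttempts times, sleeping baseDelayMillis * 2^attempt between failures. */
    static <T> T call(Supplier<T> op, int maxAttempts, long baseDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;                    // keep the cause; avoid logging a trace per attempt
                if (attempt + 1 < maxAttempts) {
                    sleep(baseDelayMillis << attempt);
                }
            }
        }
        // Give up so the load is shed instead of amplified by further retries.
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("timeout");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Failing fast after the last attempt keeps a slow dependency from consuming every worker thread in retries.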
Q & A (& Refs 1 of 2)
References:
java.util.concurrent lock profiling
http://infoq.com/jucprofiler
Java serialization benchmarks
http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
Memory profiling with yourkit
http://yourkit.com
Tuning gc
http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html
http://blog.codecentric.de/en/2010/01/java-outofmemoryerror-a-tragedy-in-seven-acts/
Cliff Click's High Scale lib, Non-Blocking HashMap
http://sourceforge.net/projects/high-scale-lib/
Q & A (& Refs 2 of 2)
References:
Memory Leak
http://deusch.org/blog/?p=9
Handy list of jvm options
http://blogs.sun.com/watt/resource/jvm-options-list.html
Fragger (with source code)
http://www.azulsystems.com/resources
Garbage Collection: Algorithms for Automatic Dynamic Memory Management, Richard Jones, Rafael D Lins
Backup slide – Fragmentation
• Works well for hours at 300-400MB
– Same workload
• Suddenly haywire
– Promotion
    • Too frequently
– Back to back FullGCs
– May not all be completing.

Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 

Último (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 

JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When Things Go Wrong

  • 1. Top 10 Issues for Java in Production SriSatish Ambati Cliff Click Jr. Azul Systems, Inc
  • 2. A Decade of Java in Production • A lot of hard-earned wisdom • A lot of victories (quickly forgotten) • A lot of endless pain points • Usually the Pain Point is really A Systems Issue • It's Not Just the JVM (nor network, nor ...)
  • 3. Tools of the Trade • What the JVM is doing: – dtrace, hprof, introscope, jconsole, visualvm, yourkit, azul zvision • Invasive JVM observation tools: – bci, jvmti, jvmdi/pi agents, logging • What the OS is doing: – dtrace, oprofile, vtune • What the network/disk is doing: – ganglia, iostat, lsof, nagios, netstat
  • 4. Tools of the Trade
  • 5. 10 - Instrumentation is Not Cheap • Symptom – Production monitoring can be very expensive – Staging environment does not repro issues – Instrumented code changes cache profile – MBeans are not cheap either! • Solutions – Pick the right axe for the problem! – Avoid expensive heap walks – Finish task then increment perf counters – Asynchronous logging, jconsole, azul zvision
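One way to read "Finish task then increment perf counters" is sketched below: the work runs first, and the bookkeeping is two cheap additions afterwards. The class name `TaskCounters` and its methods are hypothetical, chosen only for this illustration; `LongAdder` is the standard JDK class for low-contention counters.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical helper: counters that are touched only after a task has finished. */
final class TaskCounters {
    private final LongAdder completed = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    /** Runs the task first, then records the outcome with two cheap adds. */
    void runAndRecord(Runnable task) {
        long start = System.nanoTime();
        task.run();                         // if this throws, nothing is recorded
        long elapsed = System.nanoTime() - start;
        completed.increment();              // LongAdder spreads contention across cells
        totalNanos.add(elapsed);
    }

    long completedTasks() { return completed.sum(); }

    long totalNanos() { return totalNanos.sum(); }
}
```

Reading the counters with `sum()` is left to whatever monitoring thread polls them, so the hot path never blocks on a reader.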
  • 6. 9 - Leaks • Symptom – App consumes all the memory you got – Live Heap trend is a ramping sawtooth – Then slows, then throws OutOfMemory • Tools – yourkit, hprof, eclipse mat, jconsole, jhat, jps, visualvm, azul zvision • Theory – Allocated vs Live Objects, vm memory, Perm Gen – Finalizers, ClassLoaders, ThreadLocal
  • 7. Leaks: jconsole • Tomcat + ActiveMQ – 1 week in production – after 9hrs in test – Leaks 100MB/hr
  • 8. Leaks: Visual VM 14741 classes loaded ClassLoader leak, PermGen full
  • 9. 9 – Leaks: Bloat • Cost of an 8-char String? – 8b hdr + 12b fields + 4b ptr + 4b pad + 8b hdr + 4b len + 16b data – A: 56 bytes, or a 7x blowup • Cost of 100-entry TreeMap<Double,Double>? – 48b TreeMap + 100 x 40b TreeMap$Entry + 100 x 16b Double + 100 x 16b Double – A: 7248 bytes, or a ~5x blowup
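The two answers on the Bloat slide can be checked with plain arithmetic. The sketch below only adds up the per-object sizes the slide lists. The payload baselines used for the ratios (8 bytes for eight characters, 1600 bytes for 200 raw doubles) are assumptions made here, not figures from the slide. The class name `BloatMath` is hypothetical.

```java
/** Worked arithmetic for the slide's numbers; sizes are taken from the slide as given. */
final class BloatMath {
    /** 8b hdr + 12b fields + 4b ptr + 4b pad + 8b hdr + 4b len + 16b data. */
    static int eightCharStringBytes() {
        return 8 + 12 + 4 + 4 + 8 + 4 + 16;
    }

    /** 48b TreeMap plus, per entry, a 40b TreeMap$Entry and two 16b Double objects. */
    static int treeMapBytes(int entries) {
        return 48 + entries * (40 + 16 + 16);
    }

    /** Ratio of actual footprint to the raw payload size. */
    static double blowup(int actualBytes, int payloadBytes) {
        return (double) actualBytes / payloadBytes;
    }

    public static void main(String[] args) {
        int stringBytes = eightCharStringBytes();
        int mapBytes = treeMapBytes(100);
        // Assumed payloads: 8 one-byte characters; 100 entries x 2 doubles x 8 bytes.
        System.out.println(stringBytes + " bytes, blowup " + blowup(stringBytes, 8));
        System.out.println(mapBytes + " bytes, blowup " + blowup(mapBytes, 100 * 2 * 8));
    }
}
```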
  • 10. JEE is not cheap! JBoss & Apache startup - 20M objects before starting the app

    Million objects       allocated   live
    JBoss 5.1             20          4
    Apache Tomcat 6.0     0.25        0.1

    Apache Tomcat 6.0, allocated
    Class name                            Size (B)        Count        Avg (B)
    Total                                 21,580,592      228,805      94.3
    char[]                                4,215,784       48,574       86.8
    byte[]                                3,683,984       5,024        733.3
    Built-in VM methodKlass               2,493,064       16,355       152.4
    Built-in VM constMethodKlass          1,955,696       16,355       119.6
    Built-in VM constantPoolKlass         1,437,240       1,284        1,119.30
    Built-in VM instanceKlass             1,078,664       1,284        840.1
    java.lang.Class[]                     922,808         45,354       20.3
    Built-in VM constantPoolCacheKlass    903,360         1,132        798
    java.lang.String                      753,936         31,414       24
    java.lang.Object[]                    702,264         8,118        86.5
    java.lang.reflect.Method              310,752         2,158        144
    short[]                               261,112         3,507        74.5
    java.lang.Class                       255,904         1,454        176
    int[][]                               184,680         2,032        90.9
    java.lang.String[]                    173,176         1,746        99.2
    java.util.zip.ZipEntry                172,080         2,390        72

    JBoss 5.1, allocated
    Class name                            Size (B)        Count        Avg (B)
    Total                                 1,410,764,512   19,830,135   71.1
    char[]                                423,372,528     4,770,424    88.7
    byte[]                                347,332,152     1,971,692    176.2
    int[]                                 85,509,280      1,380,642    61.9
    java.lang.String                      73,623,024      3,067,626    24
    java.lang.Object[]                    64,788,840      565,693      114.5
    java.util.regex.Matcher               51,448,320      643,104      80
    java.lang.reflect.Method              43,374,528      301,212      144
    java.util.HashMap$Entry[]             27,876,848      140,898      197.9
    java.util.TreeMap$Entry               22,116,136      394,931      56
    java.util.HashMap$Entry               19,806,440      495,161      40
    java.nio.HeapByteBuffer               17,582,928      366,311      48
    java.nio.HeapCharBuffer               17,575,296      366,152      48
    java.lang.StringBuilder               15,322,128      638,422      24
    java.util.TreeMap$EntryIterator       15,056,784      313,683      48
    java.util.ArrayList                   11,577,480      289,437      40
    java.util.HashMap                     7,829,056       122,329      64
    java.util.TreeMap                     7,754,688       107,704      72
  • 11. example: yourkit memory profiling Know footprint: use memory profiling! (snapshots are still expensive)
  • 13. 8 – I/O: Serialization • Symptom – Multi-node scale-out does not scale linearly – Time in both CPU and I/O (serialization costs) • Tools – CPU profiling, I/O profiling • Solution – Not all serialization libraries are equal! – Pick a high performance serialization library or roll your own – Avro, kryo, protocol-buffers, thrift
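To make the "roll your own" point concrete, the sketch below compares the byte count of default Java serialization with a hand-written `DataOutputStream` encoding of the same object. The `Quote` type and its fields are hypothetical, and only size is compared, not CPU time. The libraries named on the slide are deliberately left out so that no third-party API is guessed at.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSizes {
    /** Hypothetical payload type used only for this comparison. */
    static final class Quote implements Serializable {
        private static final long serialVersionUID = 1L;
        final String symbol;
        final double price;
        final long timestamp;

        Quote(String symbol, double price, long timestamp) {
            this.symbol = symbol;
            this.price = price;
            this.timestamp = timestamp;
        }
    }

    /** Size in bytes when written with default Java serialization. */
    static int javaSerializedSize(Quote q) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(q);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Size in bytes when the three fields are written by hand. */
    static int handRolledSize(Quote q) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(q.symbol);      // 2-byte length prefix + modified UTF-8 bytes
            out.writeDouble(q.price);    // 8 bytes
            out.writeLong(q.timestamp);  // 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

The default format also writes a stream header and a class descriptor, which is why it comes out larger for small objects like this one.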
  • 14. 8 – I/O: Limits, Tuning • Symptom – Application hangs or remote call fails after a while – “Too many open File Descriptors”, “Cursors” – Inconsistent response times • Tools – nagios, pkg, rpm info, ulimit, yum • Solutions – Check for “new” OS patches, user & process limits, network & semaphore configurations – Close all I/O streams – Maybe you are I/O bound!
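"Close all I/O streams" is easiest to get right with try-with-resources, which releases the file descriptor even when reading fails. A minimal sketch follows. The class name `CloseStreams` and the temp-file demo are hypothetical scaffolding for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class CloseStreams {
    /** Counts lines; the reader (and its file descriptor) is closed on every exit path. */
    static long countLines(Path file) {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a three-line temp file, counts its lines, and removes the file again. */
    static long demo() {
        try {
            Path tmp = Files.createTempFile("close-streams", ".txt");
            try {
                Files.write(tmp, "a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
                return countLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```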
  • 15. 8 – I/O: Sockets, Files, DB • Symptoms – Socket.create/close takes too long – JRMP timeouts, long JDBC calls – Running out of file descriptors, cursors, disk • Tools – dbms tools, du, iostat, gmon, lsof, netstat • Workaround – Check all O/S patches, sysctl flags, run ping/telnet test – Check & set SO_LINGER, TCP_LINGER2
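From Java, "check & set SO_LINGER" maps onto `Socket.setSoLinger` and `Socket.getSoLinger`. The sketch below sets the option on an unconnected socket and reads it back. The class name `LingerSettings` is hypothetical, the right linger value depends on the application, and TCP_LINGER2 is not covered here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

final class LingerSettings {
    /**
     * Enables SO_LINGER with the given timeout on a fresh, unconnected socket
     * and returns the value the socket reports back (-1 would mean disabled).
     */
    static int applyLinger(int seconds) {
        try (Socket socket = new Socket()) {   // never connected, so no traffic is sent
            socket.setSoLinger(true, seconds); // close() may block up to this many seconds
            return socket.getSoLinger();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real client the same two calls would be made on the connected socket, before the close whose behavior is being tuned.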
  • 17. 7 – Locks & synchronized • Symptoms – Adding users / threads / CPUs causes app slowdown (less throughput, worse response) – High lock acquire times & contention – Race conditions, deadlock, I/O under lock • Tools – dtrace, lockstat, azul zvision • Solution – Use non-blocking Collections – Striping locks, reducing hold times, no I/O under lock
  • 18. Example: IBM Visual Analyzer (j.u.c view in eclipse) Zillion threads acquiring same lock j.u.c.ConcurrentLock is still a lock! Need a non-blocking collection (or stripe lock or lower hold times, etc)
  • 19. Example: zvision Hot lock usually has 10x to 100x more acquire time than the next lock. Look for rapidly growing acquire times!
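As a rough illustration of replacing one hot lock with a concurrent collection, the sketch below counts hits per key with `ConcurrentHashMap` and `LongAdder` instead of a single `synchronized` map. This is a stand-in using JDK classes only. `ConcurrentHashMap` is finely locked rather than strictly non-blocking, and the class name `HitCounter` is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class HitCounter {
    private final ConcurrentHashMap<String, LongAdder> hits = new ConcurrentHashMap<>();

    /** Records one hit without taking a map-wide lock. */
    void record(String key) {
        hits.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    long count(String key) {
        LongAdder adder = hits.get(key);
        return adder == null ? 0 : adder.sum();
    }

    /** Hammers one key from several threads and returns the final count. */
    static long demo(int threads, int perThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.record("home");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.count("home");
    }
}
```

Even with every thread hitting the same key, the increments go through `LongAdder` cells rather than queueing on one monitor.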
  • 21. 6 – Endless Compilation • Symptom – Time “compiling” – Time in the Interpreter • Tools – -XX:+PrintCompilation, cpu profiler – Find endlessly-recompiling method • Workaround – Exclude using .hotspot_compiler file • Root cause: It's a JVM Bug! File a bug report!
  • 22. 5 – Endless Exceptions • Symptom – Application spends time in j.l.T.fillInStackTrace() • Tools – Cpu profiler, azul zvision – Thread dumps (repeated kill -3, zvision) – Track caller/callee to find the thrower • Not all exceptions appear in log files • Solution – Don't Throw, alternate return value (e.g. null)
  • 23. 5 – Endless Exceptions • Related – Exception paths are typically failure paths – JVMs do not optimize them much – Often found when a server collapses
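"Don't throw, alternate return value" can look like the sketch below: a parser that reports bad input through an empty `OptionalInt` instead of paying for `fillInStackTrace()` on every malformed value. The class name `SafeParse` and the nine-digit limit are choices made for this example only.

```java
import java.util.OptionalInt;

final class SafeParse {
    /**
     * Parses a non-negative decimal integer of at most nine digits.
     * Returns an empty OptionalInt on bad input instead of throwing.
     */
    static OptionalInt parseNonNegative(String s) {
        // Nine digits at most, so the accumulated value cannot overflow an int.
        if (s == null || s.isEmpty() || s.length() > 9) {
            return OptionalInt.empty();
        }
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalInt.empty();
            }
            value = value * 10 + (c - '0');
        }
        return OptionalInt.of(value);
    }
}
```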
  • 24. 4 - Fragmentation • Symptom – Performance degrades over time – Inducing a “Full GC” makes problem go away – Lots of free memory but in tiny fragments • Tools – GC logging flags, e.g. for CMS -XX:PrintFLSStatistics=2 -XX:+PrintCMSInitiationStatistics
  • 25. 4 - Fragmentation • Tools – “Fragger” www.azulsystems.com/resources • Tiny cpu cost, low memory cost • Fragments the heap in 60 sec like an hour in production • Get FullGC cycles at dev's desk • Solution – Upgrade to latest CMS (CR:6631166) – Azul Zing & Gen Pauseless GC – Pooling similar sized/aged objects • (really hard to get right!)
  • 26. 3 – GC Tuning • Symptom – Entropy(gc) == number_of_gc_flags • Too many free parameters • 64-bit/large heap size is not a solution – Constant 40-60% CPU utilization by GC – Scheduled reboot before full-GC – Full time Engineer working GC flags • Workarounds – Ask JVM Vendor to give 1 flag solution – G1 GC, Azul’s Zing GPGC
  • 27. 3 – GC Tuning: Oracle Weblogic GC Flags -server -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:MaxPermSize=128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxNewSize=64m -XX:NewSize=64m -Xms1536m -Xmx1536m -XX:SurvivorRatio=128 -XX:MaxTenuringThreshold=0 -XX:CMSInitiatingOccupancyFraction=60 -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFFE -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFFE
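Before tuning flags, it helps to know roughly how much time the collectors are taking. The sketch below sums the collection time reported by the standard `GarbageCollectorMXBean`s and divides by JVM uptime. That figure is accumulated elapsed time as reported by each collector, which is only a rough proxy for the CPU utilization quoted on the GC Tuning slide. The class name `GcOverhead` is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverhead {
    /** Total milliseconds reported by all collectors since JVM start. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Reported GC time as a fraction of JVM uptime; a coarse health signal only. */
    static double gcFractionOfUptime() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : (double) totalGcMillis() / uptimeMillis;
    }
}
```

Sampling this at intervals and looking at the difference between samples gives a trend without parsing gc.log.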
  • 28. 2 - Spikes • Symptoms – Rush hour traffic, tax day, Black Friday – Outages under spikes, power law of networks • Solution – Measure. – Test with realistic load & realistic multi-node setup – Build redundancy & elasticity into infrastructure – Don’t ignore Exceptions & retries under load
  • 30. 1 – Versionitis: When ears wage class wars with jars • Symptom – Different nodes have different configurations, different stack components, versions – classpath has dist/*, -verbose:class – subtle hard to reproduce issues • Solution – Method. Version Control. – Good ol’ fashioned rigor • “It can only be attributable to human error” - HAL
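When two nodes disagree about which jar a class came from, the class itself can be asked at runtime. The sketch below is a programmatic complement to `-verbose:class`. It reports the code source location of a class, or a placeholder when the JVM does not expose one, as is typical for core JDK classes. The class name `ClassOrigin` and the placeholder text are hypothetical.

```java
import java.security.CodeSource;

final class ClassOrigin {
    static final String UNKNOWN = "<bootstrap or unknown>";

    /** Returns the jar or directory a class was loaded from, if the JVM exposes it. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return UNKNOWN;
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println("String      <- " + locationOf(String.class));
        System.out.println("ClassOrigin <- " + locationOf(ClassOrigin.class));
    }
}
```

Logging this for a few key library classes at startup makes a mismatched node visible in its own log.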
  • 31. 0 – Collapse Under Load (pick any 3 above!) • Runs fine as load Ramps Up – At peak load, system is unstable – Slightly above peak: Collapse! • Heavy load triggers exception (e.g. timeout) • Exception path is slow already (e.g. logging) • Transaction retried (so more work yet again) • So NEXT transaction times-out • Soon all time spent throwing & logging exceptions • No forward progress
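The retry loop in that collapse sequence is the part an application controls most directly. The sketch below caps the number of attempts and counts failures in a cheap counter instead of logging a stack trace for each one, so a timeout storm adds a bounded amount of extra work. The class name `BoundedRetry`, the fallback value, and the attempt limit are illustrative assumptions, and a real system would likely add backoff between attempts.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

final class BoundedRetry {
    /** Cheap failure counter; read it from monitoring rather than logging each failure. */
    static final AtomicLong FAILURES = new AtomicLong();

    /**
     * Calls op at most maxAttempts times and returns its first successful result.
     * If every attempt fails, returns the fallback instead of throwing again.
     */
    static <T> T call(Supplier<T> op, int maxAttempts, T fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                FAILURES.incrementAndGet(); // no per-failure stack trace logging
            }
        }
        return fallback;
    }
}
```

A caller would wrap the remote call, for example `BoundedRetry.call(() -> client.fetch(id), 3, null)`, where `client.fetch` stands for whatever hypothetical operation may time out.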
  • 33. Q & A (& Refs 1 of 2) • References: – Java.util.concurrent lock profiling: http://infoq.com/jucprofiler – Java serialization benchmarks: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2 – Memory profiling with yourkit: http://yourkit.com – Tuning gc: http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html, http://blog.codecentric.de/en/2010/01/java-outofmemoryerror-a-tragedy-in-seven-acts/ – Cliff Click's High Scale lib, Non-Blocking HashMap: http://sourceforge.net/projects/high-scale-lib/
  • 34. Q & A (& Refs 2 of 2) • References: – Memory Leak: http://deusch.org/blog/?p=9 – Handy list of jvm options: http://blogs.sun.com/watt/resource/jvm-options-list.html – Fragger (with source code): http://www.azulsystems.com/resources – Garbage Collection: Algorithms for Automatic Dynamic Memory Management, Richard Jones, Rafael D Lins
  • 35. Backup slide – Fragmentation • Works well for hours at 300-400MB – Same workload • Suddenly haywire – Promotion • Too frequently – Back to back FullGCs – May not all be completing.