Performance optimization techniques for Java code

Who am I and why should you trust me?
● Attila-Mihály Balázs
http://hype-free.blogspot.com/
● Former malware researcher ("low-level
guy")
● Current Java dev ("high-level dude")
● Spent the last ~6 months optimizing a large
(1 000 000+ LOC) legacy system
● Will spend the next 6 months on it too (at
least)

What's this about
● Core principles
● Demo 1: collections framework
● Demo 2, 3, 4: synchronization performance
● Demo 5: ugly code, is it worth it?
● Demo 6, 7, 8: playing with Strings
● Conclusions
● Q&A

What this is not about
● Selecting efficient algorithms
● High level optimizations (architectural
changes)

● These are important too! (but require more
effort, and we are going for the quick win
here)

Core principles
● Performance is a balance, an endless
game of shifting bottlenecks. No silver
bullets here!

[Diagram: your program in the middle, bounded by CPU, Memory, Disk and Network]

Perform on all levels!
● Performance has many levels:
– Compiler (JIT): upgrading from Java 5 to Java 6: up to 100% (1)
– Memory: L1/L2 cache, main memory
– Disk: cache, RAID, SSD
– Network: 10Mbit, 100Mbit, 1000Mbit
● Until recently we had it easy (performance
doubled every 18 months)
● Now we need to do some work
(1) http://java.sun.com/performance/reference/whitepapers/6_performance.html

Core principles
● Measure, measure, measure! (before,
during, after).
● Try using realistic data!
● Watch out for the Heisenberg effect (more
on this later)
● Some things are not intuitive:
– Pop-question: if processing 1000
messages takes 1 second, how long
does the processing of 1 message take?
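"Measure, measure, measure" can start as simply as the sketch below, which times a task by hand with System.nanoTime(). It is a minimal illustration, not a benchmarking tool: the class and method names are hypothetical, and the warm-up count is an assumption meant to give the JIT a chance to compile the code first. The timing code itself is a small instance of the Heisenberg effect, since it adds work of its own. For serious microbenchmarks a dedicated harness such as OpenJDK's JMH handles these pitfalls more thoroughly.

```java
import java.util.Arrays;
import java.util.function.Supplier;

class Measure {
    // Results are written here so the JIT cannot discard the measured work as dead code.
    static volatile Object sink;

    // Runs `task` `warmups` times unmeasured (giving the JIT a chance to compile it),
    // then times `runs` executions and returns the median duration in nanoseconds.
    static long medianNanos(Supplier<?> task, int warmups, int runs) {
        if (runs < 1) throw new IllegalArgumentException("runs must be >= 1");
        for (int i = 0; i < warmups; i++) {
            sink = task.get();
        }
        long[] times = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = task.get();
            times[i] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[runs / 2]; // the median is less sensitive to GC pauses than the mean
    }

    public static void main(String[] args) {
        long ns = medianNanos(() -> String.join(",", "a", "b", "c"), 10_000, 101);
        System.out.println("median: " + ns + " ns");
    }
}
```

Run it before, during and after a change, ideally on realistic input rather than on a toy task like the one in main.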

Core principles
● Throughput
● Latency
● Thread context, context switching
● Lock contention
● Queueing theory
● Profiling
● Sampling
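Queueing theory gives one way to think about the pop-question above. Little's law states that the average number of items in a system equals throughput times average latency (L = λW). So "1000 messages per second" fixes the throughput, but the latency of one message also depends on how many messages are in flight at once. The sketch below just does that arithmetic; the in-flight values are illustrative assumptions.

```java
class LittlesLaw {
    // Little's law: inFlight = throughput * latency, hence latency = inFlight / throughput.
    static double latencySeconds(double inFlight, double throughputPerSecond) {
        return inFlight / throughputPerSecond;
    }

    public static void main(String[] args) {
        double throughput = 1000; // messages per second, as in the pop-question
        for (int inFlight : new int[] {1, 10, 100}) {
            System.out.printf("%3d in flight -> %.0f ms per message%n",
                    inFlight, latencySeconds(inFlight, throughput) * 1000);
        }
    }
}
```

With one message processed at a time, each one takes 1 ms. With 10 processed concurrently (for example by 10 worker threads), each takes 10 ms, yet the throughput is the same.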

Feasibility – "numbers everyone should know" (2)
● L1 cache reference 0.5 ns
● Branch mispredict 5 ns
● L2 cache reference 7 ns
● Mutex lock/unlock 100 ns
● Main memory reference 100 ns
● Compress 1K bytes with Zippy 10,000 ns
● Send 2K bytes over 1 Gbps network 20,000 ns
● Read 1 MB sequentially from memory 250,000 ns
● Round trip within same datacenter 500,000 ns
● Disk seek 10,000,000 ns
● Read 1 MB sequentially from network 10,000,000 ns
● Read 1 MB sequentially from disk 30,000,000 ns
● Send packet CA->Netherlands->CA 150,000,000 ns
(2) http://research.google.com/people/jeff/stanford-295-talk.pdf
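These numbers are meant for back-of-the-envelope feasibility checks. The sketch below shows the kind of arithmetic they enable, using three figures copied from the list (approximate values from the cited talk, not measurements of any particular machine). The class and constant names are hypothetical.

```java
class Estimates {
    // Figures copied from the list above, in nanoseconds.
    static final long READ_1MB_MEMORY_NS = 250_000L;
    static final long READ_1MB_DISK_NS = 30_000_000L;
    static final long DISK_SEEK_NS = 10_000_000L;

    // Estimated time to read `megabytes` sequentially, in milliseconds.
    static double sequentialReadMs(long megabytes, long nsPerMb) {
        return megabytes * nsPerMb / 1_000_000.0;
    }

    public static void main(String[] args) {
        long oneGb = 1024; // in MB
        System.out.printf("1 GB from memory: %.0f ms%n", sequentialReadMs(oneGb, READ_1MB_MEMORY_NS));
        System.out.printf("1 GB from disk:   %.0f ms%n", sequentialReadMs(oneGb, READ_1MB_DISK_NS));
        // 1000 random reads, each needing one seek: the seeks alone dominate.
        System.out.printf("1000 disk seeks:  %.0f ms%n", 1000 * DISK_SEEK_NS / 1_000_000.0);
    }
}
```

By these figures, reading 1 GB sequentially takes about 256 ms from memory but about 30 seconds from disk. If a design needs that read to finish in 100 ms, no amount of code tuning will rescue it.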

Feasibility
● Amdahl's law: The speedup of a program
using multiple processors in parallel
computing is limited by the time needed for
the sequential fraction of the program.
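In formula form, if a fraction p of the work can run in parallel on n processors, the best possible speedup is S(n) = 1 / ((1 - p) + p / n). As n grows, S(n) approaches 1 / (1 - p). The minimal sketch below just evaluates that formula; the names are illustrative.

```java
class Amdahl {
    // Maximum speedup on n processors when a fraction p (0..1) of the work is parallelizable.
    static double speedup(double p, int n) {
        if (p < 0 || p > 1 || n < 1) throw new IllegalArgumentException("need 0 <= p <= 1, n >= 1");
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8, 1024}) {
            System.out.printf("p = 0.90, n = %4d -> %.2fx%n", n, speedup(0.90, n));
        }
    }
}
```

Even with 90% of the program parallelized, 1024 processors give a speedup just under 10x, because the 10% sequential part never gets faster.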

Course of action
● Have a clear (written?), measurable goal, e.g.:
operation X should take less than 100ms in
99.9% of the cases
● Measure (profile)
● Is the goal met? → The End
● Optimize hotspots → go back to measuring
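A goal phrased as "less than 100ms in 99.9% of the cases" is a percentile, so checking it means looking at the whole latency distribution, not the average. Below is a minimal sketch using the nearest-rank percentile definition. The class and method names are hypothetical, and how the latency samples are collected is left to your profiler or logging.

```java
import java.util.Arrays;

class LatencyGoal {
    // Nearest-rank percentile: the smallest sample such that at least `pct` percent
    // of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, double pct) {
        if (samplesMs.length == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length); // 1-based rank
        rank = Math.min(Math.max(rank, 1), sorted.length);
        return sorted[rank - 1];
    }

    // True if the `pct` percentile is strictly below `limitMs`.
    static boolean goalMet(long[] samplesMs, double pct, long limitMs) {
        return percentile(samplesMs, pct) < limitMs;
    }
}
```

For example, `goalMet(samples, 99.9, 100)` fails as soon as more than 0.1% of the samples take 100ms or longer, even if the average looks excellent.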

Tools
● VisualVM
● JProfiler
● YourKit

● Eclipse TPTP
● NetBeans Profiler

Demo 1: collections framework
● Name 3 things wrong with this code:

Vector<String> v1;
…
if (!v1.contains(s)) { v1.add(s); }

Demo 1: collections framework
● Wrong data structure (list / array instead of
set), hence slooow performance for large
data sets (but not for small ones!)
● Extra synchronization if used by a single
thread only
● Not actually thread safe! contains() and
add() are two separate steps, so two threads
can both add the same string (it is only
"exception safe")
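One way to address all three problems is sketched below: use a Set instead of a Vector, skip locking when only one thread touches the collection, and let a single atomic call replace the contains()/add() pair. This is a minimal illustration with hypothetical names. ConcurrentHashMap.newKeySet() requires Java 8 or later, which is newer than the JREs this talk targeted.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class UniqueStrings {
    // Only one thread touches the collection: a plain HashSet with no locking,
    // giving O(1) average-time add/contains instead of Vector's linear scan.
    static Set<String> forSingleThread() {
        return new HashSet<>();
    }

    // Shared between threads: a concurrent Set whose add() is atomic.
    static Set<String> forManyThreads() {
        return ConcurrentHashMap.newKeySet();
    }

    // Set.add() does "contains? then add" in one call and returns true only if
    // the element was new, so there is no window between a separate contains()
    // and add() in which another thread could slip in.
    static boolean addIfAbsent(Set<String> set, String s) {
        return set.add(s);
    }
}
```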

Demo 1: lessons
● Use existing classes
● Use realistic sample data
● Thread safety is hard!
● Heisenberg (observer) effect

Demo 2, 3, 4: synchronization performance
● If I have N units of work and spread them
over 4 threads, it must be faster than using
a single thread, right?
● What does lock contention look like?
● What does a "synchronization train(wreck)"
look like?
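The following is a hypothetical sketch, not the demo code itself, of the kind of experiment these questions point at. The same work (summing numbers) runs on a thread pool twice. In the first version every step takes one shared lock. In the second, each thread accumulates into a private local variable and publishes its result once. The class name and workload are assumptions. The contended version is usually much slower, often slower than a single thread, but how much slower depends on the hardware, so measure it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

class Contention {
    private static final Object LOCK = new Object();
    private static long sharedTotal = 0;

    // Submits the same task to `threads` workers and sums the values they return.
    private static long run(int threads, Callable<Long> task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) futures.add(pool.submit(task));
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Every unit of work takes the same lock, so the threads mostly wait on each other.
    static long sumContended(int threads, int perThread) {
        sharedTotal = 0;
        run(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                synchronized (LOCK) { sharedTotal += i; }
            }
            return 0L;
        });
        synchronized (LOCK) { return sharedTotal; }
    }

    // Each thread works on a private copy and publishes it once at the end.
    static long sumPrivate(int threads, int perThread) {
        return run(threads, () -> {
            long local = 0;
            for (int i = 0; i < perThread; i++) local += i;
            return local;
        });
    }

    public static void main(String[] args) {
        int threads = 4, perThread = 5_000_000;
        long t0 = System.nanoTime();
        long a = sumContended(threads, perThread);
        long t1 = System.nanoTime();
        long b = sumPrivate(threads, perThread);
        long t2 = System.nanoTime();
        System.out.printf("contended: %d in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("private:   %d in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

A profiler attached to the contended run should show the worker threads spending much of their time blocked on LOCK.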

Demo 2, 3, 4: lessons
● Use existing classes
– ReadWriteLock
– java.util.concurrent.*
● Use realistic sample data (too short / too
long units of work)
● Sometimes throwing a threadpool at it
makes it worse!
● Consider using a private copy of the
variable for each thread
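As an example of reaching for an existing class, the sketch below guards a read-mostly cache with a ReentrantReadWriteLock. Many readers can hold the read lock at the same time, and only loading a missing entry takes the exclusive write lock. The class and its loader function are hypothetical. ConcurrentHashMap.computeIfAbsent (Java 8+) would achieve a similar effect with less code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

class ReadMostlyCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Function<K, V> loader;

    ReadMostlyCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        lock.readLock().lock();              // shared: concurrent readers do not block each other
        try {
            V value = map.get(key);
            if (value != null) return value;
        } finally {
            lock.readLock().unlock();        // a read lock cannot be upgraded, so release it first
        }
        lock.writeLock().lock();             // exclusive: only for loading missing entries
        try {
            // computeIfAbsent re-checks, because another thread may have loaded
            // the key between our read unlock and write lock.
            return map.computeIfAbsent(key, loader);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

This pays off only when reads vastly outnumber writes. With frequent writes, the extra bookkeeping of a read/write lock can make it slower than a plain synchronized block, so profile both.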

Demo 5: ugly code, is it worth it?
● Parsing a logfile

Demo 5: lessons
● Sometimes yes, but always profile first!
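To make the trade-off concrete, here is a hypothetical sketch (not the demo's actual code) of two ways to pull the level out of log lines shaped like "2010-03-15 12:00:01 INFO message text". The log format is an assumption. The readable version splits the line with a regular expression. The "ugly" version walks it with indexOf and allocates nothing but the result. The two are not equivalent: the fast one assumes exactly one space between fields. Whether the speed difference matters for your log volume is exactly what a profiler should decide.

```java
import java.util.regex.Pattern;

class LogLevel {
    // Compiled once; String.split(" +") would compile this regex again on every call.
    private static final Pattern SPACES = Pattern.compile(" +");

    // Readable version: split into at most 4 fields and take the third.
    // Tolerates runs of several spaces between fields.
    static String levelClean(String line) {
        return SPACES.split(line, 4)[2];
    }

    // "Ugly" version: find the 2nd and 3rd spaces and cut out only the level.
    // Assumes exactly one space between the date, time and level fields.
    static String levelFast(String line) {
        int firstSpace = line.indexOf(' ');
        int secondSpace = line.indexOf(' ', firstSpace + 1);
        int thirdSpace = line.indexOf(' ', secondSpace + 1);
        int end = (thirdSpace < 0) ? line.length() : thirdSpace;
        return line.substring(secondSpace + 1, end);
    }
}
```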

Demo 6: String.substring
● How are strings stored in Java?
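On the JREs current when this talk was given (Java 6, and Java 7 before update 6), a String was a view (offset and count) into a char[]. substring() returned a new view into the same array, so keeping a short substring of a huge string kept the whole array reachable. The sketch below shows the usual workaround of that era. The helper name is hypothetical. On 7u6 and later, substring() already copies, so the extra copy is merely redundant there.

```java
class SubstringCopy {
    // On older JREs, substring() shares the parent's char[], so new String(...)
    // is used to force a right-sized copy that no longer pins the big array.
    static String detachedPrefix(String big, int len) {
        return new String(big.substring(0, len));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("key=");
        for (int i = 0; i < 1_000_000; i++) sb.append('x');
        String hugeLine = sb.toString();
        String key = detachedPrefix(hugeLine, 3); // keeps only "key", not the ~1M-char array
        System.out.println(key);
    }
}
```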

Demo 6: Lesson
● You can look inside the JRE when needed!

Demo 7: Lessons
● You shouldn't use String.intern:
– Slow
– You have to use it everywhere
– Needs hand-tuning
● Use a WeakHashMap for caching (don't
forget to synchronize!)
● Use String.equals (not ==)
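A minimal sketch of the WeakHashMap approach is below; the class name is hypothetical. Two details are easy to get wrong. First, the value must be a WeakReference: if the map held the string itself as a strong value, that value would keep its own key reachable and the entry would never be cleared. Second, WeakHashMap is not thread-safe and the get-then-put must be atomic, so the method is synchronized.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class StringCanonicalizer {
    // Keys are held weakly; values are weak too, so nothing here keeps a string alive.
    private final Map<String, WeakReference<String>> cache = new WeakHashMap<>();

    // Returns a canonical copy: equal strings passed in map to the same instance
    // for as long as someone else still references that instance.
    synchronized String canonicalize(String s) {
        WeakReference<String> ref = cache.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;          // reuse the copy we already have
        }
        cache.put(s, new WeakReference<>(s));
        return s;                     // first time seen: s becomes the canonical copy
    }
}
```

Canonicalizing saves memory when many equal strings are live, but compare with equals anyway. Strings that never passed through the cache are still separate objects.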

Demo 8: charsets
– ASCII
– ISO-8859-1
– UTF-8
– UTF-16

Demo 8: lessons
● Use UTF-8 where possible
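A quick way to see the trade-offs between the four charsets is to encode the same text with each and compare sizes. The sketch uses StandardCharsets (Java 7+). The sample word and class name are illustrative choices.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetSizes {
    // Number of bytes needed to encode `s` in charset `cs`.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // "héllo", written with an escape to avoid source-encoding issues
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8, StandardCharsets.UTF_16
        };
        for (Charset cs : charsets) {
            System.out.println(cs + ": " + encodedLength(text, cs) + " bytes");
        }
    }
}
```

For "héllo" this gives 5, 5, 6 and 12 bytes. The ASCII result is lossy because é is replaced by '?'. ISO-8859-1 covers only Western European characters. Java's UTF-16 output adds a 2-byte byte-order mark and doubles the size of Latin text. UTF-8 stays compact for mostly-ASCII data while representing every character.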

Conclusions
● Measure twice, cut once
● Don't trust advice you didn't test! (including
mine)
● Most of the time you don't need to sacrifice
clean code for performant code

Conclusions
● Slides:
– Google Groups
– http://hype-free.blogspot.com/
– x_at_y_or_z@yahoo.com
● Source code:
– http://code.google.com/p/hype-free/source/browse/#svn/trunk/java-perfopt-201003
● Profiler evaluation licenses

Resources
● https://visualvm.dev.java.net/
● http://www.ej-technologies.com/
● http://blog.ej-technologies.com/
● http://www.yourkit.com/
● http://www.yourkit.com/docs/index.jsp
● http://www.yourkit.com/eap/index.jsp
