The presentation covers the basics of performance optimization for real-world Java code. It starts with a theoretical overview of the concepts, followed by several live demos
showing how performance bottlenecks can be diagnosed and eliminated. The demos include some non-trivial multi-threaded examples
inspired by real-world applications.
2. Who am I and why should you
trust me?
● Attila-Mihály Balázs
http://hype-free.blogspot.com/
● Former malware researcher (”low-level
guy”)
● Current Java dev (”high level dude”)
● Spent the last ~6 months optimizing a large
(1 000 000+ LOC) legacy system
● Will spend the next 6 months on it too (at
least)
4. What's this about
● Core principles
● Demo 1: collections framework
● Demo 2, 3, 4: synchronization performance
● Demo 5: ugly code, is it worth it?
● Demo 6, 7, 8: playing with Strings
● Conclusions
● Q&A
5. What this is not about
● Selecting efficient algorithms
● High level optimizations (architectural
changes)
● These are important too! (but require more
effort, and we are going for the quick win
here)
6. Core principles
● Performance is a balance, an endless
game of shifting bottlenecks; no silver
bullets here!
[Diagram: Your program, with CPU, Memory, Disk and Network]
7. Perform on all levels!
● Performance has many levels:
– Compiler (JIT): Java 5 to Java 6: 100% (1)
– Memory: L1/L2 cache, main memory
– Disk: cache, RAID, SSD
– Network: 10Mbit, 100Mbit, 1000Mbit
● Until recently we had it easy (performance
doubled every 18 months)
● Now we need to do some work
(1) http://java.sun.com/performance/reference/whitepapers/6_performance.html
8. Core principles
● Measure, measure, measure! (before,
during, after).
● Try using realistic data!
● Watch out for the Heisenberg effect (more
on this later)
● Some things are not intuitive:
– Pop-question: if processing 1000
messages takes 1 second, how long
does the processing of 1 message take?
10. Feasibility – ”numbers everyone
should know” (2)
● L1 cache reference 0.5 ns
● Branch mispredict 5 ns
● L2 cache reference 7 ns
● Mutex lock/unlock 100 ns
● Main memory reference 100 ns
● Compress 1K bytes with Zippy 10,000 ns
● Send 2K bytes over 1 Gbps network 20,000 ns
● Read 1 MB sequentially from memory 250,000 ns
● Round trip within same datacenter 500,000 ns
● Disk seek 10,000,000 ns
● Read 1 MB sequentially from network 10,000,000 ns
● Read 1 MB sequentially from disk 30,000,000 ns
● Send packet CA->Netherlands->CA 150,000,000 ns
(2) http://research.google.com/people/jeff/stanford-295-talk.pdf
11. Feasibility
● Amdahl's law: The speedup of a program
using multiple processors in parallel
computing is limited by the time needed for
the sequential fraction of the program.
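To put a number on the bullet above, here is a minimal sketch. It assumes the usual formulation of Amdahl's law, speedup = 1 / (s + (1 - s) / n), where s is the sequential fraction of the run time and n the number of processors. The class and method names are made up for this illustration and do not come from the talk.

```java
// Hypothetical helper; names are illustrative only.
final class AmdahlSketch {

    /**
     * Upper bound on the speedup with the given number of processors,
     * when sequentialFraction (0..1) of the run time cannot be parallelized.
     */
    static double speedup(double sequentialFraction, int processors) {
        if (sequentialFraction < 0.0 || sequentialFraction > 1.0 || processors < 1) {
            throw new IllegalArgumentException("fraction must be in [0,1], processors >= 1");
        }
        return 1.0 / (sequentialFraction + (1.0 - sequentialFraction) / processors);
    }

    public static void main(String[] args) {
        // With 10% sequential work the bound stays below 10x, however many cores are added.
        for (int n : new int[] {1, 2, 4, 8, 1024}) {
            System.out.printf("%4d processors -> at most %.2fx%n", n, speedup(0.10, n));
        }
    }
}
```

The practical reading: before adding threads, estimate how much of the work is inherently sequential, because that part caps what any amount of parallelism can buy.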
12. Course of action
● Have a clear (written?), measurable goal:
operation X should take less than 100ms in
99.9% of the cases
● Measure (profile)
● Is the goal met? → The End
● Optimize hotspots → go to step 2
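A goal of the form "less than 100ms in 99.9% of the cases" can be checked mechanically. The sketch below is one possible way to do it, assuming timings are collected with System.nanoTime and the percentile is computed with the nearest-rank method; the talk does not prescribe either. All names (LatencyGoalSketch, measure, percentile, goalMet) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper for checking a percentile-based latency goal.
final class LatencyGoalSketch {

    /** Runs the operation repeatedly and returns one elapsed time (ns) per run. */
    static long[] measure(Runnable operation, int iterations) {
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            operation.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Nearest-rank percentile: the smallest sample with at least p percent of samples <= it. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0 || p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samples.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    /** True if the p-th percentile is strictly below the limit. */
    static boolean goalMet(long[] samples, double p, long limit) {
        return percentile(samples, p) < limit;
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute the real "operation X" and realistic data.
        Runnable operationX = () -> Math.sqrt(System.nanoTime());
        long[] samples = measure(operationX, 10_000);
        long limitNanos = 100_000_000L; // 100 ms
        System.out.println("Goal met: " + goalMet(samples, 99.9, limitNanos));
    }
}
```

Looking at a high percentile rather than the average matters here: a handful of slow outliers can hide behind a good mean and still break the stated goal.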
14. Demo 1: collections framework
● Name 3 things wrong with this code:
Vector<String> v1;
…
if (!v1.contains(s)) { v1.add(s); }
15. Demo 1: collections framework
● Wrong data structure (list / array instead of
set), hence slooow performance for large
data sets (but not for small ones!)
● Extra synchronization if used by a single
thread only
● Not actually thread safe! (only ”exception
safe”)
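One possible way to address all three points is sketched below; it is not necessarily what the demo itself showed. It assumes Java 8 or later (for ConcurrentHashMap.newKeySet) and that only uniqueness matters to the caller. The class and method names are invented for this example.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical replacement for the "if (!v1.contains(s)) v1.add(s)" idiom.
final class UniqueStringsSketch {

    /** Single-threaded use: no locking, keeps insertion order, add() ignores duplicates. */
    static Set<String> singleThreaded() {
        return new LinkedHashSet<>();
    }

    /** Shared between threads: add() checks and inserts in one atomic step. */
    static Set<String> threadSafe() {
        return ConcurrentHashMap.newKeySet();
    }

    /** Adds the same distinct values from several threads and returns the final size. */
    static int fillFromThreads(Set<String> target, int threadCount, int distinctValues) {
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < distinctValues; i++) {
                    target.add("value-" + i); // no separate contains() call, so no race window
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return target.size();
    }

    public static void main(String[] args) {
        Set<String> seen = singleThreaded();
        System.out.println(seen.add("a")); // true: newly added
        System.out.println(seen.add("a")); // false: already present
        System.out.println(fillFromThreads(threadSafe(), 4, 1000)); // 1000
    }
}
```

The original Vector version synchronizes contains() and add() individually, so two threads can both see "not present" and both add; a set whose add() does the check itself closes that gap and replaces the linear scan with a hash lookup.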
16. Demo 1: lessons
● Use existing classes
● Use realistic sample data
● Thread safety is hard!
● Heisenberg (observer) effect
17. Demo 2, 3, 4: synchronization
performance
● If I have N units of work and use 4 threads, it
must be faster than using a single thread, right?
● What does lock contention look like?
● What does a ”synchronization train(wreck)”
look like?
18. Demo 2, 3, 4: lessons
● Use existing classes
– ReadWriteLock
– java.util.concurrent.*
● Use realistic sample data (too short / too
long units of work)
● Sometimes throwing a threadpool at it
makes it worse!
● Consider using a private copy of the
variable for each thread
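The last bullet can be illustrated with a small sketch: each task accumulates into its own local variable and the partial results are combined once at the end, so the threads never contend on a shared counter. This is only one way to apply the idea, under the assumption that the work can be split into independent chunks; the class name, method name and the "count even numbers" workload are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: per-task private counters instead of one shared, locked counter.
final class PrivateCopySketch {

    static long countEvens(int[] data, int threadCount) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            int chunk = (data.length + threadCount - 1) / threadCount; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int t = 0; t < threadCount; t++) {
                final int from = Math.min(data.length, t * chunk);
                final int to = Math.min(data.length, from + chunk);
                Callable<Long> task = () -> {
                    long local = 0; // private to this task: no lock, no shared cache line
                    for (int i = from; i < to; i++) {
                        if (data[i] % 2 == 0) {
                            local++;
                        }
                    }
                    return local;
                };
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // merge once per task, not once per element
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(countEvens(data, 4)); // 500000
    }
}
```

Whether this actually beats a single thread still depends on the size of each unit of work, so it needs to be measured with realistic data like everything else here.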
19. Demo 5: ugly code, is it worth it?
● Parsing a logfile
24. Demo 7: Lessons
● You shouldn't use String.intern:
– Slow
– You have to use it everywhere
– Needs hand-tuning
● Use a WeakHashMap for caching (don't
forget to synchronize!)
● Use String.equals (not ==)
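A minimal sketch of the WeakHashMap suggestion follows, assuming the goal is to share one instance per distinct string value without String.intern. The class and method names are hypothetical, and this is not the code from the demo. Two details are easy to get wrong: the map must be guarded by a lock, and the value must not hold a strong reference to the key, otherwise the entry can never be collected.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical string canonicalizer backed by a WeakHashMap.
final class StringCacheSketch {

    // The value is a WeakReference to the key itself: a strong reference here
    // would keep every entry alive forever and defeat the weak keys.
    private final Map<String, WeakReference<String>> cache = new WeakHashMap<>();

    /** Returns a shared instance equal to s; synchronized because WeakHashMap is not thread safe. */
    synchronized String canonical(String s) {
        WeakReference<String> ref = cache.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        cache.put(s, new WeakReference<>(s));
        return s;
    }

    public static void main(String[] args) {
        StringCacheSketch cache = new StringCacheSketch();
        String a = new String("hello");
        String b = new String("hello");
        System.out.println(cache.canonical(a) == cache.canonical(b)); // true: same instance
        System.out.println(a.equals(b)); // true, and equals() works with or without the cache
    }
}
```

Callers should still compare with equals(): it stays correct for strings that never went through the cache, which is exactly the "you have to use it everywhere" problem with relying on ==.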
27. Conclusions
● Measure twice, cut once
● Don't trust advice you didn't test! (including
mine)
● Most of the time you don't need to sacrifice
clean code for performant code