Java and the Machine
   Kirk Pepperdine (@kcpeppe)
   Martijn Verburg (@karianna)
This guy is coming for you
The developer version of the T-800
Hey Martijn, do you know what this is?
A relic of a simpler era!!!
Intel 4004

Shipped 1971

108 kHz-740 kHz

10 µm process (1/10th of human hair)

16 pin DIP

2,300 transistors

shared address and data bus

46300 or 92600 instructions / second

Instruction cycle of 10.8 or 21.6 µs (8 clock cycles / instruction cycle)

4 bit bus (12 bit addressing, 8 bit instruction, 4 bit word)

46 instructions (41-8 bit, 5-16 bit), described in a 9 page doc!!!
Intel i7

Shipped 2008

3.3 GHz (4200x)

45 nm process (~220x smaller)

64 bit instructions

1370 landings

774 million transistors (~336,000x)

50000x less expensive, 350,000x faster, 5000x less power / transistor

31 stage instruction pipeline (instructions retired @ 1 / clock)

 • adjacent and nearly adjacent instructions run at the same time
packaging doc is 102 pages

MSR documentation is well over 7000 pages
Lots of challenges!



• Back to basics

• Challenge: Hardware Threads

• Challenge: Virtualisation

• Challenge: The JVM

• Challenge: Java

• Conclusion
It's 18:45 on conference day 1
You have a comforting beverage in your hand



    This talk is going to hurt
             T-800 don't care.


Back to Basics



• Moore's Law

• Little's Law

• Amdahl's Law

• It's the hardware, stupid!

• What can I do?
Moore's Law

• It's about transistors not clocks
Little's Law



λ=1/µ

Throughput = 1 / Service Time
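
Little's Law is usually stated as L = λW: the mean number of requests in a system equals throughput times the mean time each request spends in it. The slide's form is the special case L = 1, where work is fully serialised and only one request is in service at a time. The sketch below rearranges the law to λ = L / W; the class and method names are illustrative assumptions, not from the talk.

```java
// Illustrative sketch: names and units are assumptions, not from the slides.
final class LittlesLaw {

    /**
     * Throughput in requests per second, from Little's Law rearranged as
     * lambda = L / W, with W given in milliseconds.
     */
    static double throughputPerSecond(double inFlight, double serviceTimeMillis) {
        if (inFlight < 0 || serviceTimeMillis <= 0) {
            throw new IllegalArgumentException("inFlight >= 0 and serviceTimeMillis > 0 required");
        }
        return inFlight * 1000.0 / serviceTimeMillis;
    }

    public static void main(String[] args) {
        // Serialised: one request at a time, 10 ms each (the slide's case, L = 1).
        System.out.println(throughputPerSecond(1, 10));
        // Eight requests in flight at once, still 10 ms each.
        System.out.println(throughputPerSecond(8, 10));
    }
}
```

With service time fixed, the only way to raise throughput is to have more work in flight, which is exactly what serialisation prevents.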
Amdahl's Law
It's the Hardware, stupid!

• Software often thinks there are no limitations
    – If you're Turing complete you can do anything!


• The reality is that it's fighting over finite resources
    – You remember hardware right?


• These all have finite capacities and throughputs
    –   CPU
    –   RAM
    –   Disk
    –   Network
    –   Bus
Mechanical Sympathy


             "Mechanical Sympathy
                      -
   Hardware and Software working
             together"
— Martin Thompson
What can I do?

• Don't panic


• Learn the laws
    – Have them up next to your monitor


• Code the laws
    – See them in action!


• Amdahl's law says "don't serialise!"
    – With no serialisation, no need to worry about Little's law!


• Understand what hardware you have
    – Understand capacities and throughputs
    – Learn how to measure, not guess those numbers
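
One way to "code the laws" is to put Amdahl's Law into a few lines and watch the curve flatten. The usual statement is speedup(N) = 1 / ((1 - P) + P / N), where P is the fraction of the work that can run in parallel and N is the number of processors. The class and method names below are illustrative, and the 95% figure is just an example value.

```java
// Illustrative sketch of Amdahl's Law; names and sample values are assumptions.
final class AmdahlsLaw {

    /** Theoretical speedup for a workload whose parallel fraction is P on N processors. */
    static double speedup(double parallelFraction, int processors) {
        if (parallelFraction < 0 || parallelFraction > 1 || processors < 1) {
            throw new IllegalArgumentException("need 0 <= P <= 1 and N >= 1");
        }
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / processors);
    }

    public static void main(String[] args) {
        // A workload that is 95% parallel, on an increasing number of cores.
        for (int n : new int[] {1, 2, 4, 8, 16, 1024}) {
            System.out.printf("%4d cores -> %.2fx%n", n, speedup(0.95, n));
        }
    }
}
```

With 5% of the work serialised, the speedup can never exceed 1 / 0.05 = 20, however many cores are added.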
Still with us?
   Good.




Hardware Threads



• Threads are non-deterministic

• Single threaded programs

• Multi core programs

• What can I do?

• Protect your code
Threads are non-deterministic


  "Threads seem to be a small step
    from sequential computation"


      "In fact, they represent a huge
                    step."
— The Problem with Threads, Edward A. Lee, UC Berkeley, 2006
In the beginning there was fork

• It was a copy operation
    – No common memory space
    – So we had to work hard to share state between processes


• Everything was local to us
    – We had no race conditions
    – Hardware caching was transparent, we didn't care
    – And we could cache anything at anytime without fear!


• Life was simpler
    – Ordering was guaranteed
In case you've forgotten
Then we had multi-thread / single core

   • new Thread() is like a light-weight fork
       – Memory is now shared
       – You don't work hard to share state
       – Instead you work hard to guarantee order


   • Guaranteeing order is a much harder problem
       – We started with volatile and synchronized
       – They were fence builders, to help guarantee order
       – It made earlier processors do more work


   • Hardware, O/S & JVM guys were playing leapfrog!
Now we have multiple hardware threads

    • new Thread() is still a light-weight fork
        – Memory is still shared
        – Now we have to work hard to share state due to caching
        – We still have to work hard to guarantee order


    • Guaranteed order is now a performance hit
        – volatile and synchronized force draining and
          refreshing of caches
        – Some hardware support to get around this (CAS)
        – Some software support to get around this (NUMA)


    • Hardware, O/S & JVM guys are still playing leapfrog!
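
The CAS support mentioned above is reachable from plain Java through the atomic classes. The sketch below shows the typical retry loop built on `AtomicInteger.compareAndSet`; the `CasCounter` class and its `demo` helper are hypothetical names used only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch; CasCounter and demo() are assumptions, not from the talk.
final class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    /** Adds delta without taking a lock, retrying until the compare-and-set succeeds. */
    int add(int delta) {
        while (true) {
            int current = value.get();
            int next = current + delta;
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // Another thread changed the value between get() and compareAndSet(): retry.
        }
    }

    int get() {
        return value.get();
    }

    /** Starts the given number of threads, each adding 1 perThread times; returns the final count. */
    static int demo(int threads, int perThread) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.add(1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 100_000));
    }
}
```

No thread ever blocks here, but under heavy contention threads can spin through many failed attempts, so CAS reduces the cost of ordering rather than removing it.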
What can I do?




    "We protect code with the hope
                   of protecting data."
— Gil Tene, Azul Systems, 2008
Protect your data

• You can use a bunch of different techniques
    – synchronized
    – volatile
    – atomics, e.g. AtomicInteger
    – Explicit Java 5 Locks, e.g. ReentrantLock


• But these all have a performance hit


• Let's see examples of these in action
This fence was atomic
The following source code can be
            read later


Don't worry there is a point to this

Running Naked

private int counter;
public void execute( int threadCount) {
    init();
    Thread[] threads = new Thread[ threadCount];
    for ( int i = 0; i < threads.length; i++) {
        threads[i] = new Thread( new Runnable() {
            public void run() {
                long iterations = 0;
                try {
                    while ( running) {
                        // Unprotected read-modify-write: updates from other threads can be lost
                        counter++;
                        counter--;
                        iterations++;
                    }
                } finally {
                    totalIterations += iterations;
                }
            }
       });
    }
  ...
Using volatile
private volatile int counter; // volatile gives visibility only: counter++ is still not atomic
public void execute( int threadCount) {
    init();
    Thread[] threads = new Thread[ threadCount];
    for ( int i = 0; i < threads.length; i++) {
        threads[i] = new Thread( new Runnable() {
            public void run() {
                long iterations = 0;
                try {
                    while ( running) {
                        counter++;
                        counter--;
                        iterations++;
                    }
                } finally {
                    totalIterations += iterations;
                }
            }
       });
    }
  ...
Using synchronized
private int counter;
private final Object lock = new Object();
public void execute( int threadCount) {
    init();
    Thread[] threads = new Thread[ threadCount];
    for ( int i = 0; i < threads.length; i++) {
        threads[i] = new Thread( new Runnable() {
            public void run() {
                long iterations = 0;
                try {
                    while ( running) {
                        synchronized (lock) {
                            counter++;
                            counter--;
                        }
                        iterations++;
                    }
                } finally {
                    totalIterations += iterations;
                }
            }
       });
    }
  ...
Using fully explicit Locks
private int counter;
private final ReentrantReadWriteLock lock =
          new ReentrantReadWriteLock();
public void execute( int threadCount) {
    init();
    Thread[] threads = new Thread[ threadCount];
    for ( int i = 0; i < threads.length; i++) {
        threads[i] = new Thread( new Runnable() {
            public void run() {
                long iterations = 0;
                try {
                    while ( running) {
                        // Acquire outside the try so unlock() only runs if lock() succeeded
                        lock.writeLock().lock();
                        try {
                            counter++;
                            counter--;
                        } finally {
                            lock.writeLock().unlock();
                        }
                        iterations++;
                    }
                } finally {
                    totalIterations += iterations;
                }
  ...
Using AtomicInteger
private AtomicInteger counter = new AtomicInteger(0);
public void execute( int threadCount) {
    init();
    Thread[] threads = new Thread[ threadCount];
    for ( int i = 0; i < threads.length; i++) {
        threads[i] = new Thread( new Runnable() {
            public void run() {
                long iterations = 0;
                try {
                    while ( running) {
                        counter.getAndIncrement();
                        counter.getAndDecrement();
                        iterations++;
                    }
                } finally {
                    totalIterations += iterations;
                }
  ...
A small benchmark comparison
So the point is that locking kills
         performance!
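
The listings above elide the `running` flag, `totalIterations`, and the code that starts and stops the threads. The sketch below fills those in under stated assumptions: `running` is a `volatile boolean`, `totalIterations` is an `AtomicLong`, and each run lasts a fixed number of milliseconds. The `CounterBench` name and its methods are hypothetical, and this is not a rigorous benchmark (no warm-up, no control for JIT effects); a harness such as JMH is the better tool for real measurements.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative harness; field and method names are assumptions filling in elided code.
final class CounterBench {
    private volatile boolean running;                     // assumed stop flag
    private final AtomicLong totalIterations = new AtomicLong();

    private int lockedCounter;
    private final Object lock = new Object();
    private final AtomicInteger atomicCounter = new AtomicInteger();

    /** Runs body in a loop on threadCount threads for roughly millis ms; returns total iterations. */
    long run(int threadCount, long millis, Runnable body) {
        totalIterations.set(0);
        running = true;
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                long iterations = 0;
                try {
                    while (running) {
                        body.run();
                        iterations++;
                    }
                } finally {
                    totalIterations.addAndGet(iterations);
                }
            });
            threads[i].start();
        }
        try {
            Thread.sleep(millis);
            running = false;
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            running = false;
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return totalIterations.get();
    }

    long runSynchronized(int threadCount, long millis) {
        return run(threadCount, millis, () -> {
            synchronized (lock) {
                lockedCounter++;
                lockedCounter--;
            }
        });
    }

    long runAtomic(int threadCount, long millis) {
        return run(threadCount, millis, () -> {
            atomicCounter.getAndIncrement();
            atomicCounter.getAndDecrement();
        });
    }

    int lockedCounter() {
        synchronized (lock) {
            return lockedCounter;
        }
    }

    int atomicCounter() {
        return atomicCounter.get();
    }

    public static void main(String[] args) {
        CounterBench bench = new CounterBench();
        System.out.println("synchronized: " + bench.runSynchronized(4, 1000) + " iterations");
        System.out.println("atomic:       " + bench.runAtomic(4, 1000) + " iterations");
    }
}
```

Both protected variants should leave their counter at zero; what differs between them is how many iterations they complete in the same time, and that figure depends heavily on the machine and thread count.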




Challenge: Virtualisation



• Better utilisation of hardware?

• You can't virtualise into more hardware

• What can I do?
Virtualisation




                    "Why do we do it?"
— Martijn Verburg and Kirk Pepperdine, JAX London 2012
Better utilisation of hardware?

• Could be utopia because we waste hardware?
    – Most hardware is idle
    – Load averages are far less than 10% on many systems


• But why are our systems under utilised?
    – Often because we can't feed them (especially CPU)
    – Throughput in a system is often limited by a single resource


• Trisha Gee is talking more about this tomorrow!
    – 14:25 - Java Performance FAQ
T-800 is incapable of seeing your process


        "A Process is defined as a locus
            of control, an instruction
          sequence and an abstraction
        entity which moves through the
         instructions of a procedure, as
         the procedure is executed by a
                   processor"
    — Jack B Dennis, Earl C Van Horn, 1965
There's no such thing as a
process as far as the hardware
         is concerned


You can't virtualise into more hardware!

    • Throughput is often limited by a single resource
        – That bottleneck can starve everybody else!


    • Going from most problematic to least problematic
        –   Network, Disk, RAM, CPU
        –   Throughput overwhelms the wire
        –   NAS means more pressure on network
        –   CPU/RAM speed gap growing (8% per year)
        –   Many thread scheduling issues (including cache!)
        –   The list goes on and on and on..........


    • Virtual machines sharing hardware causes resource
      conflict
        – You need more than core affinity
What can I do?

• Build your financial and capacity use cases
    – TCO for virtual vs bare metal
    – What does your growth curve look like?


• CPU Pinning trick / Processor Affinity


• Memory Pinning trick


• Move back to bare metal!
    – It's OK - really!
Final Thought


  "There are many reasons to move
   to the cloud - performance isn't
      necessarily one of them."
— Kirk Pepperdine - random rant - 2010
Challenge: The JVM



• WORA

• The cost of a strong memory model

• GC scalability

• GPU

• File Systems

• What can I do?
Brian?
You've got a lot of work to do



WORA costs

• CPU Model differences
    – When and where are you allowed to cache?
    – When and where are you allowed to reorder?
    – AMD vs Intel vs Sparc vs ARM vs .....


• File system differences
    – O/S Level support


• Display Devices - Argh!
    – Impossible to keep up with the consumer trends
    – Aspect ratios, resolution, colour depth etc


• Power
    – From unlimited to extremely limited
The cost of the strong memory model

   • The JVM is ultra conservative
       – It rightly ensures correctness over performance


   • Locks enforce correctness
       – But in a very pessimistic way, which is expensive


   • Locks delineate regions of serialisation
       – Serialisation?
       – Uh oh! Remember Little's and Amdahl's laws?
The visibility mismatch

• Unit of visibility on a CPU != that in the JVM
    – CPU - Cache Line
    – JVM - Where the memory fences in the instruction pipeline
      are positioned


• False sharing is an example of this visibility mismatch
    – volatile creates a fence which flushes the cache


• We think that volatile is a modifier on a field
    – In reality it's a modifier on a cache line


• It can make other fields accidentally volatile
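
A rough way to see false sharing is to let two threads update independent slots that are either adjacent or spaced apart in memory. The sketch below uses an `AtomicLongArray` so the slots are contiguous 8-byte values, and it assumes a 64-byte cache line, so a stride of 8 or more slots puts the two counters on different lines. The class name, the stride values, and the cache line size are all assumptions; the timings it prints vary by machine and are not a rigorous measurement.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Illustrative sketch; assumes 64-byte cache lines. Names are not from the talk.
final class FalseSharingDemo {

    /**
     * Two threads each increment their own slot, stride slots apart.
     * Returns the elapsed wall-clock time in nanoseconds.
     */
    static long timeNanos(int stride, long incrementsPerThread) {
        if (stride < 1) {
            throw new IllegalArgumentException("stride must be >= 1 so the slots differ");
        }
        AtomicLongArray slots = new AtomicLongArray(stride + 1);
        Thread a = new Thread(() -> bump(slots, 0, incrementsPerThread));
        Thread b = new Thread(() -> bump(slots, stride, incrementsPerThread));
        long start = System.nanoTime();
        a.start();
        b.start();
        join(a);
        join(b);
        long elapsed = System.nanoTime() - start;
        // The slots are logically independent, so neither thread can lose an update.
        if (slots.get(0) != incrementsPerThread || slots.get(stride) != incrementsPerThread) {
            throw new IllegalStateException("unexpected counter value");
        }
        return elapsed;
    }

    private static void bump(AtomicLongArray slots, int index, long times) {
        for (long i = 0; i < times; i++) {
            slots.incrementAndGet(index); // volatile-strength update of one slot
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long n = 20_000_000L;
        System.out.println("adjacent slots (stride 1):  " + timeNanos(1, n) / 1_000_000 + " ms");
        System.out.println("separated slots (stride 16): " + timeNanos(16, n) / 1_000_000 + " ms");
    }
}
```

The two threads never touch each other's data in either run; any difference in timing comes from the hardware moving a shared cache line back and forth between cores.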
GC Scalability

• GC cost is dominated by the # of live objects in heap
    – Larger Heap ~= more live objects


• Mark identifies live objects
    – Takes longer with more objects


• Sweep deals with live objects
    – Compaction only deals with live objects
    – Evacuation only deals with live objects


• G1, Zing, and Balanced are no different!
    – They still have to mark and sweep live objects
    – Mark for Zing is more efficient
GPU

• Can't utilise easily today
    – Java does not have normalised access to the GPU
    – The GPU does not have access to Java heap


• Today - third party libraries
    – That translate byte code to GPU RISC instructions
    – Bundle the data and the instructions together
    – Then push data and instructions through the GPU


• Project Sumatra on its way
    – http://openjdk.java.net/projects/sumatra/
    – Aparapi backs onto OpenCL


• CPU and GPU memory to converge?
What can I do?

• Not using java.util.concurrent?
    – You're probably doing it wrong


• Avoid locks where possible!
    – They make it difficult to run processes concurrently
    – But don't forget integrity!


• Use Unsafe to implement lockless concurrency
    –   Dangerous!!
    –   Allows non-blocking / wait free implementations
    –   Gives us access to the pointer system
    –   Gives us access to the CAS instruction


• Cliff Click's awesome lib
    – http://sourceforge.net/projects/high-scale-lib/
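
As a gentler starting point than Unsafe, the same CAS primitive is available through `AtomicReference`. The sketch below is a minimal lock-free stack (the classic Treiber stack) built that way. It swaps `AtomicReference` in for the Unsafe approach the slide mentions, it is lock-free rather than wait-free, and the class name is an illustrative assumption.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative Treiber stack using AtomicReference instead of Unsafe.
final class LockFreeStack<T> {

    private static final class Node<T> {
        final T item;
        final Node<T> next;

        Node(T item, Node<T> next) {
            this.item = item;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    /** Pushes an item, retrying if another thread moved the head in the meantime. */
    void push(T item) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<>(item, oldHead);
        } while (!head.compareAndSet(oldHead, newHead));
    }

    /** Pops the most recently pushed item, or returns null if the stack is empty. */
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) {
                return null;
            }
        } while (!head.compareAndSet(oldHead, oldHead.next));
        return oldHead.item;
    }

    public static void main(String[] args) {
        LockFreeStack<String> stack = new LockFreeStack<>();
        stack.push("first");
        stack.push("second");
        System.out.println(stack.pop());
        System.out.println(stack.pop());
    }
}
```

Because each push allocates a fresh immutable node and the garbage collector will not recycle a node that is still referenced, this version sidesteps the ABA problem that the same algorithm has to handle explicitly with manual memory management.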
What can I do?

• Work around GC with SLAB allocators
    – Hiding the objects from the GC in native memory
    – It's a slippery slope to writing your own GC
        • We're looking at you Cassandra!


• Work around GC using near cache
    – Storing the objects in another JVM


• Peter Lawrey's thread affinity OSS project
    – https://github.com/peter-lawrey/Java-Thread-Affinity


• Join Adopt OpenJDK
    – http://www.java.net/projects/adoptopenjdk
Challenge: Java



• java.lang.Thread is low level

• File systems

• What can I do?
java.lang.Thread is low level

• Has low level co-ordination
    – wait()
    – notify(), notifyAll()


• Also has dangerous operations
    – stop() unlocks all the monitors - leaving data in an
      inconsistent state


• java.util.concurrent helps
    – But managing your threads is still manual
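
To illustrate both halves of that last point, the sketch below hands work to an `ExecutorService` as `Callable`s instead of creating `Thread` objects directly, yet still has to size the pool and shut it down by hand. The `PooledSum` class and its chunking scheme are hypothetical, chosen only to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch; PooledSum and its chunking are assumptions, not from the talk.
final class PooledSum {

    /** Sums 1..n by splitting the range into `tasks` Callables run on a fixed-size pool. */
    static long sumTo(int n, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Callable<Long>> work = new ArrayList<>();
            int chunk = (n + tasks - 1) / tasks; // ceiling division
            for (int t = 0; t < tasks; t++) {
                final int from = t * chunk + 1;
                final int to = Math.min(n, from + chunk - 1);
                work.add(() -> {
                    long sum = 0;
                    for (int i = from; i <= to; i++) {
                        sum += i;
                    }
                    return sum;
                });
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Long> result : pool.invokeAll(work)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // the pool's lifecycle is still ours to manage
        }
    }

    public static void main(String[] args) {
        System.out.println(sumTo(1_000_000, 4));
    }
}
```

No wait() or notify() appears anywhere, but the pool size and the shutdown call remain manual decisions.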
File Systems

• It was difficult to do asynchronous I/O
    – You would get stalled on large file interaction
    – You would get stalled on multi-casting


• There was no native file system support
    – NIO brought in pipe and selector support
    – NIO.2 brought in symbolic links support
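
NIO.2 also added `AsynchronousFileChannel`, which addresses the stalling described in the first bullet. The sketch below writes a small temporary file and reads it back without blocking at the point the read is issued. The `AsyncRead` class and the temp-file round trip are assumptions made to keep the example self-contained; real code would loop, since a single read is not guaranteed to fill the buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Illustrative sketch; AsyncRead and the temp-file round trip are assumptions.
final class AsyncRead {

    /** Writes text to a temporary file, then reads it back through an asynchronous channel. */
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("async-read", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                try (AsynchronousFileChannel channel =
                         AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
                    ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
                    // read() returns immediately; the I/O proceeds in the background.
                    Future<Integer> pending = channel.read(buffer, 0);
                    // ... the calling thread is free to do other work here ...
                    pending.get(); // block only at the point the data is needed
                    buffer.flip();
                    return StandardCharsets.UTF_8.decode(buffer).toString();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException | ExecutionException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello from NIO.2"));
    }
}
```

The channel also offers a `read` overload taking a `CompletionHandler`, which avoids blocking on the `Future` altogether.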
What can I do?

• Read Brian's book
    – Java Concurrency in Practice


• Move to java.util.concurrent
    – People smarter than us have thought about this
    – Use Runnables, Callables and ExecutorServices


• Use thread safe collections
    – ConcurrentHashMap
    – CopyOnWriteArrayList


• Move to Java 7
    – NIO.2 gives you asynchronous I/O!
    – Fork and Join helps with Amdahl's law
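
As a sketch of the Fork and Join point, the example below sums an array by splitting it recursively with a `RecursiveTask` until the pieces are small enough to sum directly. The `ForkJoinSum` class and the threshold of 10,000 elements are arbitrary assumptions for illustration; a sensible threshold depends on the workload.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative sketch; the class name and THRESHOLD value are assumptions.
final class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this, sum sequentially

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                          // schedule the left half asynchronously
        return right.compute() + left.join(); // work on the right half meanwhile
    }

    /** Sums the array on a dedicated pool, which is shut down afterwards. */
    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new ForkJoinSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[5_000_000];
        java.util.Arrays.fill(data, 1L);
        System.out.println(sum(data));
    }
}
```

The only serial parts left are the initial split and the final additions, which is how divide-and-conquer keeps the serial fraction in Amdahl's law small.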
Conclusion



• Learn about hardware again
   – Check your capacity and throughput


• Virtualise with caution
   – It can work for and against you


• The JVM needs to evolve and so do you
   – OpenJDK is adapting, hopefully fast enough!


• Learn to use parallelism in code
   – Challenge each other in peer reviews!
Don't become extinct
Be like Sarah
Acknowledgements

• The jClarity Team

• Gil Tene - Azul Systems

• Warner Bros pictures

• Lionsgate Films

• Dan Hardiker - Adaptivist

• Trisha Gee - 10gen

• James Gough - Stackthread
Enjoy the community night!

      Kirk Pepperdine (@kcpeppe)
      Martijn Verburg (@karianna)

           www.jclarity.com

Mais conteúdo relacionado

Mais procurados

Inter threadcommunication.38
Inter threadcommunication.38Inter threadcommunication.38
Inter threadcommunication.38myrajendra
 
Learning Java 3 – Threads and Synchronization
Learning Java 3 – Threads and SynchronizationLearning Java 3 – Threads and Synchronization
Learning Java 3 – Threads and Synchronizationcaswenson
 
Basics of Java Concurrency
Basics of Java ConcurrencyBasics of Java Concurrency
Basics of Java Concurrencykshanth2101
 
Java multi threading
Java multi threadingJava multi threading
Java multi threadingRaja Sekhar
 
Synchronization.37
Synchronization.37Synchronization.37
Synchronization.37myrajendra
 
Java concurrency - Thread pools
Java concurrency - Thread poolsJava concurrency - Thread pools
Java concurrency - Thread poolsmaksym220889
 
Java Threads and Concurrency
Java Threads and ConcurrencyJava Threads and Concurrency
Java Threads and ConcurrencySunil OS
 
Advanced Introduction to Java Multi-Threading - Full (chok)
Advanced Introduction to Java Multi-Threading - Full (chok)Advanced Introduction to Java Multi-Threading - Full (chok)
Advanced Introduction to Java Multi-Threading - Full (chok)choksheak
 
PyTorch Tutorial for NTU Machine Learing Course 2017
PyTorch Tutorial for NTU Machine Learing Course 2017PyTorch Tutorial for NTU Machine Learing Course 2017
PyTorch Tutorial for NTU Machine Learing Course 2017Yu-Hsun (lymanblue) Lin
 
Programming with Threads in Java
Programming with Threads in JavaProgramming with Threads in Java
Programming with Threads in Javakoji lin
 
Rust "Hot or Not" at Sioux
Rust "Hot or Not" at SiouxRust "Hot or Not" at Sioux
Rust "Hot or Not" at Siouxnikomatsakis
 
Rust: Reach Further (from QCon Sao Paolo 2018)
Rust: Reach Further (from QCon Sao Paolo 2018)Rust: Reach Further (from QCon Sao Paolo 2018)
Rust: Reach Further (from QCon Sao Paolo 2018)nikomatsakis
 
Java Multithreading and Concurrency
Java Multithreading and ConcurrencyJava Multithreading and Concurrency
Java Multithreading and ConcurrencyRajesh Ananda Kumar
 
iOS Multithreading
iOS MultithreadingiOS Multithreading
iOS MultithreadingRicha Jain
 

Mais procurados (20)

Pthread
PthreadPthread
Pthread
 
Inter threadcommunication.38
Inter threadcommunication.38Inter threadcommunication.38
Inter threadcommunication.38
 
Learning Java 3 – Threads and Synchronization
Learning Java 3 – Threads and SynchronizationLearning Java 3 – Threads and Synchronization
Learning Java 3 – Threads and Synchronization
 
Basics of Java Concurrency
Basics of Java ConcurrencyBasics of Java Concurrency
Basics of Java Concurrency
 
Threads in python
Threads in pythonThreads in python
Threads in python
 
Java multi threading
Java multi threadingJava multi threading
Java multi threading
 
Synchronization.37
Synchronization.37Synchronization.37
Synchronization.37
 
Java concurrency - Thread pools
Java concurrency - Thread poolsJava concurrency - Thread pools
Java concurrency - Thread pools
 
Java Threads and Concurrency
Java Threads and ConcurrencyJava Threads and Concurrency
Java Threads and Concurrency
 
Advanced Introduction to Java Multi-Threading - Full (chok)
Advanced Introduction to Java Multi-Threading - Full (chok)Advanced Introduction to Java Multi-Threading - Full (chok)
Advanced Introduction to Java Multi-Threading - Full (chok)
 
PyTorch Tutorial for NTU Machine Learing Course 2017
PyTorch Tutorial for NTU Machine Learing Course 2017PyTorch Tutorial for NTU Machine Learing Course 2017
PyTorch Tutorial for NTU Machine Learing Course 2017
 
Programming with Threads in Java
Programming with Threads in JavaProgramming with Threads in Java
Programming with Threads in Java
 
Rust "Hot or Not" at Sioux
Rust "Hot or Not" at SiouxRust "Hot or Not" at Sioux
Rust "Hot or Not" at Sioux
 
Rust: Reach Further (from QCon Sao Paolo 2018)
Rust: Reach Further (from QCon Sao Paolo 2018)Rust: Reach Further (from QCon Sao Paolo 2018)
Rust: Reach Further (from QCon Sao Paolo 2018)
 
Multithreading in Java
Multithreading in JavaMultithreading in Java
Multithreading in Java
 
Java Multithreading and Concurrency
Java Multithreading and ConcurrencyJava Multithreading and Concurrency
Java Multithreading and Concurrency
 
Java Concurrency
Java ConcurrencyJava Concurrency
Java Concurrency
 
Concurrecy techdrop
Concurrecy techdropConcurrecy techdrop
Concurrecy techdrop
 
Multithreading Concepts
Multithreading ConceptsMultithreading Concepts
Multithreading Concepts
 
iOS Multithreading
iOS MultithreadingiOS Multithreading
iOS Multithreading
 

Semelhante a Java and the machine - Martijn Verburg and Kirk Pepperdine

.NET Multithreading/Multitasking
.NET Multithreading/Multitasking.NET Multithreading/Multitasking
.NET Multithreading/MultitaskingSasha Kravchuk
 
Medical Image Processing Strategies for multi-core CPUs
Medical Image Processing Strategies for multi-core CPUsMedical Image Processing Strategies for multi-core CPUs
Medical Image Processing Strategies for multi-core CPUsDaniel Blezek
 
Effective java - concurrency
Effective java - concurrencyEffective java - concurrency
Effective java - concurrencyfeng lee
 
Adam Sitnik "State of the .NET Performance"
Adam Sitnik "State of the .NET Performance"Adam Sitnik "State of the .NET Performance"
Adam Sitnik "State of the .NET Performance"Yulia Tsisyk
 
State of the .Net Performance
State of the .Net PerformanceState of the .Net Performance
State of the .Net PerformanceCUSTIS
 
13multithreaded Programming
13multithreaded Programming13multithreaded Programming
13multithreaded ProgrammingAdil Jafri
 
.NET Multithreading and File I/O
.NET Multithreading and File I/O.NET Multithreading and File I/O
.NET Multithreading and File I/OJussi Pohjolainen
 
Let's Talk Locks!
Let's Talk Locks!Let's Talk Locks!
Let's Talk Locks!C4Media
 
Qt multi threads
Qt multi threadsQt multi threads
Qt multi threadsYnon Perek
 
Threads in java, Multitasking and Multithreading
Threads in java, Multitasking and MultithreadingThreads in java, Multitasking and Multithreading
Threads in java, Multitasking and Multithreadingssusere538f7
 
Defcon 22-paul-mcmillan-attacking-the-iot-using-timing-attac
Defcon 22-paul-mcmillan-attacking-the-iot-using-timing-attacDefcon 22-paul-mcmillan-attacking-the-iot-using-timing-attac
Defcon 22-paul-mcmillan-attacking-the-iot-using-timing-attacPriyanka Aash
 

Semelhante a Java and the machine - Martijn Verburg and Kirk Pepperdine (20)

.NET Multithreading/Multitasking
.NET Multithreading/Multitasking.NET Multithreading/Multitasking
.NET Multithreading/Multitasking
 
Medical Image Processing Strategies for multi-core CPUs
Medical Image Processing Strategies for multi-core CPUsMedical Image Processing Strategies for multi-core CPUs
Medical Image Processing Strategies for multi-core CPUs
 
Thread 1
Thread 1Thread 1
Thread 1
 
Jvm memory model
Jvm memory modelJvm memory model
Jvm memory model
 
Effective java - concurrency
Effective java - concurrencyEffective java - concurrency
Effective java - concurrency
 
Adam Sitnik "State of the .NET Performance"
Adam Sitnik "State of the .NET Performance"Adam Sitnik "State of the .NET Performance"
Adam Sitnik "State of the .NET Performance"
 
State of the .Net Performance
State of the .Net PerformanceState of the .Net Performance
State of the .Net Performance
 
posix.pdf
posix.pdfposix.pdf
posix.pdf
 
Threading
ThreadingThreading
Threading
 
Task and Data Parallelism
Task and Data ParallelismTask and Data Parallelism
Task and Data Parallelism
 
13multithreaded Programming
13multithreaded Programming13multithreaded Programming
13multithreaded Programming
 
Threads
ThreadsThreads
Threads
 
.NET Multithreading and File I/O
.NET Multithreading and File I/O.NET Multithreading and File I/O
.NET Multithreading and File I/O
 
Let's Talk Locks!
Let's Talk Locks!Let's Talk Locks!
Let's Talk Locks!
 
Java Performance Tweaks
Java Performance TweaksJava Performance Tweaks
Java Performance Tweaks
 
Multi Threading
Multi ThreadingMulti Threading
Multi Threading
 
Qt multi threads
Qt multi threadsQt multi threads
Qt multi threads
 
Threads in java, Multitasking and Multithreading
Threads in java, Multitasking and MultithreadingThreads in java, Multitasking and Multithreading
Threads in java, Multitasking and Multithreading
 
Defcon 22-paul-mcmillan-attacking-the-iot-using-timing-attac
Defcon 22-paul-mcmillan-attacking-the-iot-using-timing-attacDefcon 22-paul-mcmillan-attacking-the-iot-using-timing-attac
Defcon 22-paul-mcmillan-attacking-the-iot-using-timing-attac
 
Java Multithreading
Java MultithreadingJava Multithreading
Java Multithreading
 

Mais de JAX London

Everything I know about software in spaghetti bolognese: managing complexity
Everything I know about software in spaghetti bolognese: managing complexityEverything I know about software in spaghetti bolognese: managing complexity
Everything I know about software in spaghetti bolognese: managing complexityJAX London
 
Devops with the S for Sharing - Patrick Debois
Devops with the S for Sharing - Patrick DeboisDevops with the S for Sharing - Patrick Debois
Devops with the S for Sharing - Patrick DeboisJAX London
 
Busy Developer's Guide to Windows 8 HTML/JavaScript Apps
Busy Developer's Guide to Windows 8 HTML/JavaScript AppsBusy Developer's Guide to Windows 8 HTML/JavaScript Apps
Busy Developer's Guide to Windows 8 HTML/JavaScript AppsJAX London
 
It's code but not as we know: Infrastructure as Code - Patrick Debois
It's code but not as we know: Infrastructure as Code - Patrick DeboisIt's code but not as we know: Infrastructure as Code - Patrick Debois
It's code but not as we know: Infrastructure as Code - Patrick DeboisJAX London
 
Locks? We Don't Need No Stinkin' Locks - Michael Barker
Locks? We Don't Need No Stinkin' Locks - Michael BarkerLocks? We Don't Need No Stinkin' Locks - Michael Barker
Locks? We Don't Need No Stinkin' Locks - Michael BarkerJAX London
 
Worse is better, for better or for worse - Kevlin Henney
Worse is better, for better or for worse - Kevlin HenneyWorse is better, for better or for worse - Kevlin Henney
Worse is better, for better or for worse - Kevlin HenneyJAX London
 
Java performance: What's the big deal? - Trisha Gee
Java performance: What's the big deal? - Trisha GeeJava performance: What's the big deal? - Trisha Gee
Java performance: What's the big deal? - Trisha GeeJAX London
 
Clojure made-simple - John Stevenson
Clojure made-simple - John StevensonClojure made-simple - John Stevenson
Clojure made-simple - John StevensonJAX London
 
HTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias Wessendorf
HTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias WessendorfHTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias Wessendorf
HTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias WessendorfJAX London
 
Play framework 2 : Peter Hilton
Play framework 2 : Peter HiltonPlay framework 2 : Peter Hilton
Play framework 2 : Peter HiltonJAX London
 
Complexity theory and software development : Tim Berglund
Complexity theory and software development : Tim BerglundComplexity theory and software development : Tim Berglund
Complexity theory and software development : Tim BerglundJAX London
 
Why FLOSS is a Java developer's best friend: Dave Gruber
Why FLOSS is a Java developer's best friend: Dave GruberWhy FLOSS is a Java developer's best friend: Dave Gruber
Why FLOSS is a Java developer's best friend: Dave GruberJAX London
 
Akka in Action: Heiko Seeburger
Akka in Action: Heiko SeeburgerAkka in Action: Heiko Seeburger
Akka in Action: Heiko SeeburgerJAX London
 
NoSQL Smackdown 2012 : Tim Berglund
NoSQL Smackdown 2012 : Tim BerglundNoSQL Smackdown 2012 : Tim Berglund
NoSQL Smackdown 2012 : Tim BerglundJAX London
 
Closures, the next "Big Thing" in Java: Russel Winder
Closures, the next "Big Thing" in Java: Russel WinderClosures, the next "Big Thing" in Java: Russel Winder
Closures, the next "Big Thing" in Java: Russel WinderJAX London
 
Mongo DB on the JVM - Brendan McAdams
Mongo DB on the JVM - Brendan McAdamsMongo DB on the JVM - Brendan McAdams
Mongo DB on the JVM - Brendan McAdamsJAX London
 
New opportunities for connected data - Ian Robinson
New opportunities for connected data - Ian RobinsonNew opportunities for connected data - Ian Robinson
New opportunities for connected data - Ian RobinsonJAX London
 
HTML5 Websockets and Java - Arun Gupta
HTML5 Websockets and Java - Arun GuptaHTML5 Websockets and Java - Arun Gupta
HTML5 Websockets and Java - Arun GuptaJAX London
 
The Big Data Con: Why Big Data is a Problem, not a Solution - Ian Plosker
The Big Data Con: Why Big Data is a Problem, not a Solution - Ian PloskerThe Big Data Con: Why Big Data is a Problem, not a Solution - Ian Plosker
The Big Data Con: Why Big Data is a Problem, not a Solution - Ian PloskerJAX London
 
Bluffers guide to elitist jargon - Martijn Verburg, Richard Warburton, James ...
Bluffers guide to elitist jargon - Martijn Verburg, Richard Warburton, James ...Bluffers guide to elitist jargon - Martijn Verburg, Richard Warburton, James ...
Bluffers guide to elitist jargon - Martijn Verburg, Richard Warburton, James ...JAX London
 

Mais de JAX London (20)

Everything I know about software in spaghetti bolognese: managing complexity
Everything I know about software in spaghetti bolognese: managing complexityEverything I know about software in spaghetti bolognese: managing complexity
Everything I know about software in spaghetti bolognese: managing complexity
 
Devops with the S for Sharing - Patrick Debois
Devops with the S for Sharing - Patrick DeboisDevops with the S for Sharing - Patrick Debois
Devops with the S for Sharing - Patrick Debois
 
Busy Developer's Guide to Windows 8 HTML/JavaScript Apps
Busy Developer's Guide to Windows 8 HTML/JavaScript AppsBusy Developer's Guide to Windows 8 HTML/JavaScript Apps
Busy Developer's Guide to Windows 8 HTML/JavaScript Apps
 
It's code but not as we know: Infrastructure as Code - Patrick Debois
It's code but not as we know: Infrastructure as Code - Patrick DeboisIt's code but not as we know: Infrastructure as Code - Patrick Debois

Java and the machine - Martijn Verburg and Kirk Pepperdine

  • 1. Java and the Machine Kirk Pepperdine (@kcpeppe) Martijn Verburg (@karianna)
  • 2. This guy is coming for you
  • 3. The developer version of the T-800
  • 4. Hey Martijn, do you know what this is?
  • 6. Intel 4004: shipped 1971; 108 kHz-740 kHz; 10 µm die (1/10th of a human hair); 16-pin DIP; 2,300 transistors; shared address and data bus; 46,300 or 92,600 instructions / second; instruction cycle of 10.8 or 21.6 µs (8 clock cycles / instruction cycle); 4-bit bus (12-bit addressing, 8-bit instruction, 4-bit word); 46 instructions (41 8-bit, 5 16-bit), described in a 9-page doc!!!
  • 7. Intel i7: shipped 2008; 3.3 GHz (4200x); 2 nm die (5000x smaller); 64-bit instructions; 1370 landings; 774 million transistors (~336,000x); 50,000x less expensive, 350,000x faster, 5000x less power / transistor; 31-stage instruction pipeline (instructions retired @ 1 / clock), where adjacent and nearly adjacent instructions run at the same time; packaging doc is 102 pages; MSR documentation is well over 7000 pages
  • 8. Lots of challenges! • Back to basics • Challenge: Hardware Threads • Challenge: Virtualisation • Challenge: The JVM • Challenge: Java • Conclusion
  • 9. It's 18:45 on conference day 1. You have a comforting beverage in your hand. This talk is going to hurt. T-800 don't care.
  • 10. Back to Basics • Moore's Law • Little's Law • Amdahl's Law • It's the hardware stupid! • What can I do?
  • 11. Moore's Law • It's about transistors not clocks
  • 13. Little's Law λ=1/µ Throughput = 1 / Service Time
  • 15. It's the Hardware stupid! • Software often thinks there are no limitations – If you're Turing complete you can do anything! • The reality is that it's fighting over finite resources – You remember hardware right? • These all have finite capacities and throughputs – CPU – RAM – Disk – Network – Bus
  • 16. Mechanical Sympathy "Mechanical Sympathy - Hardware and Software working together" — Martin Thompson
  • 17. What can I do? • Don't panic • Learn the laws – Have them up next to your monitor • Code the laws – See them in action! • Amdahl's law says "don't serialise!" – With no serialisation, no need to worry about Little's law! • Understand what hardware you have – Understand capacities and throughputs – Learn how to measure, not guess those numbers
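One way to "code the laws" is a few lines of arithmetic. The sketch below is illustrative only: the class and method names are hypothetical, the Amdahl formula is the standard textbook form (speedup = 1 / (serial + parallel / n)), and the throughput method uses the form shown on the Little's Law slide (throughput = 1 / service time).

```java
// Hypothetical helper; names are illustrative and not taken from the talk.
final class Laws {
    private Laws() {}

    /** Amdahl's law: speedup on n processors when a fraction p of the work can run in parallel. */
    static double amdahlSpeedup(double parallelFraction, int processors) {
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / processors);
    }

    /** The slide's form: throughput = 1 / service time, for a single serialised resource. */
    static double throughputPerSecond(double serviceTimeSeconds) {
        return 1.0 / serviceTimeSeconds;
    }

    public static void main(String[] args) {
        // Even with 95% of the work parallelisable, the serial 5% caps the speedup.
        for (int n : new int[] {1, 2, 8, 64, 1024}) {
            System.out.printf("95%% parallel, %4d processors: speedup %.2f%n",
                    n, amdahlSpeedup(0.95, n));
        }
        System.out.printf("10 ms service time: %.0f requests/s%n", throughputPerSecond(0.010));
    }
}
```

Running it with different parallel fractions makes the "don't serialise!" point concrete: the serial fraction, not the processor count, sets the ceiling.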
  • 18. Still with us? Good.
  • 19. Hardware Threads • Threads are non deterministic • Single threaded programs • Multi core programs • What can I do? • Protect your code
  • 20. Threads are non-deterministic "Threads seem to be a small step from sequential computation" "In fact, they represent a huge step." — The Problem with Threads, Edward A. Lee, UC Berkeley, 2006
  • 21. In the beginning there was fork • It was a copy operation – No common memory space – So we had to work hard to share state between processes • Everything was local to us – We had no race conditions – Hardware caching was transparent, we didn't care – And we could cache anything at anytime without fear! • Life was simpler – Ordering was guaranteed
  • 22. In case you've forgotten
  • 23. In case you've forgotten
  • 24. Then we had multi-thread / single core • new Thread() is like a light-weight fork – Memory is now shared – You don't work hard to share state – Instead you work hard to guarantee order • Guaranteeing order is a much harder problem – We started with volatile and synchronized – They were fence builders, to help guarantee order – It made earlier processors do more work • Hardware, O/S & JVM guys were playing leap frog!
  • 25. Now we have multiple hardware threads • new Thread() is still a light-weight fork – Memory is still shared – Now we have to work hard to share state due to caching – We still have to work hard to guarantee order • Guaranteed order is now a performance hit – volatile and synchronized force draining and refreshing of caches – Some hardware support to get around this (CAS) – Some software support to get around this (NUMA) • Hardware, O/S & JVM guys are still playing leap frog!
  • 26. What can I do? "We protect code with the hope of protecting data." — Gil Tene, Azul Systems, 2008
  • 27. Protect your data • You can use a bunch of different techniques – synchronized – volatile – atomics, e.g. AtomicInteger – Explicit Java 5 Locks, e.g. ReentrantLock • But these all have a performance hit • Let's see examples of these in action
  • 28. This fence was atomic
  • 29. The following source code can be read later. Don't worry, there is a point to this
  • 30. Running Naked
        private int counter;

        public void execute(int threadCount) {
            init();
            Thread[] threads = new Thread[threadCount];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(new Runnable() {
                    public void run() {
                        long iterations = 0;
                        try {
                            while (running) {
                                counter++;
                                counter--;
                                iterations++;
                            }
                        } finally {
                            totalIterations += iterations;
                        }
                    }
                });
            }
            ...
  • 31. Using volatile
        private volatile int counter;

        public void execute(int threadCount) {
            init();
            Thread[] threads = new Thread[threadCount];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(new Runnable() {
                    public void run() {
                        long iterations = 0;
                        try {
                            while (running) {
                                counter++;
                                counter--;
                                iterations++;
                            }
                        } finally {
                            totalIterations += iterations;
                        }
                    }
                });
            }
            ...
  • 32. Using synchronized
        private int counter;
        private final Object lock = new Object();

        public void execute(int threadCount) {
            init();
            Thread[] threads = new Thread[threadCount];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(new Runnable() {
                    public void run() {
                        long iterations = 0;
                        try {
                            while (running) {
                                synchronized (lock) {
                                    counter++;
                                    counter--;
                                }
                                iterations++;
                            }
                        } finally {
                            totalIterations += iterations;
                        }
                    }
                });
            }
            ...
  • 33. Using fully explicit Locks
        private int counter;
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        public void execute(int threadCount) {
            init();
            Thread[] threads = new Thread[threadCount];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(new Runnable() {
                    public void run() {
                        long iterations = 0;
                        try {
                            while (running) {
                                // acquire before the try so unlock() only runs if lock() succeeded
                                lock.writeLock().lock();
                                try {
                                    counter++;
                                    counter--;
                                } finally {
                                    lock.writeLock().unlock();
                                }
                                iterations++;
                            }
                        } finally {
                            totalIterations += iterations;
                        }
            ...
  • 34. Using AtomicInteger
        private AtomicInteger counter = new AtomicInteger(0);

        public void execute(int threadCount) {
            init();
            Thread[] threads = new Thread[threadCount];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(new Runnable() {
                    public void run() {
                        long iterations = 0;
                        try {
                            while (running) {
                                counter.getAndIncrement();
                                counter.getAndDecrement();
                                iterations++;
                            }
                        } finally {
                            totalIterations += iterations;
                        }
            ...
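The five slides above elide the rest of the harness ("..."). A minimal self-contained sketch of what such a comparison could look like follows. It is not the speakers' benchmark: the class name CounterBench, its method names, the 500 ms run time and the use of Java 8 lambdas are all assumptions made for illustration, and a serious measurement would use a proper harness with warm-up.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical harness; names are illustrative and not taken from the talk.
final class CounterBench {
    private int nakedCounter;
    private volatile int volatileCounter;
    private int monitorCounter;
    private int lockedCounter;
    private final AtomicInteger atomicCounter = new AtomicInteger();
    private final Object monitor = new Object();
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private volatile boolean running;

    void naked() { nakedCounter++; nakedCounter--; }               // racy on purpose
    void usingVolatile() { volatileCounter++; volatileCounter--; } // visible, but ++ is still not atomic
    void usingSynchronized() { synchronized (monitor) { monitorCounter++; monitorCounter--; } }
    void usingLock() {
        rwLock.writeLock().lock();
        try { lockedCounter++; lockedCounter--; } finally { rwLock.writeLock().unlock(); }
    }
    void usingAtomic() { atomicCounter.getAndIncrement(); atomicCounter.getAndDecrement(); }

    int monitorCounter() { synchronized (monitor) { return monitorCounter; } }
    int atomicCounter() { return atomicCounter.get(); }

    /** Runs op on threadCount threads for roughly millis ms; returns total iterations completed. */
    long run(Runnable op, int threadCount, long millis) {
        AtomicLong total = new AtomicLong();
        running = true;
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                long iterations = 0;
                while (running) { op.run(); iterations++; }
                total.addAndGet(iterations);   // one contended write per thread, after the loop
            });
            threads[i].start();
        }
        try {
            Thread.sleep(millis);
            running = false;
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            running = false;
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        CounterBench b = new CounterBench();
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("naked        " + b.run(b::naked, threads, 500));
        System.out.println("volatile     " + b.run(b::usingVolatile, threads, 500));
        System.out.println("synchronized " + b.run(b::usingSynchronized, threads, 500));
        System.out.println("write lock   " + b.run(b::usingLock, threads, 500));
        System.out.println("atomic       " + b.run(b::usingAtomic, threads, 500));
    }
}
```

Only the synchronized, write-lock and atomic variants keep the counter consistent; the naked and volatile variants are fast partly because they are allowed to be wrong.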
  • 35. A small benchmark comparison
  • 36. A small benchmark comparison
  • 37. So the point is that locking kills performance!
  • 38. Challenge: Virtualisation • Better utilisation of hardware? • You can't virtualise into more hardware • What can I do?
  • 39. Virtualisation "Why do we do it?" — Martijn Verburg and Kirk Pepperdine, JAX London 2012
  • 40. Better utilisation of hardware? • Could be utopia because we waste hardware? – Most hardware is idle – Load averages are far less than 10% on many systems • But why are our systems under utilised? – Often because we can't feed them (especially CPU) – Throughput in a system is often limited by a single resource • Trisha Gee is talking more about this tomorrow! – 14:25 - Java Performance FAQ
  • 41. T-800 is incapable of seeing your process "A Process is defined as a locus of control, an instruction sequence and an abstraction entity which moves through the instructions of a procedure, as the procedure is executed by a processor" — Jack B Dennis, Earl C Van Horn, 1965
  • 42. There's no such thing as a process as far as the hardware is concerned
  • 43. You can't virtualise into more hardware! • Throughput is often limited by a single resource – That bottleneck can starve everybody else! • Going from most problematic to least problematic – Network, Disk, RAM, CPU – Throughput overwhelms the wire – NAS means more pressure on network – CPU/RAM speed gap growing (8% per year) – Many thread scheduling issues (including cache!) – The list goes on and on and on.......... • Virtual machines sharing hardware causes resource conflict – You need more than core affinity
  • 44. What can I do? • Build your financial and capacity use cases – TCO for virtual vs bare metal – What does your growth curve look like? • CPU Pinning trick / Processor Affinity • Memory Pinning trick • Move back to bare metal! – It's OK - really!
  • 45. Final Thought "There are many reasons to move to the cloud - performance isn't necessarily one of them." — Kirk Pepperdine - random rant - 2010
  • 46. Challenge: The JVM • WORA • The cost of a strong memory model • GC scalability • GPU • File Systems • What can I do?
  • 47. Brian? You've got a lot of work to do
  • 48. WORA costs • CPU Model differences – When and where are you allowed to cache? – When and where are you allowed to reorder? – AMD vs Intel vs Sparc vs ARM vs ..... • File system differences – O/S Level support • Display Devices - Argh! – Impossible to keep up with the consumer trends – Aspect ratios, resolution, colour depth etc • Power – From unlimited to extremely limited
  • 49. The cost of the strong memory model • The JVM is ultra conservative – It rightly ensures correctness over performance • Locks enforce correctness – But in a very pessimistic way, which is expensive • Locks delineate regions of serialisation – Serialisation? – Uh oh! Remember Little's and Amdahl's laws?
  • 50. The visibility mismatch • Unit of visibility on a CPU != that in the JVM – CPU - Cache Line – JVM - Where the memory fences in the instruction pipeline are positioned • False sharing is an example of this visibility mismatch – volatile creates a fence which flushes the cache • We think that volatile is a modifier on a field – In reality it's a modifier on a cache line • It can make other fields accidentally volatile
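The visibility mismatch can be probed with a small experiment. The sketch below is an illustration under assumptions, not code from the talk: it assumes 64-byte cache lines, uses AtomicLongArray so that each write has volatile semantics, and the class name FalseSharingDemo is hypothetical. Two threads each write their own slot. With adjacent slots (0 and 1) the writes probably land on the same cache line; with slots 0 and 8 they are 64 bytes apart and so cannot share a 64-byte line.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical experiment; assumes 64-byte cache lines (8 longs per line).
final class FalseSharingDemo {
    private FalseSharingDemo() {}

    /** Runs two writers, each bumping its own slot; returns elapsed nanoseconds. */
    static long time(AtomicLongArray slots, int indexA, int indexB, long iterations) {
        Thread a = writer(slots, indexA, iterations);
        Thread b = writer(slots, indexB, iterations);
        long start = System.nanoTime();
        a.start();
        b.start();
        join(a);
        join(b);
        return System.nanoTime() - start;
    }

    private static Thread writer(AtomicLongArray slots, int index, long iterations) {
        return new Thread(() -> {
            for (long i = 0; i < iterations; i++) {
                // Single writer per slot, so get-then-set loses no updates;
                // set() is a volatile write.
                slots.set(index, slots.get(index) + 1);
            }
        });
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long n = 50_000_000L;
        long adjacent = time(new AtomicLongArray(16), 0, 1, n);
        long spaced = time(new AtomicLongArray(16), 0, 8, n);
        System.out.println("adjacent slots (0, 1): " + adjacent / 1_000_000 + " ms");
        System.out.println("spaced slots   (0, 8): " + spaced / 1_000_000 + " ms");
    }
}
```

The two threads never touch the same variable in either run, so any timing gap comes from where the slots sit in memory. Results depend on the machine and the JIT, so treat the numbers as something to measure rather than assume.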
  • 51. GC Scalability • GC cost is dominated by the # of live objects in heap – Larger Heap ~= more live objects • Mark identifies live objects – Takes longer with more objects • Sweep deals with live objects – Compaction only deals with live objects – Evacuation only deals with live objects • G1, Zing, and Balance are no different! – They still have to mark and sweep live objects – Mark for Zing is more efficient
  • 52. GPU • Can't utilise easily today – Java does not have normalised access to the GPU – The GPU does not have access to Java heap • Today - third party libraries – That translate byte code to GPU RISC instructions – Bundle the data and the instructions together – Then push data and instructions through the GPU • Project Sumatra on its way – http://openjdk.java.net/projects/sumatra/ – Aparapi backs onto OpenCL • CPU and GPU memory to converge?
  • 53. What can I do? • Not using java.util.concurrent? – You're probably doing it wrong • Avoid locks where possible! – They make it difficult to run processes concurrently – But don't forget integrity! • Use Unsafe to implement lockless concurrency – Dangerous!! – Allows non-blocking / wait free implementations – Gives us access to the pointer system – Gives us access to the CAS instruction • Cliff Click's awesome lib – http://sourceforge.net/projects/high-scale-lib/
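The CAS instruction mentioned above is also reachable without Unsafe, through java.util.concurrent.atomic. The following is a minimal sketch of the usual retry loop, swapping in AtomicInteger.compareAndSet for direct Unsafe access; the class CasCounter and its methods are hypothetical names chosen for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lock-free bounded counter; uses AtomicInteger rather than Unsafe.
final class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    /** Increments unless the ceiling has been reached; never blocks. */
    boolean incrementIfBelow(int ceiling) {
        while (true) {
            int current = value.get();
            if (current >= ceiling) {
                return false;
            }
            if (value.compareAndSet(current, current + 1)) {
                return true;
            }
            // Another thread changed the value between get() and the CAS: retry.
        }
    }

    int get() {
        return value.get();
    }

    /** Drives the counter from several threads; returns how many increments succeeded. */
    static int hammer(CasCounter counter, int threadCount, int attemptsPerThread, int ceiling) {
        AtomicInteger successes = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < attemptsPerThread; n++) {
                    if (counter.incrementIfBelow(ceiling)) {
                        successes.incrementAndGet();
                    }
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return successes.get();
    }

    public static void main(String[] args) {
        CasCounter counter = new CasCounter();
        int wins = hammer(counter, 4, 10_000, 25_000);
        System.out.println("successful increments: " + wins + ", final value: " + counter.get());
    }
}
```

The check-then-act in incrementIfBelow would need a lock if written with a plain int. Here integrity comes from the CAS failing whenever another thread got in first.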
  • 54. What can I do? • Work around GC with SLAB allocators – Hiding the objects from the GC in native memory – It's a slippery slope to writing your own GC • We're looking at you Cassandra! • Work around GC using near cache – Storing the objects in another JVM • Peter Lawrey's thread affinity OSS project – https://github.com/peter-lawrey/Java-Thread-Affinity • Join Adopt OpenJDK – http://www.java.net/projects/adoptopenjdk
  • 55. Challenge: Java • java.lang.Thread is low level • File systems • What can I do?
  • 56. java.lang.Thread is low level • Has low level co-ordination – wait() – notify(), notifyAll() • Also has dangerous operations – stop() unlocks all the monitors - leaving data in an inconsistent state • java.util.concurrent helps – But managing your threads is still manual
  • 57. File Systems • It was difficult to do asynchronous I/O – You would get stalled on large file interaction – You would get stalled on multi-casting • There was no native file system support – NIO brought in pipe and selector support – NIO.2 brought in symbolic links support
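For contrast with the stalls described above, here is a minimal sketch of an asynchronous file read with the NIO.2 AsynchronousFileChannel. The class AsyncRead and its helper methods are hypothetical; the sketch wraps checked exceptions in unchecked ones (UncheckedIOException needs Java 8) purely to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical example class; names are illustrative.
final class AsyncRead {
    private AsyncRead() {}

    /** Reads up to maxBytes from the start of the file without blocking on the read call itself. */
    static String readStart(Path file, int maxBytes) {
        try (AsynchronousFileChannel channel =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(maxBytes);
            Future<Integer> pending = channel.read(buffer, 0); // returns immediately
            // The calling thread is free to do other work here.
            int bytesRead = pending.get();                     // wait only when the data is needed
            if (bytesRead <= 0) {
                return "";
            }
            return new String(buffer.array(), 0, bytesRead, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Writes text to a temporary file so the example has something to read. */
    static Path writeTemp(String text) {
        try {
            Path file = Files.createTempFile("async-read", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readStart(writeTemp("Java and the Machine"), 64));
    }
}
```

The Future-based read is the simplest form. The same channel also accepts a CompletionHandler callback when the caller should not wait at all.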
  • 58. What can I do? • Read Brian's book – Java Concurrency in Practice • Move to java.util.concurrent – People smarter than us have thought about this – Use Runnables, Callables and ExecutorServices • Use thread safe collections – ConcurrentHashMap – CopyOnWriteArrayList • Move to Java 7 – NIO.2 gives you asynchronous I/O! – Fork and Join helps with Amdahl's law
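As an illustration of the Fork and Join point, the sketch below sums an array with a RecursiveTask. The class name SumTask and the split threshold of 10,000 elements are assumptions for the example, and ForkJoinPool.commonPool() assumes Java 8 or later (on Java 7, a pool created with new ForkJoinPool() would take its place).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical divide-and-conquer sum; threshold is an assumed, untuned value.
final class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;

    private final long[] data;
    private final int from;
    private final int to; // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                      // schedule the left half on the pool
        long rightSum = right.compute();  // work on the right half in this thread
        return left.join() + rightSum;
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println("sum = " + parallelSum(data));
    }
}
```

The leaves share no mutable state, so nothing is locked; the only serial parts are the splitting and the final joins, which is the fraction Amdahl's law cares about.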
  • 59. Conclusion • Learn about hardware again – Check your capacity and throughput • Virtualise with caution – It can work for and against you • The JVM needs to evolve and so do you – OpenJDK is adapting, hopefully fast enough! • Learn to use parallelism in code – Challenge each other in peer reviews!
  • 62. Acknowledgements • The jClarity Team • Gil Tene - Azul Systems • Warner Bros pictures • Lionsgate Films • Dan Hardiker - Adaptivist • Trisha Gee - 10gen • James Gough - Stackthread
  • 63. Enjoy the community night! Kirk Pepperdine (@kcpeppe) Martijn Verburg (@karianna) www.jclarity.com

Editor's Notes

  1. This is who we are, this is what we do, Google the rest
  2. Kirk after too much sun in Crete
  3. i7-3770
  4. Intel 4004, the first commercial microprocessor (1971)
  5. Intel 4004, the first commercial microprocessor (1971)
  11. * Don't know what this is? Get out.
  12. Talk about critical sections / points of execution serialisation
  13. Named after Gene Amdahl, the law is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors. It was presented at the AFIPS Spring Joint Computer Conference in 1967.
  14. Moore's law is about the capacity of the CPU. Clock speed is about throughput. Finite amount of space on the edge of the chip. Fixed width to the bus.
  16. If you have one server per customer in a perfect parallel world... Talk to your sys admins / devops folks, buy more hardware ;-) Utilisation is an aggregation - look at the system as a whole.
  17. The calculus and assembly programming are in the next sections
  18. <TODO: Is it really Hardware threads?> Hardware thread core/hyperthreaded core
  19. Thread as an abstraction helps you deal with sequential computation, the problem lies when you pluralize this.
  20. copy ~= clone
  23. Even when there are multiple threads there is very little chance of corruption as forks sharing a hardware thread all accessed the same cache
  34. Here's your takeaway point - locks are f%^&in slow
  35. Here's your takeaway point - locks are f%^&in slow
  37. <TODO: Maybe kill horizontal scalability> * Hypervisor, also called virtual machine manager (VMM), e.g. KVM or Xen. * Containers: like chroot on steroids, so change what you think is root and partition off a chunk of the Linux O/S from there, e.g. LXC
  38. Kirk --> Martijn --> Ask the audience! Administration and Business data centre reasoning
  40. * A process is an artificial abstraction that we use
  41. And there's no such thing as a thread either
  42. * Early cloud had network bottlenecks (think dial up modems!) * NAS is a problem (S3 for example) - turns a disk problem into a network problem. Hypervisors interact with the scheduler (look at the linux kernel work recently)
  43. <TODO Keep slide hidden?> The cost savings aren't always better
  44. * Bosses talk $£, not 1's and 0's
  46. It's early 1990's tech that's in maintenance
  49. Instruction fetch; instruction decode and register fetch; execute; memory access; register write back
  50. Unlike Hypervisors, which can spin up a thread on a single core
  51. JVM's responsibility is to execute byte code, CPU to execute instruction pipelines
  52. Talk to concurrent GC and STW. live == pinned, so leaked memory is live memory by definition
  53. * This is an extra resource, but it's a specialised one for you number crunchers
  55. Utilise pseudo thread affinity. NUMA might/can/does do that. But perhaps a developer could do better? Do we need a CPULocal?
  56. <TODO weak section>
  59. Lambdas are a neat way of encapsulating behaviour and data that you will likely want to run in a parallel fashion in the future....