SlideShare uma empresa Scribd logo
1 de 83
Baixar para ler offline
Lock-Free Programming: Pro Tips
Jean-Philippe BEMPEL @jpbempel
Performance Architect http://jpbempel.blogspot.com
© ULLINK 2015
Agenda
© ULLINK 2015 2
• Measuring Contention
• Lock Striping
• Compare-And-Swap
• Introduction to Java Memory Model
• Disruptor & RingBuffer
• Spinning
• Ticketing: OrderedScheduler
© ULLINK 2015 3
Immutability
© ULLINK 2015 4
Contention
Contention
© ULLINK 2015 5
• Two or more thread competing
to acquire a lock
• Thread parked when waiting for
a lock
• Number one reason we want to
avoid lock
© ULLINK 2015 6
Measure, don’t guess!
Kirk Pepperdine & Jack Shirazi
© ULLINK 2015 7
Measure, don’t premature!
Measuring Contention
© ULLINK 2015 8
Synchronized blocks:
• Profilers (YourKit, JProfiler, ZVision)
• JVMTI native agent
• Results may be difficult to exploit
Measuring Contention: JProfiler
© ULLINK 2015 9
•
Measuring Contention: ZVision
December 16, 2014© ULLINK 2014 – Private & Confidential 10
•
Measuring Contention
© ULLINK 2015 11
java.util.concurrent.Lock:
• JVM cannot helps us here
• JDK classes (lib), regular code
• JProfiler can measure them
• j.u.c classes modification + bootclasspath (jucprofiler)
Measuring Contention: JProfiler
© ULLINK 2015 12
•
Measuring Contention
© ULLINK 2015 13
• Insertion of contention counters
• Identify place where lock fail to be acquired
• increment counter
• Identify Locks
• Call stacks at construction
• Logging counter status
• How to measure existing locks in your code
• Modify JDK classes
• Reintroduce in bootclasspath
© ULLINK 2015 14
Lock striping
Lock striping
© ULLINK 2015 15
• Reduce contention by distributing it
• Not remove locks, instead adding more
• Good partitioning is key to be effective (like HashMap)
Lock striping
© ULLINK 2015 16
Best example in JDK: ConcurrentHashMap
Lock striping
© ULLINK 2015 17
• Relatively easy to implement
• Can be very effective as long as good partitioning
• Can be tuned (number of partition) regarding the
contention/concurrency level
© ULLINK 2015 18
Compare-And-Swap
Compare-And-Swap
© ULLINK 2015 19
• Basic primitive for any lock-free algorithm
• Used to implement any locks or synchronization
primitives
• Handled directly by the CPU (instructions)
Compare-And-Swap
© ULLINK 2015 20
• Update atomically a memory location by another value if
the previous value is the expected one
• instruction with 3 arguments:
• memory address (rbx)
• expected value (rax)
• new value (rcx)
movabs rax,0x2a
movabs rcx,0x2b
lock cmpxchg QWORD PTR [rbx],rcx
Compare-And-Swap
© ULLINK 2015 21
• In Java for AtomicXXX classes:
boolean compareAndSet(long expect, long update)
• Memory address is the internal value field of the class
Compare-And-Swap
© ULLINK 2015 22
• Atomic increment with CAS
[JDK7] getAndIncrement():
while (true) {
long current = get();
long next = current + 1;
if (compareAndSet(current, next))
return current;
}
[JDK8] getAndIncrement():
return unsafe.getAndAddLong(this, valueOffset, 1L);
intrinsified to:
movabs rsi,0x1
lock xadd QWORD PTR [rdx+0x10],rsi
Compare-And-Swap: AtomicLong
© ULLINK 2015 23
ReentrantLock is implemented with a CAS:
volatile int state;
lock()
compareAndSet(0, 1);
if CAS fails => lock already acquired
unlock()
setState(0)
Compare-And-Swap: Lock implementation
© ULLINK 2015 24
• Simplest lock-free algorithm
• Use CAS to update the next pointer into a linked list
• if CAS fails, means concurrent update happened
• Read new value, go to next item and retry CAS
Compare-And-Swap: ConcurrentLinkedQueue
© ULLINK 2015 25
Compare-And-Swap: ConcurrentLinkedQueue
© ULLINK 2015 26
Compare-And-Swap: ConcurrentLinkedQueue
© ULLINK 2015 27
Compare-And-Swap: ConcurrentLinkedQueue
© ULLINK 2015 28
Compare-And-Swap: ConcurrentLinkedQueue
© ULLINK 2015 29
© ULLINK 2015 30
Java Memory Model
(introduction)
• First language having a well defined memory model:
Java JDK 5 (2004) with JSR 133
• C++ get a standard Memory Model in 2011 (C++11)
• Before that, some constructions may have
undefined/different behavior on different platform
(Double Check Locking)
Memory Model
© ULLINK 2015 31
int a;
int b;
boolean enabled;
{ {
a = 21; enabled = true;
b = a * 2; a = 21;
enabled = true; b = a * 2;
} }
Memory ordering
© ULLINK 2015 32
JIT Compiler
int a;
int b;
boolean enabled;
Thread 1 Thread 2
{ {
a = 21; if (enabled)
b = a * 2; {
enabled = true; int answer = b
} process(answer);
}
}
Memory ordering
© ULLINK 2015 33
int a;
int b;
volatile boolean enabled;
Thread 1 Thread 2
{ {
a = 21; if (enabled)
b = a * 2; {
enabled = true; int answer = b
} process(answer);
}
}
Memory ordering
© ULLINK 2015 34
Memory barriers
© ULLINK 2015 35
• Can be at 2 levels: Compiler & Hardware
• Depending on CPU architecture, barrier is not required
• on x86: Strong model, limited reordering
Memory barriers
© ULLINK 2015 36
Memory barriers: volatile
© ULLINK 2015 37
• volatile field implies memory barrier
• Compiler barrier: prevent reordering
• Hardware barrier: Ensure drain of the memory buffers
• on X86, only store barrier emits an hardware one
lock add DWORD PTR [rsp],0x0
Memory barriers: CAS
© ULLINK 2015 38
• CAS is also a memory barrier
• Compiler: recognized by JIT to prevent reordering
• Hardware: all lock instructions is a memory barrier
Memory barriers: synchronized
© ULLINK 2015 39
• Synchronized blocks have implicit memory barriers
• Entering block: Load memory barrier
• Exiting block: store memory barrier
Memory barriers: synchronized
© ULLINK 2015 40
synchronized (this)
{
enabled = true;
b = 21;
a = b * 2;
}
Memory barriers: lazySet
© ULLINK 2015 41
• method from AtomicXXX classes
• Compiler only memory barrier
• Does not emit hardware store barrier
• Still guarantee non reordering (most important)
but not immediate effect for other thread
© ULLINK 2015 42
Disruptor & Ring Buffer
Disruptor
© ULLINK 2015 43
• LMAX library (incl. Martin Thompson)
• Not a new idea, circular buffers in Linux Kernel, Lamport
• Ported to Java
Disruptor
© ULLINK 2015 44
Why not used CLQ which is lock(wait)-free?
• Queue unbounded et non blocking
• Allocate a node at each insertion
• Not CPU cache friendly
• MultiProducer and MultiConsumer
Array/LinkedBlockingQueue: Not lock-free
Ring Buffer: 1P 1C
© ULLINK 2015 45
Ring Buffer: 1P 1C
© ULLINK 2015 46
Ring Buffer: 1P 1C
© ULLINK 2015 47
Object[] ringBuffer;
volatile int head;
volatile int tail;
public boolean offer(E e) {
if (tail - head == ringBuffer.length)
return false;
ringBuffer[tail % ringBuffer.length] = e;
tail++; // volatile write
return true;
}
Ring Buffer: 1P 1C
© ULLINK 2015 48
public E poll() {
if (tail == head)
return null;
int idx = head % ringBuffer.length
E element = ringBuffer[idx];
ringBuffer[idx] = null;
head++; // volatile write
return element;
}
Ring Buffer: 1P 1C
© ULLINK 2015 49
Ring Buffer: nP 1C
© ULLINK 2015 50
Ring Buffer: nP 1C
© ULLINK 2015 51
Ring Buffer: nP 1C
© ULLINK 2015 52
Ring Buffer: nP 1C
© ULLINK 2015 53
AtomicReferenceArray ringBuffer;
volatile long head;
AtomicLong tail;
Ring Buffer: nP 1C
© ULLINK 2015 54
public boolean offer(E e) {
long curTail;
do {
curTail = tail.get();
if (curTail - head == ringBuffer.length())
return false;
} while (!tail.compareAndSet(curTail, curTail+1));
int idx = curTail % ringBuffer.length();
ringBuffer.set(idx, e); // volatile write
return true;
}
Ring Buffer: nP 1C
© ULLINK 2015 55
public E poll() {
int index = head % ringBuffer.length();
E element = ringBuffer.get(index);
if (element == null)
return null;
ringBuffer.set(index, null);
head++; // volatile write
return element;
}
Ring Buffer: nP 1C
© ULLINK 2015 56
Disruptor
© ULLINK 2015 57
• Very flexible for different usages (strategies)
• Very good performance
• Data transfer from one thread to another (Queue)
© ULLINK 2015 58
Spinning
spinning
© ULLINK 2015 59
• Active wait
• very good for consumer reactivity
• Burns a cpu permanently
spinning
© ULLINK 2015 60
• Some locks are implemented with spinning (spinLock)
• Synchronized blocks spin a little bit on contention
• use of the pause instruction (x86)
spinning
© ULLINK 2015 61
How to avoid burning a core ?
Backoff strategies:
• Thread.yield()
• LockSupport.parkNanos(1)
• Object.wait()/Condition.await()/LockSupport.park()
© ULLINK 2015 62
Ticketing:
OrderedScheduler
Ticketing
© ULLINK 2015 63
How to parallelize tasks while keeping ordering ?
Example: Video stream processing
• read frame from the stream
• processing of the frame (parallelisable)
• writing into the output (in order)
Ticketing
© ULLINK 2015 64
Ticketing
© ULLINK 2015 65
Ticketing
© ULLINK 2015 66
Can do this with Disruptor, but with a consumer thread
OrderedScheduler can do the same but:
• no inter-thread communication overhead
• no additional thread
• no wait strategy
Take a ticket...
OrderedScheduler
© ULLINK 2015 67
OrderedScheduler
© ULLINK 2015 68
OrderedScheduler
© ULLINK 2015 69
OrderedScheduler
© ULLINK 2015 70
OrderedScheduler
© ULLINK 2015 71
OrderedScheduler
© ULLINK 2015 72
OrderedScheduler
© ULLINK 2015 73
OrderedScheduler
© ULLINK 2015 74
public void execute() {
synchronized (this) {
FooInput input = read();
BarOutput output = process(input);
write(output);
}
}
OrderedScheduler
© ULLINK 2015 75
OrderedScheduler scheduler = new OrderedScheduler();
public void execute() {
FooInput input;
long ticket;
synchronized (this) {
input = read();
ticket = scheduler.getNextTicket();
}
[...]
OrderedScheduler
© ULLINK 2015 76
public void execute(){
[...]
BarOutput output;
try {
output = process(intput);
}
catch (Exception ex) {
scheduler.trash(ticket);
throw new RuntimeException(ex);
}
scheduler.run(ticket, { () => write(output); });
}
Ticketing
© ULLINK 2015 77
• Open sourced on GitHub
• Opened to PR & discussion on the design
• Used internally
© ULLINK 2015 78
Takeaways
Takeaways
© ULLINK 2015 79
• Measure
• Distribute
• Atomically update
• Order
• RingBuffer: easy lock-free
• Ordered without consumer thread
References
© ULLINK 2015 80
• jucProfiler: http://www.infoq.com/articles/jucprofiler
• Java Memory Model Pragmatics: http://shipilev.net/blog/2014/jmm-
pragmatics/
• Memory Barriers and JVM Concurrency: http://www.infoq.
com/articles/memory_barriers_jvm_concurrency
• JSR 133 (FAQ): http://www.cs.umd.
edu/~pugh/java/memoryModel/jsr-133-faq.html
• CPU cache flushing fallacy: http://mechanical-sympathy.blogspot.
fr/2013/02/cpu-cache-flushing-fallacy.html
• atomic<> Weapons: http://channel9.msdn.
com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-
atomic-Weapons-1-of-2
References
© ULLINK 2015 81
• circular buffers: http://lwn.net/Articles/378262/
• Mr T queues: https://github.
com/mjpt777/examples/tree/master/src/java/uk/co/real_l
ogic/queues
• Lock-free algorithms by Mr T: http://www.infoq.
com/presentations/Lock-free-Algorithms
• Futex are tricky U. Drepper: http://www.akkadia.
org/drepper/futex.pdf
• JCTools: https://github.com/JCTools/JCTools
• Nitsan Wakart blog: http://psy-lob-saw.blogspot.com
References
© ULLINK 2015 82
• Exploiting data parallelism in ordered data streams:
https://software.intel.com/en-us/articles/exploiting-data-
parallelism-in-ordered-data-streams
• OrderedScheduler: https://github.com/Ullink/ordered-
scheduler
• Concurrency Freaks: http://concurrencyfreaks.blogspot.
com
• Preshing on programming: http://preshing.com/
• Is Parallel Programming Hard, And If So, What Can You
Do About It? https://kernel.
org/pub/linux/kernel/people/paulmck/perfbook/perfbook.
2015.01.31a.pdf
ABOUT ULLINK
ULLINK’s electronic financial trading
software solutions and services
address the challenges of new
regulations and increased
fragmentation. Emerging markets
require fast, flexible and compliant
solutions for buy-side and sell-side
market participants; providing a
competitive advantage for both low
touch and high touch trading.
www.ullink.com
FIND OUT MORE
Contact our Sales Team to learn more about the services listed
herein as well as our full array of offerings:
+81 3 3664 4160 (Japan)
+852 2521 5400 (Asia Pacific)
+1 212 991 0816 (North America)
+55 11 3171 2409 (Latin America)
+44 20 7488 1655 (UK and EMEA)
connect@ullink.com
© ULLINK 2015 83

Mais conteúdo relacionado

Mais procurados

BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...
BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...
BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...
BlueHat Security Conference
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
Wei Shan Ang
 

Mais procurados (20)

Don't dump thread dumps
Don't dump thread dumpsDon't dump thread dumps
Don't dump thread dumps
 
Galera Cluster Best Practices for DBA's and DevOps Part 1
Galera Cluster Best Practices for DBA's and DevOps Part 1Galera Cluster Best Practices for DBA's and DevOps Part 1
Galera Cluster Best Practices for DBA's and DevOps Part 1
 
Oss4b - pxc introduction
Oss4b   - pxc introductionOss4b   - pxc introduction
Oss4b - pxc introduction
 
Find bottleneck and tuning in Java Application
Find bottleneck and tuning in Java ApplicationFind bottleneck and tuning in Java Application
Find bottleneck and tuning in Java Application
 
Zero Downtime Schema Changes - Galera Cluster - Best Practices
Zero Downtime Schema Changes - Galera Cluster - Best PracticesZero Downtime Schema Changes - Galera Cluster - Best Practices
Zero Downtime Schema Changes - Galera Cluster - Best Practices
 
BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...
BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...
BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...
 
2014 OSDC Talk: Introduction to Percona XtraDB Cluster and HAProxy
2014 OSDC Talk: Introduction to Percona XtraDB Cluster and HAProxy2014 OSDC Talk: Introduction to Percona XtraDB Cluster and HAProxy
2014 OSDC Talk: Introduction to Percona XtraDB Cluster and HAProxy
 
Percona XtraDB Cluster
Percona XtraDB ClusterPercona XtraDB Cluster
Percona XtraDB Cluster
 
Introducing Galera 3.0
Introducing Galera 3.0Introducing Galera 3.0
Introducing Galera 3.0
 
Le guide de dépannage de la jvm
Le guide de dépannage de la jvmLe guide de dépannage de la jvm
Le guide de dépannage de la jvm
 
Migrating to XtraDB Cluster
Migrating to XtraDB ClusterMigrating to XtraDB Cluster
Migrating to XtraDB Cluster
 
Troubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issuesTroubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issues
 
What you wanted to know about MySQL, but could not find using inernal instrum...
What you wanted to know about MySQL, but could not find using inernal instrum...What you wanted to know about MySQL, but could not find using inernal instrum...
What you wanted to know about MySQL, but could not find using inernal instrum...
 
Streaming huge databases using logical decoding
Streaming huge databases using logical decodingStreaming huge databases using logical decoding
Streaming huge databases using logical decoding
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugging
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
 
Percona XtraDB Cluster - Small Presentation
Percona XtraDB Cluster - Small PresentationPercona XtraDB Cluster - Small Presentation
Percona XtraDB Cluster - Small Presentation
 
How to understand Galera Cluster - 2013
How to understand Galera Cluster - 2013How to understand Galera Cluster - 2013
How to understand Galera Cluster - 2013
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Q4.11: Sched_mc on dual / quad cores
Q4.11: Sched_mc on dual / quad coresQ4.11: Sched_mc on dual / quad cores
Q4.11: Sched_mc on dual / quad cores
 

Semelhante a Lock free programming- pro tips

NFV Infrastructure Manager with High Performance Software Switch Lagopus
NFV Infrastructure Manager with High Performance Software Switch Lagopus NFV Infrastructure Manager with High Performance Software Switch Lagopus
NFV Infrastructure Manager with High Performance Software Switch Lagopus
Hirofumi Ichihara
 
Verilog for synthesis - combinational rev a.pdf
Verilog for synthesis - combinational rev a.pdfVerilog for synthesis - combinational rev a.pdf
Verilog for synthesis - combinational rev a.pdf
AzeemMohammedAbdul
 
NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5
UniFabric
 

Semelhante a Lock free programming- pro tips (20)

Kernel Recipes 2018 - Live (Kernel) Patching: status quo and status futurus -...
Kernel Recipes 2018 - Live (Kernel) Patching: status quo and status futurus -...Kernel Recipes 2018 - Live (Kernel) Patching: status quo and status futurus -...
Kernel Recipes 2018 - Live (Kernel) Patching: status quo and status futurus -...
 
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP..."Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
 
Neutron CI Run on Docker
Neutron CI Run on DockerNeutron CI Run on Docker
Neutron CI Run on Docker
 
Synchronization problem with threads
Synchronization problem with threadsSynchronization problem with threads
Synchronization problem with threads
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
 
NFV Infrastructure Manager with High Performance Software Switch Lagopus
NFV Infrastructure Manager with High Performance Software Switch Lagopus NFV Infrastructure Manager with High Performance Software Switch Lagopus
NFV Infrastructure Manager with High Performance Software Switch Lagopus
 
L3-.pptx
L3-.pptxL3-.pptx
L3-.pptx
 
Verilog for synthesis - combinational rev a.pdf
Verilog for synthesis - combinational rev a.pdfVerilog for synthesis - combinational rev a.pdf
Verilog for synthesis - combinational rev a.pdf
 
Real time Linux
Real time LinuxReal time Linux
Real time Linux
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet Processing
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)
 
MODULE 3 process synchronizationnnn.pptx
MODULE 3 process synchronizationnnn.pptxMODULE 3 process synchronizationnnn.pptx
MODULE 3 process synchronizationnnn.pptx
 
LCA13: Common Clk Framework DVFS Roadmap
LCA13: Common Clk Framework DVFS RoadmapLCA13: Common Clk Framework DVFS Roadmap
LCA13: Common Clk Framework DVFS Roadmap
 
Stress-SGX: Load and Stress your Enclaves for Fun and Profit
Stress-SGX: Load and Stress your Enclaves for Fun and ProfitStress-SGX: Load and Stress your Enclaves for Fun and Profit
Stress-SGX: Load and Stress your Enclaves for Fun and Profit
 
NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2
 
Google QUIC
Google QUICGoogle QUIC
Google QUIC
 
Kernel Recipes 2015 - So you want to write a Linux driver framework
Kernel Recipes 2015 - So you want to write a Linux driver frameworkKernel Recipes 2015 - So you want to write a Linux driver framework
Kernel Recipes 2015 - So you want to write a Linux driver framework
 
GC Tuning Confessions Of A Performance Engineer
GC Tuning Confessions Of A Performance EngineerGC Tuning Confessions Of A Performance Engineer
GC Tuning Confessions Of A Performance Engineer
 

Mais de Jean-Philippe BEMPEL

Mais de Jean-Philippe BEMPEL (16)

Mastering GC.pdf
Mastering GC.pdfMastering GC.pdf
Mastering GC.pdf
 
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
 
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
 
Tools in action jdk mission control and flight recorder
Tools in action  jdk mission control and flight recorderTools in action  jdk mission control and flight recorder
Tools in action jdk mission control and flight recorder
 
Understanding JVM GC: advanced!
Understanding JVM GC: advanced!Understanding JVM GC: advanced!
Understanding JVM GC: advanced!
 
Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2
 
Understanding low latency jvm gcs
Understanding low latency jvm gcsUnderstanding low latency jvm gcs
Understanding low latency jvm gcs
 
Understanding jvm gc advanced
Understanding jvm gc advancedUnderstanding jvm gc advanced
Understanding jvm gc advanced
 
Clr jvm implementation differences
Clr jvm implementation differencesClr jvm implementation differences
Clr jvm implementation differences
 
Out ofmemoryerror what is the cost of java objects
Out ofmemoryerror  what is the cost of java objectsOut ofmemoryerror  what is the cost of java objects
Out ofmemoryerror what is the cost of java objects
 
OutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en javaOutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en java
 
Lock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx ukLock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx uk
 
Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)
 
Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)
 
Measuring directly from cpu hardware performance counters
Measuring directly from cpu  hardware performance countersMeasuring directly from cpu  hardware performance counters
Measuring directly from cpu hardware performance counters
 
Devoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perfDevoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perf
 

Último

Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
chiefasafspells
 

Último (20)

%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - KanchanaWSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - Kanchana
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 

Lock free programming- pro tips

  • 1. Lock-Free Programming: Pro Tips Jean-Philippe BEMPEL @jpbempel Performance Architect http://jpbempel.blogspot.com © ULLINK 2015
  • 2. Agenda © ULLINK 2015 2 • Measuring Contention • Lock Striping • Compare-And-Swap • Introduction to Java Memory Model • Disruptor & RingBuffer • Spinning • Ticketing: OrderedScheduler
  • 3. © ULLINK 2015 3 Immutability
  • 4. © ULLINK 2015 4 Contention
  • 5. Contention © ULLINK 2015 5 • Two or more thread competing to acquire a lock • Thread parked when waiting for a lock • Number one reason we want to avoid lock
  • 6. © ULLINK 2015 6 Measure, don’t guess! Kirk Pepperdine & Jack Shirazi
  • 7. © ULLINK 2015 7 Measure, don’t premature!
  • 8. Measuring Contention © ULLINK 2015 8 Synchronized blocks: • Profilers (YourKit, JProfiler, ZVision) • JVMTI native agent • Results may be difficult to exploit
  • 10. Measuring Contention: ZVision December 16, 2014© ULLINK 2014 – Private & Confidential 10 •
  • 11. Measuring Contention © ULLINK 2015 11 java.util.concurrent.Lock: • JVM cannot helps us here • JDK classes (lib), regular code • JProfiler can measure them • j.u.c classes modification + bootclasspath (jucprofiler)
  • 13. Measuring Contention © ULLINK 2015 13 • Insertion of contention counters • Identify place where lock fail to be acquired • increment counter • Identify Locks • Call stacks at construction • Logging counter status • How to measure existing locks in your code • Modify JDK classes • Reintroduce in bootclasspath
  • 14. © ULLINK 2015 14 Lock striping
  • 15. Lock striping © ULLINK 2015 15 • Reduce contention by distributing it • Not remove locks, instead adding more • Good partitioning is key to be effective (like HashMap)
  • 16. Lock striping © ULLINK 2015 16 Best example in JDK: ConcurrentHashMap
  • 17. Lock striping © ULLINK 2015 17 • Relatively easy to implement • Can be very effective as long as good partitioning • Can be tuned (number of partition) regarding the contention/concurrency level
  • 18. © ULLINK 2015 18 Compare-And-Swap
  • 19. Compare-And-Swap © ULLINK 2015 19 • Basic primitive for any lock-free algorithm • Used to implement any locks or synchronization primitives • Handled directly by the CPU (instructions)
  • 20. Compare-And-Swap © ULLINK 2015 20 • Update atomically a memory location by another value if the previous value is the expected one • instruction with 3 arguments: • memory address (rbx) • expected value (rax) • new value (rcx) movabs rax,0x2a movabs rcx,0x2b lock cmpxchg QWORD PTR [rbx],rcx
  • 22. • In Java for AtomicXXX classes: boolean compareAndSet(long expect, long update) • Memory address is the internal value field of the class Compare-And-Swap © ULLINK 2015 22
  • 23. • Atomic increment with CAS [JDK7] getAndIncrement(): while (true) { long current = get(); long next = current + 1; if (compareAndSet(current, next)) return current; } [JDK8] getAndIncrement(): return unsafe.getAndAddLong(this, valueOffset, 1L); intrinsified to: movabs rsi,0x1 lock xadd QWORD PTR [rdx+0x10],rsi Compare-And-Swap: AtomicLong © ULLINK 2015 23
  • 24. ReentrantLock is implemented with a CAS: volatile int state; lock() compareAndSet(0, 1); if CAS fails => lock already acquired unlock() setState(0) Compare-And-Swap: Lock implementation © ULLINK 2015 24
  • 25. • Simplest lock-free algorithm • Use CAS to update the next pointer into a linked list • if CAS fails, means concurrent update happened • Read new value, go to next item and retry CAS Compare-And-Swap: ConcurrentLinkedQueue © ULLINK 2015 25
  • 30. © ULLINK 2015 30 Java Memory Model (introduction)
  • 31. • First language having a well defined memory model: Java JDK 5 (2004) with JSR 133 • C++ get a standard Memory Model in 2011 (C++11) • Before that, some constructions may have undefined/different behavior on different platform (Double Check Locking) Memory Model © ULLINK 2015 31
  • 32. int a; int b; boolean enabled; { { a = 21; enabled = true; b = a * 2; a = 21; enabled = true; b = a * 2; } } Memory ordering © ULLINK 2015 32 JIT Compiler
  • 33. int a; int b; boolean enabled; Thread 1 Thread 2 { { a = 21; if (enabled) b = a * 2; { enabled = true; int answer = b } process(answer); } } Memory ordering © ULLINK 2015 33
  • 34. int a; int b; volatile boolean enabled; Thread 1 Thread 2 { { a = 21; if (enabled) b = a * 2; { enabled = true; int answer = b } process(answer); } } Memory ordering © ULLINK 2015 34
  • 35. Memory barriers © ULLINK 2015 35 • Can be at 2 levels: Compiler & Hardware • Depending on CPU architecture, barrier is not required • on x86: Strong model, limited reordering
  • 37. Memory barriers: volatile © ULLINK 2015 37 • volatile field implies memory barrier • Compiler barrier: prevent reordering • Hardware barrier: Ensure drain of the memory buffers • on X86, only store barrier emits an hardware one lock add DWORD PTR [rsp],0x0
  • 38. Memory barriers: CAS © ULLINK 2015 38 • CAS is also a memory barrier • Compiler: recognized by JIT to prevent reordering • Hardware: all lock instructions is a memory barrier
  • 39. Memory barriers: synchronized © ULLINK 2015 39 • Synchronized blocks have implicit memory barriers • Entering block: Load memory barrier • Exiting block: store memory barrier
  • 40. Memory barriers: synchronized © ULLINK 2015 40 synchronized (this) { enabled = true; b = 21; a = b * 2; }
  • 41. Memory barriers: lazySet © ULLINK 2015 41 • method from AtomicXXX classes • Compiler only memory barrier • Does not emit hardware store barrier • Still guarantee non reordering (most important) but not immediate effect for other thread
  • 42. © ULLINK 2015 42 Disruptor & Ring Buffer
  • 43. Disruptor © ULLINK 2015 43 • LMAX library (incl. Martin Thompson) • Not a new idea, circular buffers in Linux Kernel, Lamport • Ported to Java
  • 44. Disruptor © ULLINK 2015 44 Why not used CLQ which is lock(wait)-free? • Queue unbounded et non blocking • Allocate a node at each insertion • Not CPU cache friendly • MultiProducer and MultiConsumer Array/LinkedBlockingQueue: Not lock-free
  • 45. Ring Buffer: 1P 1C © ULLINK 2015 45
  • 46. Ring Buffer: 1P 1C © ULLINK 2015 46
  • 47. Ring Buffer: 1P 1C © ULLINK 2015 47
  • 48. Object[] ringBuffer; volatile int head; volatile int tail; public boolean offer(E e) { if (tail - head == ringBuffer.length) return false; ringBuffer[tail % ringBuffer.length] = e; tail++; // volatile write return true; } Ring Buffer: 1P 1C © ULLINK 2015 48
  • 49. public E poll() { if (tail == head) return null; int idx = head % ringBuffer.length E element = ringBuffer[idx]; ringBuffer[idx] = null; head++; // volatile write return element; } Ring Buffer: 1P 1C © ULLINK 2015 49
  • 50. Ring Buffer: nP 1C © ULLINK 2015 50
  • 51. Ring Buffer: nP 1C © ULLINK 2015 51
  • 52. Ring Buffer: nP 1C © ULLINK 2015 52
  • 53. Ring Buffer: nP 1C © ULLINK 2015 53
  • 54. AtomicReferenceArray ringBuffer; volatile long head; AtomicLong tail; Ring Buffer: nP 1C © ULLINK 2015 54
  • 55. public boolean offer(E e) { long curTail; do { curTail = tail.get(); if (curTail - head == ringBuffer.length()) return false; } while (!tail.compareAndSet(curTail, curTail+1)); int idx = curTail % ringBuffer.length(); ringBuffer.set(idx, e); // volatile write return true; } Ring Buffer: nP 1C © ULLINK 2015 55
  • 56. public E poll() { int index = head % ringBuffer.length(); E element = ringBuffer.get(index); if (element == null) return null; ringBuffer.set(index, null); head++; // volatile write return element; } Ring Buffer: nP 1C © ULLINK 2015 56
  • 57. Disruptor © ULLINK 2015 57 • Very flexible for different usages (strategies) • Very good performance • Data transfer from one thread to another (Queue)
  • 58. © ULLINK 2015 58 Spinning
  • 59. spinning © ULLINK 2015 59 • Active wait • very good for consumer reactivity • Burns a cpu permanently
  • 60. spinning © ULLINK 2015 60 • Some locks are implemented with spinning (spinLock) • Synchronized blocks spin a little bit on contention • use of the pause instruction (x86)
  • 61. spinning © ULLINK 2015 61 How to avoid burning a core ? Backoff strategies: • Thread.yield() • LockSupport.parkNanos(1) • Object.wait()/Condition.await()/LockSupport.park()
  • 62. © ULLINK 2015 62 Ticketing: OrderedScheduler
  • 63. Ticketing © ULLINK 2015 63 How to parallelize tasks while keeping ordering ? Example: Video stream processing • read frame from the stream • processing of the frame (parallelisable) • writing into the output (in order)
  • 66. Ticketing © ULLINK 2015 66 Can do this with Disruptor, but with a consumer thread OrderedScheduler can do the same but: • no inter-thread communication overhead • no additional thread • no wait strategy Take a ticket...
  • 74. OrderedScheduler © ULLINK 2015 74 public void execute() { synchronized (this) { FooInput input = read(); BarOutput output = process(input); write(output); } }
  • 75. OrderedScheduler © ULLINK 2015 75 OrderedScheduler scheduler = new OrderedScheduler(); public void execute() { FooInput input; long ticket; synchronized (this) { input = read(); ticket = scheduler.getNextTicket(); } [...]
  • 76. OrderedScheduler © ULLINK 2015 76 public void execute(){ [...] BarOutput output; try { output = process(intput); } catch (Exception ex) { scheduler.trash(ticket); throw new RuntimeException(ex); } scheduler.run(ticket, { () => write(output); }); }
  • 77. Ticketing © ULLINK 2015 77 • Open sourced on GitHub • Opened to PR & discussion on the design • Used internally
  • 78. © ULLINK 2015 78 Takeaways
  • 79. Takeaways © ULLINK 2015 79 • Measure • Distribute • Atomically update • Order • RingBuffer: easy lock-free • Ordered without consumer thread
  • 80. References © ULLINK 2015 80 • jucProfiler: http://www.infoq.com/articles/jucprofiler • Java Memory Model Pragmatics: http://shipilev.net/blog/2014/jmm- pragmatics/ • Memory Barriers and JVM Concurrency: http://www.infoq. com/articles/memory_barriers_jvm_concurrency • JSR 133 (FAQ): http://www.cs.umd. edu/~pugh/java/memoryModel/jsr-133-faq.html • CPU cache flushing fallacy: http://mechanical-sympathy.blogspot. fr/2013/02/cpu-cache-flushing-fallacy.html • atomic<> Weapons: http://channel9.msdn. com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter- atomic-Weapons-1-of-2
  • 81. References © ULLINK 2015 81 • circular buffers: http://lwn.net/Articles/378262/ • Mr T queues: https://github. com/mjpt777/examples/tree/master/src/java/uk/co/real_l ogic/queues • Lock-free algorithms by Mr T: http://www.infoq. com/presentations/Lock-free-Algorithms • Futex are tricky U. Drepper: http://www.akkadia. org/drepper/futex.pdf • JCTools: https://github.com/JCTools/JCTools • Nitsan Wakart blog: http://psy-lob-saw.blogspot.com
  • 82. References © ULLINK 2015 82 • Exploiting data parallelism in ordered data streams: https://software.intel.com/en-us/articles/exploiting-data- parallelism-in-ordered-data-streams • OrderedScheduler: https://github.com/Ullink/ordered- scheduler • Concurrency Freaks: http://concurrencyfreaks.blogspot. com • Preshing on programming: http://preshing.com/ • Is Parallel Programming Hard, And If So, What Can You Do About It? https://kernel. org/pub/linux/kernel/people/paulmck/perfbook/perfbook. 2015.01.31a.pdf
  • 83. ABOUT ULLINK ULLINK’s electronic financial trading software solutions and services address the challenges of new regulations and increased fragmentation. Emerging markets require fast, flexible and compliant solutions for buy-side and sell-side market participants; providing a competitive advantage for both low touch and high touch trading. www.ullink.com FIND OUT MORE Contact our Sales Team to learn more about the services listed herein as well as our full array of offerings: +81 3 3664 4160 (Japan) +852 2521 5400 (Asia Pacific) +1 212 991 0816 (North America) +55 11 3171 2409 (Latin America) +44 20 7488 1655 (UK and EMEA) connect@ullink.com © ULLINK 2015 83