SlideShare uma empresa Scribd logo
1 de 34
Low level Java programming 
With examples from OpenHFT 
Peter Lawrey 
CEO and Principal Consultant 
Higher Frequency Trading. 
Presentation to Joker 2014 St Petersburg, October 2014.
About Us… 
Higher Frequency Trading is a small 
consulting and software development house 
specialising in: 
• Low latency, high throughput software 
• 6 developers + 3 staff in Europe and USA. 
• Sponsor HFT related open source projects 
• Core Java engineering
About Me… 
• CEO and Principal Consultant 
• Third on Stackoverflow for Java, 
most Java Performance answers. 
• Vanilla Java blog with 3 million views 
• Founder of the Performance Java User's 
Group 
• An Australian, based in the U.K.
What Is The Problem We Solve? 
"I want to be able to read and write my 
data to a persisted, distributed system, 
with the speed of in memory data 
structures"
Agenda 
What are our Open Source products used for? 
Where to start with low latency? 
When you know you have a problem, what can you do?
Chronicle scaling Vertically and Horizontally 
 Shares data structure between processes 
 Replication between machines 
 Build on a low level library Java Lang. 
 Millions of operations per second. 
 Micro-second latency. No TCP locally. 
 Synchronous logging to the OS. 
 Apache 2.0 available on GitHub 
 Persisted via the OS.
What is Chronicle Map/Set? 
 Low latency persisted key-value store. 
 Latency between processes around 200 ns. 
 In specialized cases, latencies < 25 ns. 
 Throughputs up to 30 million/second.
What is Chronicle Map/Set? 
 ConcurrentMap or Set interface 
 Designed for reactive programming 
 Replication via TCP and UDP. 
 Apache 2.0 open source library. 
 Pure Java, supported on Windows, Linux, Mac OSX.
What is Chronicle Queue? 
 Low latency journaling and logging. 
 Low latency cross JVM communication. 
 Designed for reactive programming 
 Throughputs up to 40 million/second.
What is Chronicle Queue? 
 Latencies between processes of 200 nano-seconds. 
 Sustain rates of 400 MB/s, peaks much higher. 
 Replication via TCP. 
 Apache 2.0 open source library. 
 Pure Java, supported on Windows, Linux, Mac OSX.
Chronicle monitoring a legacy application
Chronicle journalling multiple applications
Chronicle for low latency trading
Short demo using OSResizesMain 
Note: The “VIRT” virtual memory size is 125t for 125 TB, actual usage 97M 
System memory: 7.7 GB, Extents of map: 137439.0 GB, disk used: 97MB, 
addressRange: 233a777f000-7f33a8000000 
$ ls -lh /tmp/oversized* 
-rw-rw-r-- 1 peter peter 126T Oct 20 17:03 /tmp/over-sized... 
$ du -h /tmp/oversized* 
97M /tmp/over-sized....
Where to start with low latency?
You need to measure first. 
 What is the end to end use case you need to improve? 
 Is it throughput or latency you need to improve? 
 Throughput or average latency hides poor latencies. 
 Avoid co-ordinated omission. See Gil Tene's talks.
Looking at the latency percentiles
Tools to help you measure 
 A commercial profiler. e.g. YourKit. 
 Instrument timings in production. 
 Record and replay production loads. 
 Avoid co-ordinated omission. See Gil Tene's talks. 
 If you can't change the code, Censum can help you 
tune your GC pause times. 
 Azul's Zing “solves” GC pause times, but has many 
other tools to reduce jitter.
What to look for when profiling 
 Reducing the allocation rate is often a quick win. 
 Memory profile to reduce garbage produced. 
 When CPU profiling, leave the memory profiler on. 
 If the profiler is no long helpful, application 
instrumentation can take it to the next level.
When you know you have a problem, what 
can you do about it?
Is garbage unavoidable? 
 You can always reduce it further and further, but at 
some point it's not worth it. 
 For a web service, 500 MB/s might be ok. 
 For a trading system, 500 KB/s might be ok. 
 If you produce 250 KB/s it will take you 24 hours 
to fill a 24 GB Eden space.
Common things to tune. 
A common source of garbage is Iterators. 
for (String s : arrayList) { } 
Creates an Iterator, however 
for (int i = 0, len = arrayList.size(); i < len; i++) { 
String s = arrayList.get(i); 
} 
Doesn't create an Iterator.
Common things to tune. 
BigDecimal can be a massive source of garbage. 
BigDecimal a = b.add(c) 
.divide(BigDecimal.TWO, 2, ROUND_HALF_UP); 
The same as double produces no garbage. 
double a = round(b + c, 2); 
You have to have a library to support rounding. Without 
it you will get rounding and representation errors.
Be aware of your memory speeds. 
concurrency Clock cycles Seconds 
L1 caches multi-core 4 1 seconds 
L2 cache multi-core 10 3 seconds 
L3 cache socket wide 40-75 15 seconds 
Main memory System wide 200 50 seconds. 
SSD access System wide 50,000 14 hours 
Local Network Network 180,000 2 days 
HDD System wide 30,000,000 1 year. 
To maximise performance you want to spend as much 
time in L3, or ideally L1/L2 caches as possible.
Memory access is faster with less garbage 
Reducing garbage minimises filling your caches with 
garbage. 
If you are producing 300 MB/s of garbage your L1 cache 
will be filled with garbage is about 100 micro-seconds, 
your L2 cache will be filled in under 1 milli-second. 
The L3 cache and main memory shared and the more you 
use this, the less scalability you will get from your multi-cores.
Faster memory access 
 Reduce garbage 
 Reduce pointer chasing 
 Use primitives instead of objects. 
 Avoid false sharing for highly contended mutated values
Lock free coding 
 AtomicLong.addAndGet(n) 
 Unsafe.compareAndSwapLong 
 Unsafe.getAndAddLong
Using off heap memory 
 Reduce GC work load. 
 More control over layout 
 Data can be persisted 
 Data can be embedded into multiple processes. 
 Can exploit virtual memory allocation instead of 
main memory allocation.
Low latency network connections 
 Kernel by pass network cards e.g. Solarflare 
 Single hop latencies around 5 micro-seconds 
 Make scaling to more machines practical when tens of 
micro-seconds matter.
Reducing micro-jitter 
 Unless you isolate a CPU, it will get interrupted by the 
scheduler a lot. 
 You can see delays of 1 – 5 ms every hour on an 
otherwise idle machine. 
 On a virtualised machine, you can see delays of 50 ms. 
 Java Thread Affinity library can declaratively layout 
your critical threads.
Reducing micro-jitter 
Number of interrupts per hour by length.
Q & A 
http://openhft.net/ 
Performance Java User's Group. 
@PeterLawrey 
peter.lawrey@higherfrequencytrading.com

Mais conteúdo relacionado

Mais procurados

HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Dvir Volk
 

Mais procurados (20)

What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalability
 
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
JVM Mechanics: When Does the JVM JIT & Deoptimize?
JVM Mechanics: When Does the JVM JIT & Deoptimize?JVM Mechanics: When Does the JVM JIT & Deoptimize?
JVM Mechanics: When Does the JVM JIT & Deoptimize?
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database Architecture
 
Spark Performance Tuning .pdf
Spark Performance Tuning .pdfSpark Performance Tuning .pdf
Spark Performance Tuning .pdf
 
Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...
Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...
Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...
 
MySQL Group Replication - HandsOn Tutorial
MySQL Group Replication - HandsOn TutorialMySQL Group Replication - HandsOn Tutorial
MySQL Group Replication - HandsOn Tutorial
 
Grokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKIGrokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKI
 
Containerd Internals: Building a Core Container Runtime
Containerd Internals: Building a Core Container RuntimeContainerd Internals: Building a Core Container Runtime
Containerd Internals: Building a Core Container Runtime
 
Introduce to Rust-A Powerful System Language
Introduce to Rust-A Powerful System LanguageIntroduce to Rust-A Powerful System Language
Introduce to Rust-A Powerful System Language
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
1 Million Writes per second on 60 nodes with Cassandra and EBS
1 Million Writes per second on 60 nodes with Cassandra and EBS1 Million Writes per second on 60 nodes with Cassandra and EBS
1 Million Writes per second on 60 nodes with Cassandra and EBS
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
Kernel Recipes 2019 - Faster IO through io_uring
Kernel Recipes 2019 - Faster IO through io_uringKernel Recipes 2019 - Faster IO through io_uring
Kernel Recipes 2019 - Faster IO through io_uring
 
[Outdated] Secrets of Performance Tuning Java on Kubernetes
[Outdated] Secrets of Performance Tuning Java on Kubernetes[Outdated] Secrets of Performance Tuning Java on Kubernetes
[Outdated] Secrets of Performance Tuning Java on Kubernetes
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux Kernel
 

Destaque

Advanced off heap ipc
Advanced off heap ipcAdvanced off heap ipc
Advanced off heap ipc
Peter Lawrey
 
ISI 5121 Trove Presentation
ISI 5121 Trove PresentationISI 5121 Trove Presentation
ISI 5121 Trove Presentation
ksirett
 
Computer programming language concept
Computer programming language conceptComputer programming language concept
Computer programming language concept
Afiq Sajuri
 

Destaque (20)

Reactive programming with examples
Reactive programming with examplesReactive programming with examples
Reactive programming with examples
 
Streams and lambdas the good, the bad and the ugly
Streams and lambdas the good, the bad and the uglyStreams and lambdas the good, the bad and the ugly
Streams and lambdas the good, the bad and the ugly
 
Low latency for high throughput
Low latency for high throughputLow latency for high throughput
Low latency for high throughput
 
Deterministic behaviour and performance in trading systems
Deterministic behaviour and performance in trading systemsDeterministic behaviour and performance in trading systems
Deterministic behaviour and performance in trading systems
 
Responding rapidly when you have 100+ GB data sets in Java
Responding rapidly when you have 100+ GB data sets in JavaResponding rapidly when you have 100+ GB data sets in Java
Responding rapidly when you have 100+ GB data sets in Java
 
Legacy lambda code
Legacy lambda codeLegacy lambda code
Legacy lambda code
 
Determinism in finance
Determinism in financeDeterminism in finance
Determinism in finance
 
High Frequency Trading and NoSQL database
High Frequency Trading and NoSQL databaseHigh Frequency Trading and NoSQL database
High Frequency Trading and NoSQL database
 
Microservices for performance - GOTO Chicago 2016
Microservices for performance - GOTO Chicago 2016Microservices for performance - GOTO Chicago 2016
Microservices for performance - GOTO Chicago 2016
 
Introduction to OpenHFT for Melbourne Java Users Group
Introduction to OpenHFT for Melbourne Java Users GroupIntroduction to OpenHFT for Melbourne Java Users Group
Introduction to OpenHFT for Melbourne Java Users Group
 
Reactive Programming in Java 8 with Rx-Java
Reactive Programming in Java 8 with Rx-JavaReactive Programming in Java 8 with Rx-Java
Reactive Programming in Java 8 with Rx-Java
 
Advanced off heap ipc
Advanced off heap ipcAdvanced off heap ipc
Advanced off heap ipc
 
ISI 5121 Trove Presentation
ISI 5121 Trove PresentationISI 5121 Trove Presentation
ISI 5121 Trove Presentation
 
Open HFT libraries in @Java
Open HFT libraries in @JavaOpen HFT libraries in @Java
Open HFT libraries in @Java
 
Java and the blockchain - introducing web3j
Java and the blockchain - introducing web3jJava and the blockchain - introducing web3j
Java and the blockchain - introducing web3j
 
RxJava - introduction & design
RxJava - introduction & designRxJava - introduction & design
RxJava - introduction & design
 
John Davies: "High Performance Java Binary" from JavaZone 2015
John Davies: "High Performance Java Binary" from JavaZone 2015John Davies: "High Performance Java Binary" from JavaZone 2015
John Davies: "High Performance Java Binary" from JavaZone 2015
 
Thread Safe Interprocess Shared Memory in Java (in 7 mins)
Thread Safe Interprocess Shared Memory in Java (in 7 mins)Thread Safe Interprocess Shared Memory in Java (in 7 mins)
Thread Safe Interprocess Shared Memory in Java (in 7 mins)
 
Computer programming language concept
Computer programming language conceptComputer programming language concept
Computer programming language concept
 
Tuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesTuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issues
 

Semelhante a Low level java programming

Scalable Apache for Beginners
Scalable Apache for BeginnersScalable Apache for Beginners
Scalable Apache for Beginners
webhostingguy
 
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Cal Henderson
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL Server
Stephen Rose
 

Semelhante a Low level java programming (20)

Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the World
 
Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin  Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin
 
Scalable Apache for Beginners
Scalable Apache for BeginnersScalable Apache for Beginners
Scalable Apache for Beginners
 
Cassandra in Operation
Cassandra in OperationCassandra in Operation
Cassandra in Operation
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspective
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
 
Super scaling singleton inserts
Super scaling singleton insertsSuper scaling singleton inserts
Super scaling singleton inserts
 
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL Server
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
 
Cache memory presentation
Cache memory presentationCache memory presentation
Cache memory presentation
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
 
Lamp Stack Optimization
Lamp Stack OptimizationLamp Stack Optimization
Lamp Stack Optimization
 
C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian
C* Summit 2013: Time is Money Jake Luciani and Carl YeksigianC* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian
C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Blades for HPTC
Blades for HPTCBlades for HPTC
Blades for HPTC
 
Cache memory and cache
Cache memory and cacheCache memory and cache
Cache memory and cache
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)
 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Último (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Low level java programming

  • 1. Low level Java programming With examples from OpenHFT Peter Lawrey CEO and Principal Consultant Higher Frequency Trading. Presentation to Joker 2014 St Petersburg, October 2014.
  • 2. About Us… Higher Frequency Trading is a small consulting and software development house specialising in: • Low latency, high throughput software • 6 developers + 3 staff in Europe and USA. • Sponsor HFT related open source projects • Core Java engineering
  • 3. About Me… • CEO and Principal Consultant • Third on Stackoverflow for Java, most Java Performance answers. • Vanilla Java blog with 3 million views • Founder of the Performance Java User's Group • An Australian, based in the U.K.
  • 4. What Is The Problem We Solve? "I want to be able to read and write my data to a persisted, distributed system, with the speed of in memory data structures"
  • 5. Agenda What are our Open Source products used for? Where to start with low latency? When you know you have a problem, what can you do?
  • 6. Chronicle scaling Vertically and Horizontally  Shares data structure between processes  Replication between machines  Build on a low level library Java Lang.  Millions of operations per second.  Micro-second latency. No TCP locally.  Synchronous logging to the OS.  Apache 2.0 available on GitHub  Persisted via the OS.
  • 7. What is Chronicle Map/Set?  Low latency persisted key-value store.  Latency between processes around 200 ns.  In specialized cases, latencies < 25 ns.  Throughputs up to 30 million/second.
  • 8. What is Chronicle Map/Set?  ConcurrentMap or Set interface  Designed for reactive programming  Replication via TCP and UDP.  Apache 2.0 open source library.  Pure Java, supported on Windows, Linux, Mac OSX.
  • 9.
  • 10.
  • 11. What is Chronicle Queue?  Low latency journaling and logging.  Low latency cross JVM communication.  Designed for reactive programming  Throughputs up to 40 million/second.
  • 12. What is Chronicle Queue?  Latencies between processes of 200 nano-seconds.  Sustain rates of 400 MB/s, peaks much higher.  Replication via TCP.  Apache 2.0 open source library.  Pure Java, supported on Windows, Linux, Mac OSX.
  • 13. Chronicle monitoring a legacy application
  • 15. Chronicle for low latency trading
  • 16. Short demo using OSResizesMain Note: The “VIRT” virtual memory size is 125t for 125 TB, actual usage 97M System memory: 7.7 GB, Extents of map: 137439.0 GB, disk used: 97MB, addressRange: 233a777f000-7f33a8000000 $ ls -lh /tmp/oversized* -rw-rw-r-- 1 peter peter 126T Oct 20 17:03 /tmp/over-sized... $ du -h /tmp/oversized* 97M /tmp/over-sized....
  • 17. Where to start with low latency?
  • 18. You need to measure first.  What is the end to end use case you need to improve?  Is it throughput or latency you need to improve?  Throughput or average latency hides poor latencies.  Avoid co-ordinated omission. See Gil Tene's talks.
  • 19. Looking at the latency percentiles
  • 20. Tools to help you measure  A commercial profiler. e.g. YourKit.  Instrument timings in production.  Record and replay production loads.  Avoid co-ordinated omission. See Gil Tene's talks.  If you can't change the code, Censum can help you tune your GC pause times.  Azul's Zing “solves” GC pause times, but has many other tools to reduce jitter.
  • 21. What to look for when profiling  Reducing the allocation rate is often a quick win.  Memory profile to reduce garbage produced.  When CPU profiling, leave the memory profiler on.  If the profiler is no long helpful, application instrumentation can take it to the next level.
  • 22. When you know you have a problem, what can you do about it?
  • 23. Is garbage unavoidable?  You can always reduce it further and further, but at some point it's not worth it.  For a web service, 500 MB/s might be ok.  For a trading system, 500 KB/s might be ok.  If you produce 250 KB/s it will take you 24 hours to fill a 24 GB Eden space.
  • 24. Common things to tune. A common source of garbage is Iterators. for (String s : arrayList) { } Creates an Iterator, however for (int i = 0, len = arrayList.size(); i < len; i++) { String s = arrayList.get(i); } Doesn't create an Iterator.
  • 25. Common things to tune. BigDecimal can be a massive source of garbage. BigDecimal a = b.add(c) .divide(BigDecimal.TWO, 2, ROUND_HALF_UP); The same as double produces no garbage. double a = round(b + c, 2); You have to have a library to support rounding. Without it you will get rounding and representation errors.
  • 26. Be aware of your memory speeds. concurrency Clock cycles Seconds L1 caches multi-core 4 1 seconds L2 cache multi-core 10 3 seconds L3 cache socket wide 40-75 15 seconds Main memory System wide 200 50 seconds. SSD access System wide 50,000 14 hours Local Network Network 180,000 2 days HDD System wide 30,000,000 1 year. To maximise performance you want to spend as much time in L3, or ideally L1/L2 caches as possible.
  • 27. Memory access is faster with less garbage Reducing garbage minimises filling your caches with garbage. If you are producing 300 MB/s of garbage your L1 cache will be filled with garbage is about 100 micro-seconds, your L2 cache will be filled in under 1 milli-second. The L3 cache and main memory shared and the more you use this, the less scalability you will get from your multi-cores.
  • 28. Faster memory access  Reduce garbage  Reduce pointer chasing  Use primitives instead of objects.  Avoid false sharing for highly contended mutated values
  • 29. Lock free coding  AtomicLong.addAndGet(n)  Unsafe.compareAndSwapLong  Unsafe.getAndAddLong
  • 30. Using off heap memory  Reduce GC work load.  More control over layout  Data can be persisted  Data can be embedded into multiple processes.  Can exploit virtual memory allocation instead of main memory allocation.
  • 31. Low latency network connections  Kernel by pass network cards e.g. Solarflare  Single hop latencies around 5 micro-seconds  Make scaling to more machines practical when tens of micro-seconds matter.
  • 32. Reducing micro-jitter  Unless you isolate a CPU, it will get interrupted by the scheduler a lot.  You can see delays of 1 – 5 ms every hour on an otherwise idle machine.  On a virtualised machine, you can see delays of 50 ms.  Java Thread Affinity library can declaratively layout your critical threads.
  • 33. Reducing micro-jitter Number of interrupts per hour by length.
  • 34. Q & A http://openhft.net/ Performance Java User's Group. @PeterLawrey peter.lawrey@higherfrequencytrading.com