Millions Quotes Per Second.
       A story of pure Java
       market data vendor

© 2013, Roman Elizarov, Devexperts
Market Data Rates
[Chart: messages per second (axis from 0 to 10,000,000) for two series: US Equities, Indexes and Futures; OPRA]
Market Data Vendor

• Process data coming from exchange data feeds
   - Parse
   - Normalize
• Distribute data to customers
   - Gather into a single feed
   - Store and retrieve (for onDemand historical requests)
   - Serialize and transfer
   - Scatter to multiple consumers based on actual subscription
dxFeed High Level Picture

• Chicago ticker plant
   - CME, CBOT, NYMEX, COMEX, ICE Futures U.S., CBOE, TSX, TSXV, MX
• New York ticker plant
   - NYSE, AMEX, NASDAQ, ISE, OPRA, FINRA, PinkSheets
• 10Gbit resilient redundant connectivity infrastructure
• Customer connection point
   - Direct cross-connect
   - SFTI, TNS, SAVVIS, BT Radianz
   - Internet
A Bit of History

• Devexperts was founded in 2002
   - as an Upscale Financial IT company
• QDS project was born in 2003
   - to address market data distribution problem
   - in a high-performance way (the initial design goal was 1M messages per second)
• dxFeed service was launched in 2008
   - to provide our customers with live market data directly from
     exchanges, using QDS for distribution
• dxFeed API was created on top of QDS in 2009
   - to provide an easier customer-facing API and enable 3rd party
     developers to integrate their code with dxFeed
[Word cloud: Threads, Portability, Community, Developers, Garbage Collection,
Libraries and frameworks, Backwards-compatibility, Refactoring, Type Safety,
Open source, Memory model, Reflection, Productivity, Tools, Readability,
HotSpot JIT, Byte-code manipulation, Simplicity, The most popular language]
* Applies to any language
Java object layout
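• String[] that is filled with some strings in Java

[Diagram: three levels of separate heap objects]
   String[]: header, size, [0], [1], [2], [3], ...
      - each element refers to a separate String object
   String: header, value, hash, ...
      - value refers to a separate char[] object
   char[]: header, size, 'T', 'E', 'S', 'T', ...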
Memory layout solution

• Prefer array-based data structures to linked ones
   - Most Java programs get an immediate performance boost by replacing all
     mentions of LinkedList with ArrayList
• Use Java arrays or ByteBuffer classes where it matters
   - They are guaranteed to be contiguous in memory
   - Lay out your data in arrays manually
• That's how the QDS core is designed
   - All its critical data structures are rolled onto int[] and Object[]
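A minimal sketch of laying records out in a single int[], assuming a hypothetical three-field quote record (symbol id, bid, ask). The class name, field set, and growth policy are illustrative assumptions and are not taken from QDS.

```java
import java.util.Arrays;

/** Hypothetical flat store: each record occupies STRIDE consecutive ints in one int[]. */
final class FlatQuoteStore {
    private static final int STRIDE = 3;            // ints per record
    private static final int SYMBOL = 0, BID = 1, ASK = 2;

    private int[] data = new int[16 * STRIDE];      // room for 16 records initially
    private int size;                               // number of records stored

    /** Appends a record and returns its index. */
    int add(int symbolId, int bid, int ask) {
        if ((size + 1) * STRIDE > data.length) {
            data = Arrays.copyOf(data, data.length * 2);   // grow the single backing array
        }
        int base = size * STRIDE;
        data[base + SYMBOL] = symbolId;
        data[base + BID] = bid;
        data[base + ASK] = ask;
        return size++;
    }

    int symbolId(int index) { return data[index * STRIDE + SYMBOL]; }
    int bid(int index)      { return data[index * STRIDE + BID]; }
    int ask(int index)      { return data[index * STRIDE + ASK]; }
    int size()              { return size; }
}
```

No per-record object is allocated here, so a sequential scan reads one array front to back instead of following references.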
byte[] vs ByteBuffer

• byte[] is always heap-based
   - Faster for byte-oriented access
• ByteBuffer can be both “heap” and “direct”
   - Be especially careful with direct ByteBuffers
   - If you don't pool them, you may run out of native memory before Java
     GC has a chance to run
   - Can be faster for short-, int- or long-oriented access via get/putXXX
     methods
      • But make sure you use native byte order (BIG_ENDIAN is the default)
   - Direct ByteBuffers don't need an extra buffer copy when doing
     input/output with NIO
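A short sketch of the byte-order point, using only standard java.nio calls. The class and method names are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteBufferOrderDemo {
    /** Writes ints into a direct buffer in native byte order and sums them back. */
    static long writeAndSum(int count) {
        ByteBuffer buf = ByteBuffer.allocateDirect(count * Integer.BYTES)
                .order(ByteOrder.nativeOrder());   // without this call the order is BIG_ENDIAN
        for (int i = 0; i < count; i++) {
            buf.putInt(i);
        }
        buf.flip();                                // switch from writing to reading
        long sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.getInt();
        }
        return sum;
    }

    /** The byte order of a freshly allocated buffer. */
    static ByteOrder defaultOrder() {
        return ByteBuffer.allocate(4).order();
    }
}
```

In a real system the direct buffer would be allocated once and reused, in line with the pooling advice above.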
Measure, measure, measure
The cost of later change is too high
Garbage collection

• Makes your code much easier
   - to design
   - to debug
   - to maintain
• GC performs really well when
   - Objects are very short-lived
      • They are not promoted to old gen
      • They are reclaimed by high-throughput scavenge GC
   - Objects are very long-lived and are either not modified or contain only primitives
      • Scavenge GC does not waste time scanning them
Object allocation

• Allocation of small objects is fast
   - new String() is ~20 bytes on a 64-bit VM with compressed oops
      • not counting char[] object inside of it
   - ~4.5ns per allocation (on 2.6GHz i5)
• But becomes slower when you include amortized GC cost
• And can become much slower if you
   - have big static memory footprint
   - have “medium-lived” objects
   - have lots of threads (and thus a lot of GC roots and coordination)
   - use references (java.lang.ref) a lot
   - mutate your memory a lot, especially references (GC card marking)
Manual memory management

• When you would consider manual memory management in native
  code (custom object pools), consider doing the same in Java
• General advice
   - Pool large objects
      • They are expensive to allocate and to collect
   - Avoid small objects
      • Especially “medium-lived” ones
      • Lay them out in arrays if you need to store them
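A minimal sketch of pooling large objects, assuming fixed-size byte[] buffers. The class name, the bound on pooled buffers, and the coarse synchronization are illustrative assumptions.

```java
import java.util.ArrayDeque;

/** Hypothetical bounded pool of equally sized byte[] buffers. */
final class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private final int maxPooled;

    BufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    /** Returns a pooled buffer if one is available, otherwise allocates a new one. */
    synchronized byte[] acquire() {
        byte[] b = free.pollFirst();
        return b != null ? b : new byte[bufferSize];
    }

    /** Returns a buffer to the pool; surplus or foreign-sized buffers are left to the GC. */
    synchronized void release(byte[] b) {
        if (b.length == bufferSize && free.size() < maxPooled) {
            free.addFirst(b);
        }
    }
}
```

Callers must not touch a buffer after releasing it, which is the same discipline manual memory management requires in native code.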
Object allocation action plan (1)

• Watch the percentage of time your system spends doing GC
   - -verbose:gc
     -XX:+PrintGCDetails
     -XX:+PrintGCTimeStamps
   - “jconsole” and “jvisualvm” tools show this information
   - It is available programmatically via GarbageCollectorMXBean
       • At Devexperts we collect it and report (push) in real-time via
          MARS (Monitoring and Reporting System) using a dedicated
          JVMSelfMonitoring plugin
       • Our support team have alerts configured on high GC % in our
          systems
• Act when it becomes too big
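A small sketch of reading GC time programmatically through the standard java.lang.management API. The class and method names are illustrative assumptions and this is not the MARS plugin mentioned above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcTimeMonitor {
    /** Accumulated collection time in milliseconds, summed over all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();   // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Accumulated GC time as a percentage of JVM uptime so far. */
    static double gcPercentOfUptime() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : 100.0 * totalGcTimeMillis() / uptime;
    }
}
```

A monitoring agent would typically sample totalGcTimeMillis() periodically and report the difference per interval, so that a recent spike is not averaged away over the whole uptime.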
Object allocation action plan (2)

• Tune GC to reduce overhead without code changes
• Identify the places where most allocations take place and optimize
  them
   - Use off-the-shelf Java profilers
   - Use Devexperts aprof for a full allocation picture at production speed
     http://code.devexperts.com/display/AProf/
Object reuse and sharing

• Pooling small objects is often a bad idea
   - Unless you are trying to quickly speed up code that heavily relies on
     lots of small objects
   - It's better to get rid of small objects altogether
        • If you see boxing in performance-critical code, get rid of it
• But reusing / sharing small objects is great
   - Strings are a typical candidate in data-processing code
• Common pitfalls (don't do it, unless you fully understand it)
   - String.intern
   - WeakReference
Actually, by their char arrays
String I/O

• Strings are often duplicated in memory
• Reading any string-denoted data from a database, from a file, from the
  network: all of these produce new strings
• Where performance matters, reuse strings
   - For example see StringCache class from
     http://docs.dxfeed.com/dxlib/api/com/devexperts/util/StringCache.html
   - The key method is get(char[])
      • You can reuse char[] where data is read
      • And get an instance of String from cache if it is there
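A minimal sketch of the idea behind a char[]-keyed string cache. This is a hypothetical, single-threaded, lossy cache written for illustration; it is not the implementation of com.devexperts.util.StringCache, and its name, constructor, and replacement policy are assumptions.

```java
/** Hypothetical fixed-size cache that returns a shared String for a char[] slice. */
final class SimpleStringCache {
    private final String[] table;
    private final int mask;

    SimpleStringCache(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        table = new String[capacity];
        mask = capacity - 1;
    }

    /** Returns a String equal to chars[offset, offset + length), allocating only on a miss. */
    String get(char[] chars, int offset, int length) {
        int hash = 0;
        for (int i = 0; i < length; i++) {
            hash = 31 * hash + chars[offset + i];
        }
        int slot = (hash ^ (hash >>> 16)) & mask;
        String cached = table[slot];
        if (cached != null && matches(cached, chars, offset, length)) {
            return cached;                          // hit: no allocation
        }
        String fresh = new String(chars, offset, length);
        table[slot] = fresh;                        // miss: overwrite whatever was in the slot
        return fresh;
    }

    private static boolean matches(String s, char[] chars, int offset, int length) {
        if (s.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (s.charAt(i) != chars[offset + i]) {
                return false;
            }
        }
        return true;
    }
}
```

The reader can keep parsing into the same char[] and call get for every field, so repeated symbols such as "IBM" resolve to one shared instance.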
Radical object / reference elimination

• Unroll complex objects into arrays
   - For example, a collection of strings can be represented in a single
     byte[]
• Renumber shared object instances
   - Represent string reference as int
   - That's what the QDS core does for efficient String manipulation
      • Faster to compare
      • Faster to hash
      • Avoids slower “modify reference” operations (marks GC cards)
   - But requires hand-crafted memory management
      • QDS does reference counting, but custom GC is also feasible
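A simplified sketch of renumbering shared strings as ints. The class is hypothetical: it never releases ids, so it omits the reference counting the slide attributes to QDS, and it uses ordinary collections at the boundary where strings enter the system.

```java
import java.util.ArrayList;
import java.util.HashMap;

/** Hypothetical symbol table that assigns a dense int id to each distinct string. */
final class SymbolTable {
    private final HashMap<String, Integer> ids = new HashMap<>();
    private final ArrayList<String> names = new ArrayList<>();

    /** Returns the id for the string, assigning the next free id on first sight. */
    int idOf(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = names.size();
            names.add(s);
            ids.put(s, id);
        }
        return id;
    }

    /** Returns the string that was assigned the given id. */
    String nameOf(int id) {
        return names.get(id);
    }
}
```

Once symbols are ids, hot-path code can compare them with == on ints and store them in int[] without creating references for the GC to track.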
Hardcore optimization

• Use sun.misc.Unsafe when everything else fails
   - It gives you full native speed
   - But no range checks nor type-safety
      • You are on your own!
   - Good fit for integration with native data structures when needed
• The QDS core uses it in a few places
   - Mainly to provide wait-free execution guarantees with an appropriate
     synchronization for array-based data structures
   - But there is a fallback code for cases when sun.misc.Unsafe is not
     available
Even more hardcore – hand-written SMT

• If you have to use linked data structures
   - Consider traversing multiple linked lists simultaneously in the same
     thread
   - Akin to hardware SMT, but in software
   - The code becomes much more complicated
   - But the performance can considerably increase




                     * Not a Java-specific optimization, but fun to mention here
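A toy sketch of the interleaving idea, assuming it refers to advancing several independent pointer chains in one loop so that their memory loads can overlap. The node type and method names are hypothetical, and any speedup would have to be measured on real data.

```java
final class InterleavedTraversal {
    /** Minimal singly linked node. */
    static final class Node {
        final long value;
        final Node next;

        Node(long value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    /** Builds a list holding from, from + 1, ..., from + count - 1. */
    static Node build(long from, int count) {
        Node head = null;
        for (int i = count - 1; i >= 0; i--) {
            head = new Node(from + i, head);
        }
        return head;
    }

    /** Sums two independent lists in one loop, advancing both chains on each iteration. */
    static long sumBoth(Node a, Node b) {
        long sumA = 0;
        long sumB = 0;
        while (a != null && b != null) {
            sumA += a.value;
            sumB += b.value;
            a = a.next;     // these two loads do not depend on each other
            b = b.next;
        }
        for (; a != null; a = a.next) {
            sumA += a.value;   // tail of the longer list
        }
        for (; b != null; b = b.next) {
            sumB += b.value;
        }
        return sumA + sumB;
    }
}
```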
Threads and scalability

• Share data across the threads to further reduce memory footprint
   - But carefully design and implement this sharing
• Learn and love Java Memory Model
   - It makes your correctly-synchronized multi-threaded code fully
     portable across CPU architectures
• QDS core is a thread-safe data structure with a mix of lock-
  free, fine-grained and coarse-grained locking approaches which
  makes it vertically scalable
Be careful with threads and locks

• Thread switches introduce a considerable latency (~20 us)
• Lock contention forces even more thread switches
• It is not a Java-specific concern, but a common Java-specific problem,
  since Java makes threads easier for programmers to use (and many do
  use them)

[Diagram: two threads contending for one lock]
   1. Enter lock
   2. Context switch
   3. Try to lock
   4. Context switch
   5. Exit lock
   6. Context switch and enter lock
Data flow for horizontal scalability

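[Diagram: a Multiplexor (QDTicker) feeding two downstream QDTickers]
• Multiplexor
   - Subscribes: IBM, GE, QQQQ, MSFT, INTC, SPX
   - Receives: IBM, GE ticks
• Downstream QDTicker 1
   - Subscribes: IBM, GE, QQQQ, MSFT
   - Receives: IBM, GE ticks
   - Consumers: (IBM, MSFT, QQQQ) and (GE, IBM)
• Downstream QDTicker 2
   - Subscribes: GE, INTC, SPX
   - Receives: GE ticks
   - Consumers: (GE, INTC) and (SPX, INTC)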
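A minimal sketch of subscription-based scatter as the diagram above suggests: the upstream subscription is the union of downstream subscriptions, and each tick goes only to the consumers that asked for its symbol. The class and method names are hypothetical and do not come from the QDS API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical router that tracks which consumers subscribed to which symbols. */
final class SubscriptionRouter {
    private final Map<String, Set<String>> consumersBySymbol = new HashMap<>();

    /** Records that the consumer wants ticks for the given symbols. */
    void subscribe(String consumer, String... symbols) {
        for (String symbol : symbols) {
            consumersBySymbol.computeIfAbsent(symbol, k -> new TreeSet<>()).add(consumer);
        }
    }

    /** Union of all downstream subscriptions: what this node requests upstream. */
    Set<String> upstreamSubscription() {
        return new TreeSet<>(consumersBySymbol.keySet());
    }

    /** Consumers that should receive a tick for the symbol (empty if nobody subscribed). */
    Set<String> route(String symbol) {
        Set<String> consumers = consumersBySymbol.get(symbol);
        if (consumers == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(consumers);
    }
}
```

With the two subscriptions from the diagram, a GE tick is routed to both downstream consumers, while an IBM tick is routed to only one.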
HotSpot Server VM

• Run “java -server” (it is a default on server-class machines)
• Does
   - Very deep code inlining
   - Loop unrolling
   - Optimize virtual and interface calls based on collected profile
   - Escape analysis for synchronization and allocation elimination
• Embrace it!
   - Don't fear writing your code in a nice object-oriented way
      • In most cases, that is
      • Do still avoid too much “object orientation” in the most
        performance-sensitive places
HotSpot challenges

• It is harder to profile, stress-test, and tune code
   - You need to “warm up” the code to get meaningful results
   - Small changes in code can lead to big differences that are hard to
     explain
   - Compilation of less busy code can trigger at any time and cause
     unexpected latency spikes
• Don't do micro-tests
   - Test the whole system together instead
• Do micro-tests
   - To learn which code patterns are better across the board
   - Small savings add up
Looking at generated assembly code

• -XX:+UnlockDiagnosticVMOptions
  -XX:CompileCommand=print,*<class-name>.<method-name>
  -XX:PrintAssemblyOptions=intel
• You will need “hsdis” library added to your JRE/JDK with the actual
  disassembler code
   - But you have to build it yourself:
     http://hg.openjdk.java.net/jdk7/hotspot/hotspot/file/tip/src/share/tools/hsdis/README
Use native profilers

• Java profilers are great tools, but they don't use processor
  performance counters and cannot recognize problems such as
  memory pressure
   - And they don't always produce a clear picture
   - All “cpu time” is reported at the nearest “safe point”, not at the actual
     code line that consumed CPU
• Use native profilers to figure it out
   - Sun Studio Performance Analyzer
   - Intel VTune Amplifier
   - AMD CodeAnalyst
General (1)

• Classic data structures and algorithms
   - Use CPU and memory efficient data structures and algorithms
   - Know and love hash tables
      • They are the most useful data structure in a typical business
        application
• Lock-free data structures will help you to scale vertically
• Every byte counts. Remember about bytes.
   - QDS core compactly represents data as 4-byte integers while working
     with them in memory
   - QDS uses compact byte-level compression on the wire
   - Even more compact bit-level compression is used in long-term store
General (2)

• Burst handling
   - Process data in batches to amortize batch overhead across messages
   - QDS increases batch size under load to decrease overhead
• Architecture
   - Use layers
   - Lower layers of the architecture should generally be used in more places
     and be more optimized
   - The outer layer, dxFeed API, is the easiest one to use and understand
     and the most object-oriented, but less optimized
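A small sketch of the burst-handling point above, assuming messages arrive on a java.util.concurrent.BlockingQueue. The class and method names are illustrative; this shows the general batching idea and is not how QDS implements it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

final class BatchDrainer {
    /**
     * Removes up to maxBatch queued messages in one call without blocking.
     * Under load more messages are waiting, so batches grow and the fixed
     * per-batch cost is spread over more messages.
     */
    static <T> List<T> nextBatch(BlockingQueue<T> queue, int maxBatch) {
        List<T> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch);
        return batch;
    }
}
```

A consumer loop would call nextBatch, process the whole list, and perform per-batch work such as a flush once per list instead of once per message.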
Architecture layers


        JS API

          dxFeed API           Tools      Gateways

                       QDS Core

                  Transport Protocol

                                   ZLIB        SSL

        Sockets          NIO              Files, etc
QDS API (1)
print quote bid/ask on the screen
QDS API (2)
QDS API Summary

• Pros
   - High-performance design
   - Flexible (can be used in various ways)
      • QDS Multiplexor is an application on top of QDS API
      • As well as all other command-line QDS tools
   - Extensible with clear separation of interfaces and implementation
• Cons
   - Verbose, lots of code to do simple things
   - Error-prone (easy to get wrong and to introduce subtle bugs)
• Everybody needs Quote, Trade, etc. with an easy-to-use API
   - Hence, dxFeed API was born
dxFeed API
print quote bid/ask on the screen
Contact me by email: elizarov at devexperts.com

 
Java Serialization Facts and Fallacies
Java Serialization Facts and FallaciesJava Serialization Facts and Fallacies
Java Serialization Facts and FallaciesRoman Elizarov
 

More from Roman Elizarov (20)

Kotlin Coroutines in Practice @ KotlinConf 2018
Kotlin Coroutines in Practice @ KotlinConf 2018Kotlin Coroutines in Practice @ KotlinConf 2018
Kotlin Coroutines in Practice @ KotlinConf 2018
 
Deep dive into Coroutines on JVM @ KotlinConf 2017
Deep dive into Coroutines on JVM @ KotlinConf 2017Deep dive into Coroutines on JVM @ KotlinConf 2017
Deep dive into Coroutines on JVM @ KotlinConf 2017
 
Introduction to Coroutines @ KotlinConf 2017
Introduction to Coroutines @ KotlinConf 2017Introduction to Coroutines @ KotlinConf 2017
Introduction to Coroutines @ KotlinConf 2017
 
Fresh Async with Kotlin @ QConSF 2017
Fresh Async with Kotlin @ QConSF 2017Fresh Async with Kotlin @ QConSF 2017
Fresh Async with Kotlin @ QConSF 2017
 
Scale Up with Lock-Free Algorithms @ JavaOne
Scale Up with Lock-Free Algorithms @ JavaOneScale Up with Lock-Free Algorithms @ JavaOne
Scale Up with Lock-Free Algorithms @ JavaOne
 
Kotlin Coroutines Reloaded
Kotlin Coroutines ReloadedKotlin Coroutines Reloaded
Kotlin Coroutines Reloaded
 
Lock-free algorithms for Kotlin Coroutines
Lock-free algorithms for Kotlin CoroutinesLock-free algorithms for Kotlin Coroutines
Lock-free algorithms for Kotlin Coroutines
 
Introduction to Kotlin coroutines
Introduction to Kotlin coroutinesIntroduction to Kotlin coroutines
Introduction to Kotlin coroutines
 
Non blocking programming and waiting
Non blocking programming and waitingNon blocking programming and waiting
Non blocking programming and waiting
 
ACM ICPC 2016 NEERC (Northeastern European Regional Contest) Problems Review
ACM ICPC 2016 NEERC (Northeastern European Regional Contest) Problems ReviewACM ICPC 2016 NEERC (Northeastern European Regional Contest) Problems Review
ACM ICPC 2016 NEERC (Northeastern European Regional Contest) Problems Review
 
Многопоточное Программирование - Теория и Практика
Многопоточное Программирование - Теория и ПрактикаМногопоточное Программирование - Теория и Практика
Многопоточное Программирование - Теория и Практика
 
Wait for your fortune without Blocking!
Wait for your fortune without Blocking!Wait for your fortune without Blocking!
Wait for your fortune without Blocking!
 
ACM ICPC 2015 NEERC (Northeastern European Regional Contest) Problems Review
ACM ICPC 2015 NEERC (Northeastern European Regional Contest) Problems ReviewACM ICPC 2015 NEERC (Northeastern European Regional Contest) Problems Review
ACM ICPC 2015 NEERC (Northeastern European Regional Contest) Problems Review
 
ACM ICPC 2014 NEERC (Northeastern European Regional Contest) Problems Review
ACM ICPC 2014 NEERC (Northeastern European Regional Contest) Problems ReviewACM ICPC 2014 NEERC (Northeastern European Regional Contest) Problems Review
ACM ICPC 2014 NEERC (Northeastern European Regional Contest) Problems Review
 
Why GC is eating all my CPU?
Why GC is eating all my CPU?Why GC is eating all my CPU?
Why GC is eating all my CPU?
 
Многопоточные Алгоритмы (для BitByte 2014)
Многопоточные Алгоритмы (для BitByte 2014)Многопоточные Алгоритмы (для BitByte 2014)
Многопоточные Алгоритмы (для BitByte 2014)
 
Теоретический минимум для понимания Java Memory Model (для JPoint 2014)
Теоретический минимум для понимания Java Memory Model (для JPoint 2014)Теоретический минимум для понимания Java Memory Model (для JPoint 2014)
Теоретический минимум для понимания Java Memory Model (для JPoint 2014)
 
DIY Java Profiling
DIY Java ProfilingDIY Java Profiling
DIY Java Profiling
 
ACM ICPC 2013 NEERC (Northeastern European Regional Contest) Problems Review
ACM ICPC 2013 NEERC (Northeastern European Regional Contest) Problems ReviewACM ICPC 2013 NEERC (Northeastern European Regional Contest) Problems Review
ACM ICPC 2013 NEERC (Northeastern European Regional Contest) Problems Review
 
Java Serialization Facts and Fallacies
Java Serialization Facts and FallaciesJava Serialization Facts and Fallacies
Java Serialization Facts and Fallacies
 

Millions quotes per second in pure java

  • 1. Millions Quotes Per Second. A story of pure Java market data vendor © 2013, Roman Elizarov, Devexperts
  • 2. Market Data Rates [chart: messages per second, axis from 0 to 10,000,000; series: US Equities, Indexes and Futures; OPRA]
  • 3. Market Data Vendor • Process data coming from exchange data feeds - Parse - Normalize • Distribute data to customers - Gather into a single feed - Store and retrieve (for onDemand historical requests) - Serialize and transfer - Scatter to multiple consumers based on actual subscription
  • 4. dxFeed High Level Picture [diagram labels: CME, CBOT, NYMEX, COMEX, ICE Futures U.S., CBOE, TSX, TSXV, MX; Chicago ticker plant; 10Gbit resilient redundant connectivity infrastructure; NYSE, AMEX, NASDAQ, ISE, OPRA, FINRA, PinkSheets; New York ticker plant; direct cross-connect; customer connection point; SFTI, TNS, SAVVIS, BT Radianz, Internet]
  • 5. A Bit of History • Devexperts was founded in 2002 - as an Upscale Financial IT company • QDS project was born in 2003 - to address the market data distribution problem - in a high-performance way (initial design goal was 1M mps) • dxFeed service was launched in 2008 - to provide our customers with live market data directly from exchanges, using QDS for distribution • dxFeed API was created on top of QDS in 2009 - to provide an easier customer-facing API and enable 3rd party developers to integrate their code with dxFeed
  • 6. Threads Portability Community Developers Garbage Collection Libraries and frameworks Backwards-compatibility Refactoring Type Safety Open source Memory model Reflection Productivity Tools Readability HotSpot JIT Byte-code manipulation Simplicity The most popular language
  • 7. * Applies to any language
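  • 8. Java object layout • A String[] that is filled with some strings in Java [diagram: the String[] object holds a header, size, and references [0], [1], [2], [3], ...; each String object holds a header, value, and hash; each value points to a separate char[] object holding a header, size, and chars 'T', 'E', 'S', 'T', ...]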
  • 9.
  • 10. Memory layout solution • Prefer array-based data structures to linked ones - Most Java programs get an immediate performance boost by replacing all mentions of LinkedList with ArrayList • Use Java arrays or ByteBuffer classes where it matters - They are guaranteed to be contiguous in memory - Lay out your data into arrays manually • That's how QDS core is designed - All its critical data structures are rolled onto int[] and Object[]
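The manual array layout described on this slide can be sketched roughly as follows. This is a minimal illustration, assuming a hypothetical record of three int fields (symbol id, bid, ask); the class name `FlatQuoteStore` and its field layout are invented for the example and are not QDS's actual data structures.

```java
import java.util.Arrays;

/** Hypothetical flat store: each record occupies STRIDE consecutive ints in one array. */
final class FlatQuoteStore {
    private static final int STRIDE = 3;        // symbolId, bid, ask (assumed layout)
    private int[] data = new int[16 * STRIDE];
    private int size;                            // number of records stored

    /** Appends a record and returns its index. */
    int add(int symbolId, int bid, int ask) {
        int base = size * STRIDE;
        if (base == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // grow, staying contiguous
        }
        data[base] = symbolId;
        data[base + 1] = bid;
        data[base + 2] = ask;
        return size++;
    }

    int symbolId(int index) { return data[index * STRIDE]; }
    int bid(int index)      { return data[index * STRIDE + 1]; }
    int ask(int index)      { return data[index * STRIDE + 2]; }
    int size()              { return size; }

    /** A sequential scan reads memory linearly and follows no object references. */
    long sumOfSpreads() {
        long sum = 0;
        for (int base = 0, end = size * STRIDE; base < end; base += STRIDE) {
            sum += data[base + 2] - data[base + 1];
        }
        return sum;
    }
}
```

Compared with a list of small quote objects, this keeps all fields in one contiguous int[] and gives the GC a single object to track, at the price of hand-written index arithmetic.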
  • 11. byte[] vs ByteBuffer • byte[] is always heap-based - Faster for byte-oriented access • ByteBuffer can be both “heap” and “direct” - Be especially careful with direct ByteBuffers - If you don't pool them, you may run out of native memory before Java GC has a chance to run - Can be faster for short-, int- or long-oriented access via get/putXXX methods • But make sure you use native byte order (BIG_ENDIAN is the default) - Direct ByteBuffers don't need an extra buffer copy when doing input/output with NIO
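As a small sketch of the byte-order advice, the snippet below allocates a direct buffer once and switches it from the default BIG_ENDIAN to the platform's native order. The class and method names are illustrative; only the standard `java.nio` calls are real API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class NativeOrderBuffers {
    /** Allocates a direct buffer and switches it to the platform's byte order. */
    static ByteBuffer allocateNative(int capacity) {
        return ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocateNative(64);        // allocate once, then reuse
        buf.putInt(0, 42).putLong(8, 123456789L);   // absolute puts leave the position unchanged
        System.out.println("native order: " + (buf.order() == ByteOrder.nativeOrder()));
        System.out.println(buf.getInt(0) + " " + buf.getLong(8));
        buf.clear();                                // reset position and limit before reuse
    }
}
```

Native order only makes sense for in-process data; anything written to the wire or to disk still needs an agreed byte order.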
  • 13. The cost of later change is too high
  • 14. Garbage collection • Makes your code much easier - to design - to debug - to maintain • GC performs really well when - Objects are very short-lived • They are not promoted to old gen • They are reclaimed by high-throughput scavenge GC - Objects are very long-lived and are not modified or contain primitives • Scavenge GC does not waste time scanning them
  • 15. Object allocation • Allocation of small objects is fast - new String() is ~20 bytes on 64bit VM with compressed oops • not counting char[] object inside of it - ~4.5ns per allocation (on 2.6GHz i5) • But becomes slower when you include amortized GC cost • And can become much slower if you - have big static memory footprint - have “medium-lived” objects - have lots of threads (and thus a lot of GC roots and coordination) - use references (java.lang.ref) a lot - mutate your memory a lot, especially references (GC card marking)
  • 16. Manual memory management • When you would consider manual memory management in native code (custom object pools), consider doing the same in Java • General advice - Pool large objects • They are expensive to allocate and to be collected by GC - Avoid small objects • Especially “medium-lived” ones • Lay them out into arrays if you need to store them
  • 17.
  • 18. Object allocation action plan (1) • Watch the percentage of time your system spends doing GC - -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps - “jconsole” and “jvisualvm” tools show this information - It is available programmatically via GarbageCollectorMXBean • At Devexperts we collect it and report (push) in real-time via MARS (Monitoring and Reporting System) using a dedicated JVMSelfMonitoring plugin • Our support team have alerts configured on high GC % in our systems • Act when it becomes too big
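The programmatic route mentioned on this slide might look roughly like the sketch below, which reads the standard `GarbageCollectorMXBean` counters. The class name `GcTimeProbe` is made up, and the percentage is only a rough approximation: for concurrent collectors the reported collection time is not the same as pause time. The MARS plugin itself is not shown here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcTimeProbe {
    /** Accumulated collection time in milliseconds, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();   // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Approximate share of JVM uptime attributed to GC so far, in percent. */
    static double gcPercent() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : 100.0 * totalGcMillis() / uptime;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms (%.2f%% of uptime)%n", totalGcMillis(), gcPercent());
    }
}
```

A monitoring agent would typically sample these counters periodically and report the delta per interval rather than the lifetime average.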
  • 19. Object allocation action plan (2) • Tune GC to reduce overhead without code changes • Identify places where most allocations take place and optimize them - Use off-the-shelf Java profilers - Use Devexperts aprof for a full allocation picture at production speed http://code.devexperts.com/display/AProf/
  • 20. Object reuse and sharing • Pooling small objects is often a bad idea - Unless you are trying to quickly speed up code that heavily relies on lots of small objects - It's better to get rid of small objects altogether • See boxing in performance-critical code → get rid of it • But reusing / sharing small objects is great - Strings are typical candidates for data-processing code • Common pitfalls (don't do it, unless you fully understand it) - String.intern - WeakReference
  • 21. Actually, by their char arrays
  • 22. String I/O • Strings are often duplicated in memory • Reading any string-denoted data from a database, from a file, or from the network produces new strings • Where performance matters, reuse strings - For example see StringCache class from http://docs.dxfeed.com/dxlib/api/com/devexperts/util/StringCache.html - The key method is get(char[]) • You can reuse the char[] where data is read • And get an instance of String from the cache if it is there
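To illustrate the idea of looking a String up by the characters in a reusable read buffer, here is a deliberately simple direct-mapped cache. It is a hypothetical sketch, not the implementation of the `StringCache` class linked above: the name `SimpleStringCache`, the three-argument `get`, and the replace-on-miss policy are assumptions made for the example, and it is not thread-safe.

```java
/** Hypothetical direct-mapped cache keyed by the contents of a char[] range. */
final class SimpleStringCache {
    private final String[] slots;
    private final int mask;

    SimpleStringCache(int sizePowerOfTwo) {
        if (Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new String[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    /** Returns a cached String equal to buf[offset, offset + length), allocating only on a miss. */
    String get(char[] buf, int offset, int length) {
        int hash = 0;
        for (int i = 0; i < length; i++) {
            hash = 31 * hash + buf[offset + i];
        }
        int index = (hash ^ (hash >>> 16)) & mask;
        String cached = slots[index];
        if (cached != null && matches(cached, buf, offset, length)) {
            return cached;                       // hit: no allocation
        }
        String fresh = new String(buf, offset, length);
        slots[index] = fresh;                    // miss: replace whatever was in the slot
        return fresh;
    }

    private static boolean matches(String s, char[] buf, int offset, int length) {
        if (s.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (s.charAt(i) != buf[offset + i]) {
                return false;
            }
        }
        return true;
    }
}
```

A parser would decode each field into the same char[] and call `get` on it, so a feed that repeats a few thousand symbols allocates each symbol string roughly once.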
  • 23. Radical object / reference elimination • Unroll complex objects into arrays - For example, a collection of strings can be represented in a single byte[] • Renumber shared object instances - Represent string reference as int - That's what QDS core does for efficient String manipulation • Faster to compare • Faster to hash • Avoids slower “modify reference” operations (marks GC cards) - But requires hand-crafted memory management • QDS does reference counting, but custom GC is also feasible
  • 24. Hardcore optimization • Use sun.misc.Unsafe when everything else fails - It gives you full native speed - But no range checks or type safety • You are on your own! - Good fit for integration with native data structures when needed • QDS core uses it in a few places - Mainly to provide wait-free execution guarantees with an appropriate synchronization for array-based data structures - But there is fallback code for cases when sun.misc.Unsafe is not available
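Renumbering can be sketched as a table that hands out a dense int id per distinct string, so that the rest of the code stores and compares ints. The `SymbolTable` class below is a hypothetical illustration only: it uses ordinary collections (and therefore boxing) for brevity and never releases ids, whereas the slide says QDS uses reference counting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical symbol table: maps each distinct string to a small dense int id. */
final class SymbolTable {
    private final Map<String, Integer> idBySymbol = new HashMap<>();
    private final List<String> symbolById = new ArrayList<>();

    /** Returns the existing id for the symbol or assigns the next free one. */
    int idOf(String symbol) {
        Integer id = idBySymbol.get(symbol);
        if (id == null) {
            id = symbolById.size();
            symbolById.add(symbol);
            idBySymbol.put(symbol, id);
        }
        return id;
    }

    /** Reverse lookup, needed only at the edges (logging, serialization). */
    String symbolOf(int id) {
        return symbolById.get(id);
    }
}
```

Once ids are assigned, hot-path structures can hold them in int[] fields, where equality is a single int comparison and updates write no object references.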
  • 25. Even more hardcore – hand-written SMT • If you have to use linked data structures - Consider traversing multiple linked lists simultaneously in the same thread - Akin to hardware SMT, but in software - The code becomes much more complicated - But the performance can considerably increase * Not a Java-specific optimization, but fun to mention here
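A toy version of the interleaving idea: instead of walking two linked lists one after another, a single loop advances both, so the two independent chains of dependent loads can overlap. The `InterleavedTraversal` class is an illustrative sketch; whether it is actually faster depends on list size, memory layout, and hardware, and should be measured.

```java
final class InterleavedTraversal {
    static final class Node {
        final int value;
        final Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    /** Builds the list from, from + 1, ..., to - 1. */
    static Node range(int from, int to) {
        Node head = null;
        for (int v = to - 1; v >= from; v--) {
            head = new Node(v, head);
        }
        return head;
    }

    /** Baseline: finish list a completely, then walk list b. */
    static long sumSequential(Node a, Node b) {
        long sum = 0;
        for (; a != null; a = a.next) sum += a.value;
        for (; b != null; b = b.next) sum += b.value;
        return sum;
    }

    /** Interleaved: each iteration advances both lists; the two pointer chases are independent. */
    static long sumInterleaved(Node a, Node b) {
        long sumA = 0;
        long sumB = 0;
        while (a != null && b != null) {
            sumA += a.value;
            sumB += b.value;
            a = a.next;
            b = b.next;
        }
        for (; a != null; a = a.next) sumA += a.value;   // tail of the longer list
        for (; b != null; b = b.next) sumB += b.value;
        return sumA + sumB;
    }
}
```

Even in this tiny case the interleaved version needs extra tail handling, which hints at how quickly the complexity grows with more lists or more work per node.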
  • 26.
  • 27. Threads and scalability • Share data across the threads to further reduce memory footprint - But carefully design and implement this sharing • Learn and love Java Memory Model - It makes your correctly-synchronized multi-threaded code fully portable across CPU architectures • QDS core is a thread-safe data structure with a mix of lock-free, fine-grained and coarse-grained locking approaches which makes it vertically scalable
  • 28. Be careful with threads and locks • Thread switches introduce considerable latency (~20us) • Lock contention forces even more thread switches • It is not a Java-specific concern, but a common Java-specific problem, since Java makes threads easier for programmers to use (and many do use them) [diagram: 1. Enter lock 2. Context switch 3. Try to lock 4. Context switch 5. Exit lock 6. Context switch and enter lock]
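As one small example of the lock-free style mentioned here, the sketch below keeps a running maximum shared between threads using a compare-and-set retry loop instead of a lock. The `LockFreeMax` class is illustrative and is not taken from QDS.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Running maximum shared between threads, updated with a CAS retry loop. */
final class LockFreeMax {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    /** Raises the maximum to value if it is larger; never blocks. */
    void offer(long value) {
        long current;
        do {
            current = max.get();
            if (value <= current) {
                return;                          // nothing to update
            }
        } while (!max.compareAndSet(current, value)); // retry if another thread won the race
    }

    long get() {
        return max.get();
    }
}
```

A thread that loses the race simply retries with the fresh value, so no thread is ever parked waiting for another and no context switch is forced by contention.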
  • 29.
  • 30. Data flow for horizontal scalability [diagram: one QDTicker subscribes to IBM, GE, QQQQ, MSFT; another QDTicker subscribes to GE, INTC, SPX; the Multiplexor QDTicker subscribes to IBM, GE, QQQQ, MSFT, INTC, SPX; tick flows are labeled “IBM, GE ticks” and “GE ticks”]
  • 31.
  • 32. HotSpot Server VM • Run “java -server” (it is the default on server-class machines) • Does - Very deep code inlining - Loop unrolling - Optimization of virtual and interface calls based on collected profile - Escape analysis for synchronization and allocation elimination • Embrace it! - Don't fear writing your code in a nice object-oriented way • In most cases, that is • Do still avoid too much “object orientation” in the most performance-sensitive places
  • 33. HotSpot challenges • It is harder to profile, stress-test, and tune code - You need to “warm up” the code to get meaningful results - Small changes in code can lead to big differences that are hard to explain - Compilation of less busy code can trigger at any time and cause unexpected latency spikes • Don't do micro-tests - Test the whole system together instead • Do micro-tests - To learn which code patterns are better across the board - Small savings add up
  • 34.
  • 35. Looking at generated assembly code • -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,*<class-name>.<method-name> -XX:PrintAssemblyOptions=intel • You will need “hsdis” library added to your JRE/JDK with the actual disassembler code - But you have to build it yourself: http://hg.openjdk.java.net/jdk7/hotspot/hotspot/file/tip/src/share/tools/hsdis/README
  • 36. Use native profilers • Java profilers are great tools, but they don't use processor performance counters and lack the ability to recognize problems such as memory pressure - And they don't always produce a clear picture - All “cpu time” is reported at the nearest “safe point”, not at the actual code line that consumed CPU • Use native profilers to figure it out - Sun Studio Performance Analyzer - Intel VTune Amplifier - AMD CodeAnalyst
  • 37.
  • 38. General (1) • Classic data structures and algorithms - Use CPU and memory efficient data structures and algorithms - Know and love hash tables • They are the most useful data structure in a typical business application • Lock-free data structures will help you to scale vertically • Every byte counts. Remember about bytes. - QDS core compactly represents data as 4-byte integers while working with them in memory - QDS uses compact byte-level compression on the wire - Even more compact bit-level compression is used in long-term store
  • 39. General (2) • Burst handling - Process data in batches to amortize batch overhead across messages - QDS increases batch size under load to decrease overhead • Architecture - Use layers - Lower layers of the architecture should generally be used in more places and be more optimized - The outer layer, dxFeed API, is the easiest one to use and understand and the most object-oriented, but less optimized
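The burst-handling point can be sketched with a consumer that drains everything currently queued and pays the per-batch cost once. This is a hypothetical illustration, not QDS code: `BatchingConsumer`, the queue capacity, and the "flush" counter standing in for per-batch overhead are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical consumer that handles whatever is queued as one batch. */
final class BatchingConsumer {
    private static final int MAX_BATCH = 256;
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
    private final List<String> batch = new ArrayList<>(MAX_BATCH); // reused between calls
    private long messagesHandled;
    private long flushes;

    /** Producer side: returns false if the queue is full. */
    boolean publish(String message) {
        return queue.offer(message);
    }

    /** Drains up to MAX_BATCH queued messages and handles them with one flush. */
    int processAvailable() {
        batch.clear();
        queue.drainTo(batch, MAX_BATCH);
        if (batch.isEmpty()) {
            return 0;
        }
        messagesHandled += batch.size();   // per-message work would go here
        flushes++;                         // per-batch overhead (e.g. one socket write) paid once
        return batch.size();
    }

    long messagesHandled() { return messagesHandled; }
    long flushes()         { return flushes; }
}
```

Under light load each call sees one or two messages; under a burst more messages accumulate between calls, so batches grow and the fixed overhead per message shrinks without any explicit tuning.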
  • 40. Architecture layers JS API dxFeed API Tools Gateways QDS Core Transport Protocol ZLIB SSL Sockets NIO Files, etc
  • 41.
  • 42. QDS API (1) print quote bid/ask on the screen
  • 44. QDS API Summary • Pros - High-performance design - Flexible (can be used in various ways) • QDS Multiplexor is an application on top of QDS API • As well as all other command-line QDS tools - Extensible with clear separation of interfaces and implementation • Cons - Verbose, lots of code to do simple things - Error-prone (easy to get wrong and to introduce subtle bugs) • Everybody needs Quote, Trade, etc with easy-to-use API - Hence, dxFeed API was born
  • 45. dxFeed API print quote bid/ask on the screen
  • 46. Contact me by email: elizarov at devexperts.com