SlideShare uma empresa Scribd logo
1 de 31
Baixar para ler offline
Filtering 100M objects

What can go wrong?

Alexey Ragozin
alexey.ragozin@gmail.com
Dec 2013
Problem description
•
•
•
•

100M object (50M was tested)
~ 100 fields per object
~ 1kb per object (ProtoBuf binary format)
Simple queries
select … where … order by … [limit N]

• Expected query result set – 200k
• Max query result set – 50% of all data
Problem description
•
•
•
•

100M object (50M was tested)
Object size is Ok
~ 100 fields per object
~ 1kb per object (ProtoBuf binary format)
Simple queries
select … where … order by … [limit N]

Inline with
• Expected query result set – 200k Coherence filters
• Max query result set – 50% of all data
Problem description
•
•
•
•

100M object (50M was tested)
~ 100 fields per object
~ 1kb per object (ProtoBuf binary format)
Simple queries
select … where … order by … [limit N]

• Expected query result set – 200k
Challenge
• Max query result set – 50% of all data
Problem description
•
•
•
•

100M object (50M was tested)
~ 100 fields per object
~ 1kb per object (ProtoBuf binary format)
Simple queries
select … where … order by … [limit N]

• Expected query result set – 200k
• Max query result set – 50% of all data
Challenge
Real challenge
Big result set problem
Calling NamedCache method
 Single TCMP message to each participating member
 Processing of on remote member
 Single TCMP result message from each member
 Aggregation all results in caller JVM
Return form method

Huge TCMP message = cluster crash
Naive strategy
Processing of query
 Send aggregator with filter, retrieve
 Keys
 Field for sorting

 Sort whole result set (keys + few fields)
 Apply limit
 Retrieve and send objects in fixed batches
Testing …
OutOfMemoryError on storage node
 Storage node processing filter
Deserialized
(600K objects per node)
value is there
Deserialize value, apply filter, match …
… retain entry until … (end of filtering ?)
 Filter processing may take few seconds
 There could be few concurrent queries
Solving …
Using indexes
 Index only filter does not deserialize object
 We cannot index everything
 Single unindexed predicate would call deserialization

Special filter to cut deserialized object reference
 We do not need object (aggregator extracts from binary)
 Desirialized object now collected in young space
 Synthetic wrapper object + messing with serialization
Testing …
Very high memory usage on service node
Collecting and sorting large result set
 Have to use huge young space (8Gib)
 Query concurrency in limited by memory
Single threaded sorting
 It is very fast tough
Indexes and attribute cardinality
“Status” attribute – 90% of objects are OPEN
5.69
5.78

No index

0.066
0.066

Indexed: ticker

0.01

Indexed: both

0.496

0.018
0.017

Indexed: both + noIndex hint

0

1

2

ticker & side

3

4

5

6

7

side & ticker

http://blog.ragozin.info/2013/07/coherence-101-filters-performance-and.html
Indexes and attribute cardinality
Possible strategies to remedy
 Transform query
 Wrap “bad” predicates into NoIndexFilter
 Fix filter execution “planner”
Indexes and attribute cardinality
Can we go without indexes?
 Full scan 50M – 80 cores, 3 servers
 30 seconds
 Too slow!
Naive strategy
Almost good enough
Problems with naive strategy
 Big memory problems on service process
 Max result set size is limited
 No control on max TCMP packet size
 Indexes may be inefficient
Incremental retrieval
 Result set is always sorted
 Primary key is always last sort attribute
 Aggregator on invocation
 Sort its partial result set
Selects first N
 Return N references (key + sort attribute)
 Return remaining size of each partition
Incremental retrieval

P1
P2
P3
P4

Sort order
Incremental retrieval

P1
P2
P3
P4

Sort order
Incremental retrieval

P1
P2
P3
P4

Sort order
Incremental retrieval

P1
P2

Partition excluded

P3
P4

Sort order
Incremental retrieval

P1
P2
P3
P4

Sort order
Incremental retrieval
Advantages
 Size of TCMP packets is under control
 Reduced traffic for LIMIT queries
 Fixed memory requirements for service node

Partial retrieval limit
 Target result set – 200k
 80 nodes
 Best performance with ~1500 limit
Incremental retrieval
A little nuance …
 Filter based aggregator is executed by one thread
 How many times aggregate(…) method would be called?
 Once
 Twice

 Once per partition
 Other

Coherence limits amount of data passed to aggregate(…)
based on binary side of data.
Incremental retrieval
What about snapshot consistency?
 There were no consistency to begin with
 No consistency between nodes
 Index updates are not transactional

But we need result set of query to be consistent!
 Hand made MVCC
 If you REALLY, REALLY, REALLY need it
Hand made MVCC
Synthetic key to have multiple versions in cache
Data affinity to exploit partition level consistency
Timestamp based surface – consistent snapshot

if timestamp is a part of key
IndexAwareFilter can be used (without an index)

otherwise
TimeSeriesIndex -

https://github.com/gridkit/coherence-search-timeseries
Time series index
Special index for managing versioned data
Entry key

Series key

Entry value

Entry id

Timestamp

Payload

Cache entry

Getting last version for series k
select * from versions where series=k and version =

(select max(version) from versions where key=k)
https://github.com/gridkit/coherence-search-timeseries
Time series index
Series
inverted index

HASHTABLE

Timestamp

ER

Series key

D
OR

Series key

Timestamp inverted subindex

Timestamp
Timestamp

Entry ref
Entry ref
Entry ref

Series key
Series key
Series key

https://github.com/gridkit/coherence-search-timeseries
TCMP vs TCP
•
•
•
•

TCP
WAN networks
Slow start
Sliding window
Timeout packet loss
detection

Fair network sharing

•
•
•
•

TCMP
Single switch networks
Fast NACKs
Loss detection by packet order
Per packet resend

Low latency communications
Bandwidth maximization
TCMP vs TCP
In bandwidth completion
TCP doesn’t have a chance against TCMP
Having TCP and TCMP in one network
 Normally TCMP is limited by proxy speaking TCP
 Traffic amplification effects (TCMP traffic >> TCP traffic)
 Bandwidth strangled TCP becomes unstable
 Hanging for few seconds (retransmit timeouts)
 Spurious connection resets

Keep TCMP in separate switch if possible!
http://blog.ragozin.info/2013/09/coherence-101-entryprocessor-traffic.html
Bonus: ProtoBuf extractor
Inspired by POF extractor







Extracts fields for binary data
Does not require generated classes or IDL
Use field IDs to navigate data
XPath like expressiveness (i.e. extract from map by key)
Can processes any number of extractors in single pass
Apache 2.0 licensed
https://github.com/gridkit/binary-extractors
Bonus: SJK diagnostic tool
SJK – CLI tool exploiting JVM diagnostic interfaces





Connect to JVM by PID
Display thread CPU usage in real time (like top)
Display per thread memory allocation rate
Dead objects histogram
… and more
https://github.com/aragozin/jvm-tools
Thank you
http://blog.ragozin.info
- my articles
http://code.google.com/p/gridkit
http://github.com/gridkit
- my open source code
http://aragozin.timepad.ru
- tech meetups in Moscow

Alexey Ragozin
alexey.ragozin@gmail.com

Mais conteúdo relacionado

Mais procurados

使用ZooKeeper打造軟體式負載平衡
使用ZooKeeper打造軟體式負載平衡使用ZooKeeper打造軟體式負載平衡
使用ZooKeeper打造軟體式負載平衡Lawrence Huang
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...Lucidworks
 
Degrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files SyndromeDegrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files SyndromeDatabricks
 
Dive into spark2
Dive into spark2Dive into spark2
Dive into spark2Gal Marder
 
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...DataStax
 
Distributed Applications with Apache Zookeeper
Distributed Applications with Apache ZookeeperDistributed Applications with Apache Zookeeper
Distributed Applications with Apache ZookeeperAlex Ehrnschwender
 
Evolving Streaming Applications
Evolving Streaming ApplicationsEvolving Streaming Applications
Evolving Streaming ApplicationsDataWorks Summit
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Lucidworks
 
Flink Forward SF 2017: Dean Wampler - Streaming Deep Learning Scenarios with...
Flink Forward SF 2017: Dean Wampler -  Streaming Deep Learning Scenarios with...Flink Forward SF 2017: Dean Wampler -  Streaming Deep Learning Scenarios with...
Flink Forward SF 2017: Dean Wampler - Streaming Deep Learning Scenarios with...Flink Forward
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Internship final report@Treasure Data Inc.
Internship final report@Treasure Data Inc.Internship final report@Treasure Data Inc.
Internship final report@Treasure Data Inc.Ryuichi ITO
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure DataTaro L. Saito
 
Spark real world use cases and optimizations
Spark real world use cases and optimizationsSpark real world use cases and optimizations
Spark real world use cases and optimizationsGal Marder
 
Honest Performance Testing with "NDBench" (Vinay Chella, Netflix) | Cassandra...
Honest Performance Testing with "NDBench" (Vinay Chella, Netflix) | Cassandra...Honest Performance Testing with "NDBench" (Vinay Chella, Netflix) | Cassandra...
Honest Performance Testing with "NDBench" (Vinay Chella, Netflix) | Cassandra...DataStax
 
JVM languages "flame wars"
JVM languages "flame wars"JVM languages "flame wars"
JVM languages "flame wars"Gal Marder
 
Object Detection with Transformers
Object Detection with TransformersObject Detection with Transformers
Object Detection with TransformersDatabricks
 
So we're running Apache ZooKeeper. Now What? By Camille Fournier
So we're running Apache ZooKeeper. Now What? By Camille Fournier So we're running Apache ZooKeeper. Now What? By Camille Fournier
So we're running Apache ZooKeeper. Now What? By Camille Fournier Hakka Labs
 
Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis
Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis  Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis
Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis Apache Apex
 
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...Databricks
 

Mais procurados (20)

使用ZooKeeper打造軟體式負載平衡
使用ZooKeeper打造軟體式負載平衡使用ZooKeeper打造軟體式負載平衡
使用ZooKeeper打造軟體式負載平衡
 
Javantura v3 - Going Reactive with RxJava – Hrvoje Crnjak
Javantura v3 - Going Reactive with RxJava – Hrvoje CrnjakJavantura v3 - Going Reactive with RxJava – Hrvoje Crnjak
Javantura v3 - Going Reactive with RxJava – Hrvoje Crnjak
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
 
Degrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files SyndromeDegrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files Syndrome
 
Dive into spark2
Dive into spark2Dive into spark2
Dive into spark2
 
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
 
Distributed Applications with Apache Zookeeper
Distributed Applications with Apache ZookeeperDistributed Applications with Apache Zookeeper
Distributed Applications with Apache Zookeeper
 
Evolving Streaming Applications
Evolving Streaming ApplicationsEvolving Streaming Applications
Evolving Streaming Applications
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
 
Flink Forward SF 2017: Dean Wampler - Streaming Deep Learning Scenarios with...
Flink Forward SF 2017: Dean Wampler -  Streaming Deep Learning Scenarios with...Flink Forward SF 2017: Dean Wampler -  Streaming Deep Learning Scenarios with...
Flink Forward SF 2017: Dean Wampler - Streaming Deep Learning Scenarios with...
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Internship final report@Treasure Data Inc.
Internship final report@Treasure Data Inc.Internship final report@Treasure Data Inc.
Internship final report@Treasure Data Inc.
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Spark real world use cases and optimizations
Spark real world use cases and optimizationsSpark real world use cases and optimizations
Spark real world use cases and optimizations
 
Honest Performance Testing with "NDBench" (Vinay Chella, Netflix) | Cassandra...
Honest Performance Testing with "NDBench" (Vinay Chella, Netflix) | Cassandra...Honest Performance Testing with "NDBench" (Vinay Chella, Netflix) | Cassandra...
Honest Performance Testing with "NDBench" (Vinay Chella, Netflix) | Cassandra...
 
JVM languages "flame wars"
JVM languages "flame wars"JVM languages "flame wars"
JVM languages "flame wars"
 
Object Detection with Transformers
Object Detection with TransformersObject Detection with Transformers
Object Detection with Transformers
 
So we're running Apache ZooKeeper. Now What? By Camille Fournier
So we're running Apache ZooKeeper. Now What? By Camille Fournier So we're running Apache ZooKeeper. Now What? By Camille Fournier
So we're running Apache ZooKeeper. Now What? By Camille Fournier
 
Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis
Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis  Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis
Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis
 
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
 

Destaque

Garbage collection in JVM
Garbage collection in JVMGarbage collection in JVM
Garbage collection in JVMaragozin
 
Java GC tuning and monitoring (by Alexander Ashitkin)
Java GC tuning and monitoring (by Alexander Ashitkin)Java GC tuning and monitoring (by Alexander Ashitkin)
Java GC tuning and monitoring (by Alexander Ashitkin)aragozin
 
Virtualizing Java in Java (jug.ru)
Virtualizing Java in Java (jug.ru)Virtualizing Java in Java (jug.ru)
Virtualizing Java in Java (jug.ru)aragozin
 
Performance Test Driven Development (CEE SERC 2013 Moscow)
Performance Test Driven Development (CEE SERC 2013 Moscow)Performance Test Driven Development (CEE SERC 2013 Moscow)
Performance Test Driven Development (CEE SERC 2013 Moscow)aragozin
 
JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)
JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)
JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)aragozin
 
Cборка мусора в Java без пауз (HighLoad++ 2013)
Cборка мусора в Java без пауз  (HighLoad++ 2013)Cборка мусора в Java без пауз  (HighLoad++ 2013)
Cборка мусора в Java без пауз (HighLoad++ 2013)aragozin
 

Destaque (6)

Garbage collection in JVM
Garbage collection in JVMGarbage collection in JVM
Garbage collection in JVM
 
Java GC tuning and monitoring (by Alexander Ashitkin)
Java GC tuning and monitoring (by Alexander Ashitkin)Java GC tuning and monitoring (by Alexander Ashitkin)
Java GC tuning and monitoring (by Alexander Ashitkin)
 
Virtualizing Java in Java (jug.ru)
Virtualizing Java in Java (jug.ru)Virtualizing Java in Java (jug.ru)
Virtualizing Java in Java (jug.ru)
 
Performance Test Driven Development (CEE SERC 2013 Moscow)
Performance Test Driven Development (CEE SERC 2013 Moscow)Performance Test Driven Development (CEE SERC 2013 Moscow)
Performance Test Driven Development (CEE SERC 2013 Moscow)
 
JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)
JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)
JIT-компиляция в виртуальной машине Java (HighLoad++ 2013)
 
Cборка мусора в Java без пауз (HighLoad++ 2013)
Cборка мусора в Java без пауз  (HighLoad++ 2013)Cборка мусора в Java без пауз  (HighLoad++ 2013)
Cборка мусора в Java без пауз (HighLoad++ 2013)
 

Semelhante a Filtering 100M objects in Coherence cache. What can go wrong?

Apache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbApache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbZhangZhengming
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...SignalFx
 
Sista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceSista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceESUG
 
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized EngineApache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized EngineDataWorks Summit
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData
 
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Gruter
 
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018Seunghyun Lee
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Lucidworks
 
Faceting optimizations for Solr
Faceting optimizations for SolrFaceting optimizations for Solr
Faceting optimizations for SolrToke Eskildsen
 
An Empirical Evaluation of RDF Graph Partitioning Techniques
An Empirical Evaluation of RDF Graph Partitioning TechniquesAn Empirical Evaluation of RDF Graph Partitioning Techniques
An Empirical Evaluation of RDF Graph Partitioning TechniquesAdnan Akhter
 
Impatience is a Virtue: Revisiting Disorder in High-Performance Log Analytics
Impatience is a Virtue: Revisiting Disorder in High-Performance Log AnalyticsImpatience is a Virtue: Revisiting Disorder in High-Performance Log Analytics
Impatience is a Virtue: Revisiting Disorder in High-Performance Log AnalyticsBadrish Chandramouli
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueDatabricks
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBJason Terpko
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
 
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartNear Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartLucidworks
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at AlibabaMichael Stack
 
Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBAntonios Giannopoulos
 

Semelhante a Filtering 100M objects in Coherence cache. What can go wrong? (20)

Apache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbApache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdb
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
 
Sista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceSista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performance
 
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized EngineApache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
 
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
 
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
Faceting optimizations for Solr
Faceting optimizations for SolrFaceting optimizations for Solr
Faceting optimizations for Solr
 
An Empirical Evaluation of RDF Graph Partitioning Techniques
An Empirical Evaluation of RDF Graph Partitioning TechniquesAn Empirical Evaluation of RDF Graph Partitioning Techniques
An Empirical Evaluation of RDF Graph Partitioning Techniques
 
AWS Data Collection & Storage
AWS Data Collection & StorageAWS Data Collection & Storage
AWS Data Collection & Storage
 
Impatience is a Virtue: Revisiting Disorder in High-Performance Log Analytics
Impatience is a Virtue: Revisiting Disorder in High-Performance Log AnalyticsImpatience is a Virtue: Revisiting Disorder in High-Performance Log Analytics
Impatience is a Virtue: Revisiting Disorder in High-Performance Log Analytics
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert Xue
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDB
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
Data Collection and Storage
Data Collection and StorageData Collection and Storage
Data Collection and Storage
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartNear Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 
Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDB
 

Mais de aragozin

Java on Linux for devs and ops
Java on Linux for devs and opsJava on Linux for devs and ops
Java on Linux for devs and opsaragozin
 
I know why your Java is slow
I know why your Java is slowI know why your Java is slow
I know why your Java is slowaragozin
 
Java profiling Do It Yourself (jug.msk.ru 2016)
Java profiling Do It Yourself (jug.msk.ru 2016)Java profiling Do It Yourself (jug.msk.ru 2016)
Java profiling Do It Yourself (jug.msk.ru 2016)aragozin
 
Java black box profiling JUG.EKB 2016
Java black box profiling JUG.EKB 2016Java black box profiling JUG.EKB 2016
Java black box profiling JUG.EKB 2016aragozin
 
Распределённое нагрузочное тестирование на Java
Распределённое нагрузочное тестирование на JavaРаспределённое нагрузочное тестирование на Java
Распределённое нагрузочное тестирование на Javaaragozin
 
What every Java developer should know about network?
What every Java developer should know about network?What every Java developer should know about network?
What every Java developer should know about network?aragozin
 
Java profiling Do It Yourself
Java profiling Do It YourselfJava profiling Do It Yourself
Java profiling Do It Yourselfaragozin
 
DIY Java Profiler
DIY Java ProfilerDIY Java Profiler
DIY Java Profileraragozin
 
Java black box profiling
Java black box profilingJava black box profiling
Java black box profilingaragozin
 
Блеск и нищета распределённых кэшей
Блеск и нищета распределённых кэшейБлеск и нищета распределённых кэшей
Блеск и нищета распределённых кэшейaragozin
 
JIT compilation in modern platforms – challenges and solutions
JIT compilation in modern platforms – challenges and solutionsJIT compilation in modern platforms – challenges and solutions
JIT compilation in modern platforms – challenges and solutionsaragozin
 
Casual mass parallel computing
Casual mass parallel computingCasual mass parallel computing
Casual mass parallel computingaragozin
 
Nanocloud cloud scale jvm
Nanocloud   cloud scale jvmNanocloud   cloud scale jvm
Nanocloud cloud scale jvmaragozin
 
Борьба с GС паузами в JVM
Борьба с GС паузами в JVMБорьба с GС паузами в JVM
Борьба с GС паузами в JVMaragozin
 
Распределённый кэш или хранилище данных. Что выбрать?
Распределённый кэш или хранилище данных. Что выбрать?Распределённый кэш или хранилище данных. Что выбрать?
Распределённый кэш или хранилище данных. Что выбрать?aragozin
 
Devirtualization of method calls
Devirtualization of method callsDevirtualization of method calls
Devirtualization of method callsaragozin
 
Tech talk network - friend or foe
Tech talk   network - friend or foeTech talk   network - friend or foe
Tech talk network - friend or foearagozin
 
Database backed coherence cache
Database backed coherence cacheDatabase backed coherence cache
Database backed coherence cachearagozin
 
ORM and distributed caching
ORM and distributed cachingORM and distributed caching
ORM and distributed cachingaragozin
 
Секреты сборки мусора в Java [DUMP-IT 2012]
Секреты сборки мусора в Java [DUMP-IT 2012]Секреты сборки мусора в Java [DUMP-IT 2012]
Секреты сборки мусора в Java [DUMP-IT 2012]aragozin
 

Mais de aragozin (20)

Java on Linux for devs and ops
Java on Linux for devs and opsJava on Linux for devs and ops
Java on Linux for devs and ops
 
I know why your Java is slow
I know why your Java is slowI know why your Java is slow
I know why your Java is slow
 
Java profiling Do It Yourself (jug.msk.ru 2016)
Java profiling Do It Yourself (jug.msk.ru 2016)Java profiling Do It Yourself (jug.msk.ru 2016)
Java profiling Do It Yourself (jug.msk.ru 2016)
 
Java black box profiling JUG.EKB 2016
Java black box profiling JUG.EKB 2016Java black box profiling JUG.EKB 2016
Java black box profiling JUG.EKB 2016
 
Распределённое нагрузочное тестирование на Java
Распределённое нагрузочное тестирование на JavaРаспределённое нагрузочное тестирование на Java
Распределённое нагрузочное тестирование на Java
 
What every Java developer should know about network?
What every Java developer should know about network?What every Java developer should know about network?
What every Java developer should know about network?
 
Java profiling Do It Yourself
Java profiling Do It YourselfJava profiling Do It Yourself
Java profiling Do It Yourself
 
DIY Java Profiler
DIY Java ProfilerDIY Java Profiler
DIY Java Profiler
 
Java black box profiling
Java black box profilingJava black box profiling
Java black box profiling
 
Блеск и нищета распределённых кэшей
Блеск и нищета распределённых кэшейБлеск и нищета распределённых кэшей
Блеск и нищета распределённых кэшей
 
JIT compilation in modern platforms – challenges and solutions
JIT compilation in modern platforms – challenges and solutionsJIT compilation in modern platforms – challenges and solutions
JIT compilation in modern platforms – challenges and solutions
 
Casual mass parallel computing
Casual mass parallel computingCasual mass parallel computing
Casual mass parallel computing
 
Nanocloud cloud scale jvm
Nanocloud   cloud scale jvmNanocloud   cloud scale jvm
Nanocloud cloud scale jvm
 
Борьба с GС паузами в JVM
Борьба с GС паузами в JVMБорьба с GС паузами в JVM
Борьба с GС паузами в JVM
 
Распределённый кэш или хранилище данных. Что выбрать?
Распределённый кэш или хранилище данных. Что выбрать?Распределённый кэш или хранилище данных. Что выбрать?
Распределённый кэш или хранилище данных. Что выбрать?
 
Devirtualization of method calls
Devirtualization of method callsDevirtualization of method calls
Devirtualization of method calls
 
Tech talk network - friend or foe
Tech talk   network - friend or foeTech talk   network - friend or foe
Tech talk network - friend or foe
 
Database backed coherence cache
Database backed coherence cacheDatabase backed coherence cache
Database backed coherence cache
 
ORM and distributed caching
ORM and distributed cachingORM and distributed caching
ORM and distributed caching
 
Секреты сборки мусора в Java [DUMP-IT 2012]
Секреты сборки мусора в Java [DUMP-IT 2012]Секреты сборки мусора в Java [DUMP-IT 2012]
Секреты сборки мусора в Java [DUMP-IT 2012]
 

Último

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Último (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Filtering 100M objects in Coherence cache. What can go wrong?

  • 1. Filtering 100M objects What can go wrong? Alexey Ragozin alexey.ragozin@gmail.com Dec 2013
  • 2. Problem description • • • • 100M object (50M was tested) ~ 100 fields per object ~ 1kb per object (ProtoBuf binary format) Simple queries select … where … order by … [limit N] • Expected query result set – 200k • Max query result set – 50% of all data
  • 3. Problem description • • • • 100M object (50M was tested) Object size is Ok ~ 100 fields per object ~ 1kb per object (ProtoBuf binary format) Simple queries select … where … order by … [limit N] Inline with • Expected query result set – 200k Coherence filters • Max query result set – 50% of all data
  • 4. Problem description • • • • 100M object (50M was tested) ~ 100 fields per object ~ 1kb per object (ProtoBuf binary format) Simple queries select … where … order by … [limit N] • Expected query result set – 200k Challenge • Max query result set – 50% of all data
  • 5. Problem description • • • • 100M object (50M was tested) ~ 100 fields per object ~ 1kb per object (ProtoBuf binary format) Simple queries select … where … order by … [limit N] • Expected query result set – 200k • Max query result set – 50% of all data Challenge Real challenge
  • 6. Big result set problem Calling NamedCache method  Single TCMP message to each participating member  Processing of on remote member  Single TCMP result message from each member  Aggregation all results in caller JVM Return form method Huge TCMP message = cluster crash
  • 7. Naive strategy Processing of query  Send aggregator with filter, retrieve  Keys  Field for sorting  Sort whole result set (keys + few fields)  Apply limit  Retrieve and send objects in fixed batches
  • 8. Testing … OutOfMemoryError on storage node  Storage node processing filter Deserialized (600K objects per node) value is there Deserialize value, apply filter, match … … retain entry until … (end of filtering ?)  Filter processing may take few seconds  There could be few concurrent queries
  • 9. Solving … Using indexes  Index only filter does not deserialize object  We cannot index everything  Single unindexed predicate would call deserialization Special filter to cut deserialized object reference  We do not need object (aggregator extracts from binary)  Desirialized object now collected in young space  Synthetic wrapper object + messing with serialization
  • 10. Testing … Very high memory usage on service node Collecting and sorting large result set  Have to use huge young space (8Gib)  Query concurrency in limited by memory Single threaded sorting  It is very fast tough
  • 11. Indexes and attribute cardinality “Status” attribute – 90% of objects are OPEN 5.69 5.78 No index 0.066 0.066 Indexed: ticker 0.01 Indexed: both 0.496 0.018 0.017 Indexed: both + noIndex hint 0 1 2 ticker & side 3 4 5 6 7 side & ticker http://blog.ragozin.info/2013/07/coherence-101-filters-performance-and.html
  • 12. Indexes and attribute cardinality Possible strategies to remedy  Transform query  Wrap “bad” predicates into NoIndexFilter  Fix filter execution “planner”
  • 13. Indexes and attribute cardinality Can we go without indexes?  Full scan 50M – 80 cores, 3 servers  30 seconds  Too slow!
  • 14. Naive strategy Almost good enough Problems with naive strategy  Big memory problems on service process  Max result set size is limited  No control on max TCMP packet size  Indexes may be inefficient
  • 15. Incremental retrieval  Result set is always sorted  Primary key is always last sort attribute  Aggregator on invocation  Sort its partial result set Selects first N  Return N references (key + sort attribute)  Return remaining size of each partition
  • 21. Incremental retrieval Advantages  Size of TCMP packets is under control  Reduced traffic for LIMIT queries  Fixed memory requirements for service node Partial retrieval limit  Target result set – 200k  80 nodes  Best performance with ~1500 limit
  • 22. Incremental retrieval A little nuance …  Filter based aggregator is executed by one thread  How many times aggregate(…) method would be called?  Once  Twice  Once per partition  Other Coherence limits amount of data passed to aggregate(…) based on binary side of data.
  • 23. Incremental retrieval What about snapshot consistency?  There were no consistency to begin with  No consistency between nodes  Index updates are not transactional But we need result set of query to be consistent!  Hand made MVCC  If you REALLY, REALLY, REALLY need it
  • 24. Hand made MVCC Synthetic key to have multiple versions in cache Data affinity to exploit partition level consistency Timestamp based surface – consistent snapshot if timestamp is a part of key IndexAwareFilter can be used (without an index) otherwise TimeSeriesIndex - https://github.com/gridkit/coherence-search-timeseries
  • 25. Time series index Special index for managing versioned data Entry key Series key Entry value Entry id Timestamp Payload Cache entry Getting last version for series k select * from versions where series=k and version = (select max(version) from versions where key=k) https://github.com/gridkit/coherence-search-timeseries
  • 26. Time series index Series inverted index HASHTABLE Timestamp ER Series key D OR Series key Timestamp inverted subindex Timestamp Timestamp Entry ref Entry ref Entry ref Series key Series key Series key https://github.com/gridkit/coherence-search-timeseries
  • 27. TCMP vs TCP • • • • TCP WAN networks Slow start Sliding window Timeout packet loss detection Fair network sharing • • • • TCMP Single switch networks Fast NACKs Loss detection by packet order Per packet resend Low latency communications Bandwidth maximization
  • 28. TCMP vs TCP In bandwidth completion TCP doesn’t have a chance against TCMP Having TCP and TCMP in one network  Normally TCMP is limited by proxy speaking TCP  Traffic amplification effects (TCMP traffic >> TCP traffic)  Bandwidth strangled TCP becomes unstable  Hanging for few seconds (retransmit timeouts)  Spurious connection resets Keep TCMP in separate switch if possible! http://blog.ragozin.info/2013/09/coherence-101-entryprocessor-traffic.html
  • 29. Bonus: ProtoBuf extractor Inspired by POF extractor       Extracts fields for binary data Does not require generated classes or IDL Use field IDs to navigate data XPath like expressiveness (i.e. extract from map by key) Can processes any number of extractors in single pass Apache 2.0 licensed https://github.com/gridkit/binary-extractors
  • 30. Bonus: SJK diagnostic tool SJK – CLI tool exploiting JVM diagnostic interfaces     Connect to JVM by PID Display thread CPU usage in real time (like top) Display per thread memory allocation rate Dead objects histogram … and more https://github.com/aragozin/jvm-tools
  • 31. Thank you http://blog.ragozin.info - my articles http://code.google.com/p/gridkit http://github.com/gridkit - my open source code http://aragozin.timepad.ru - tech meetups in Moscow Alexey Ragozin alexey.ragozin@gmail.com