This presentation was given at the 38th International Conference on Very Large Data Bases (VLDB), 2012.
Full paper and additional information available at:
http://msrg.org/papers/vldb12-bigdata
Abstract:
As the complexity of enterprise systems increases, the need for monitoring and analyzing such systems also grows. A number of companies have built sophisticated monitoring tools that go far beyond simple resource utilization reports. For example, based on instrumentation and specialized APIs, it is now possible to monitor single method invocations and trace individual transactions across geographically distributed systems. This high level of detail enables more precise forms of analysis and prediction but comes at the price of high data rates (i.e., big data). To maximize the benefit of data monitoring, the data has to be stored for an extended period of time for later analysis. This new wave of big data analytics imposes new challenges, especially for application performance monitoring systems. The monitoring data has to be stored in a system that can sustain the high data rates and at the same time enable an up-to-date view of the underlying infrastructure. With the advent of modern key-value stores, a variety of data storage systems have emerged that are built with a focus on scalability and the high data rates that predominate in this monitoring use case.
In this work, we present our experience and a comprehensive performance evaluation of six modern (open-source) data stores in the context of application performance monitoring, as part of a CA Technologies initiative. We evaluated these systems with data and workloads that can be found in application performance monitoring, as well as in online advertisement, power monitoring, and many other use cases. We present our insights not only as performance results but also as lessons learned and our experience with the setup and configuration complexity of these data stores in an industry setting.
Transcript:
Solving Big Data Challenges for Enterprise Application Performance Management
1. MIDDLEWARE SYSTEMS
RESEARCH GROUP
MSRG.ORG
Solving Big Data Challenges for Enterprise Application Performance Management
Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen
University of Toronto
Sergio Gómez-Villamor
Universitat Politècnica de Catalunya
Víctor Muntés-Mulero*, Serge Mankovskii
CA Labs (Europe*)
4. Enterprise System Architecture
[Architecture diagram: a typical enterprise system with a monitoring agent attached to every tier: clients, an identity manager, web servers, application servers, SAP, message queues and a message broker, web services, databases, a mainframe, and a 3rd-party database. Each agent reports its measurements to a central collection point.]
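To make the data flow concrete, here is a minimal sketch of the kind of measurement event each agent might emit every reporting interval. All names are illustrative assumptions, not CA APM's actual data model; serialized with its key and a handful of small fields, such a record roughly matches the ~100 bytes per event quoted on the next slide.

```java
// Illustrative sketch only -- not CA APM's actual data model.
// One such event per metric per agent per reporting interval.
public final class MetricEvent {
    final long timestamp;    // milliseconds since epoch
    final String agentId;    // e.g., "appserver-01.agent" (hypothetical ID scheme)
    final String metricName; // e.g., "jdbc.connectionPool.active" (hypothetical)
    final double value;      // the measurement itself

    MetricEvent(long timestamp, String agentId, String metricName, double value) {
        this.timestamp = timestamp;
        this.agentId = agentId;
        this.metricName = metricName;
        this.value = value;
    }

    public static void main(String[] args) {
        MetricEvent e = new MetricEvent(System.currentTimeMillis(),
                "appserver-01.agent", "jdbc.connectionPool.active", 42.0);
        System.out.println(e.timestamp + " " + e.agentId + " "
                + e.metricName + " = " + e.value);
    }
}
```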
6. APM in Numbers
Nodes in an enterprise system: up to 50K, avg 10K
Metrics per node: 100 – 10K
Reporting period: 10 sec
Data size: 100 B / event
Raw data: 100 MB / sec, 355 GB / h, 2.8 PB / y
Event rate at storage: > 1M / sec
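A back-of-the-envelope check of these figures, assuming the average case of 10K nodes and roughly 1,000 metrics per node (a value picked from within the stated 100 – 10K range); the per-hour and per-year numbers then follow up to rounding and binary prefixes:

\[
\frac{10{,}000\ \text{nodes} \times 1{,}000\ \text{metrics/node}}{10\ \text{s}} = 10^{6}\ \text{events/s}
\]
\[
10^{6}\ \text{events/s} \times 100\ \text{B} = 100\ \text{MB/s} \;\approx\; 355\ \text{GB/h} \;\approx\; 2.8\ \text{PB/y}
\]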
7. APM Benchmark
Based on the Yahoo! Cloud Serving Benchmark (YCSB)
Single table
CRUD operations
25-byte key
5 values (10 bytes each)
75 bytes / record
Five workloads
Scan length: 50 rows
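For reproduction, YCSB's CoreWorkload is configured through a handful of properties. The following is a sketch using real YCSB property names; the concrete proportions shown (95/5, matching Workload R) and the class layout are inferred from the slides, not copied from the paper's actual configuration files.

```java
import java.util.Properties;

// Sketch of a YCSB CoreWorkload configuration matching the slide:
// 5 fields of 10 bytes per record, scans of up to 50 rows.
// Proportions below correspond to Workload R (95% reads, 5% writes).
public class ApmWorkloadConfig {
    public static Properties workloadR() {
        Properties p = new Properties();
        p.setProperty("workload", "com.yahoo.ycsb.workloads.CoreWorkload");
        p.setProperty("fieldcount", "5");          // 5 values per record
        p.setProperty("fieldlength", "10");        // 10 bytes each
        p.setProperty("maxscanlength", "50");      // scan length: 50 rows
        p.setProperty("readproportion", "0.95");   // 95% reads
        p.setProperty("updateproportion", "0.05"); // 5% writes
        return p;
    }

    public static void main(String[] args) {
        workloadR().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```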
10. Experimental Testbed
Two clusters
Cluster M (memory-bound)
16 nodes (plus master node)
2x quad-core CPU, 16 GB RAM, 2x 74 GB HDD (RAID 0)
10 million records per node (~700 MB raw)
128 connections per node (8 per core)
Cluster D (disk-bound)
24 nodes
2x dual-core CPU, 4 GB RAM, 74 GB HDD
150 million records on 12 nodes (~10.5 GB raw)
8 connections per node
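The quoted raw sizes are consistent with the 75-byte records from the benchmark slide:

\[
10^{7} \times 75\ \text{B} = 750\ \text{MB} \approx 715\ \text{MiB} \quad (\text{the } {\sim}700\ \text{MB on cluster M})
\]
\[
1.5 \times 10^{8} \times 75\ \text{B} = 11.25\ \text{GB} \approx 10.5\ \text{GiB} \quad (\text{the } {\sim}10.5\ \text{GB on cluster D})
\]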
11. Evaluation
Minimum 3 runs per workload and system
Fresh install for each run
10 minutes runtime
Up to 5 client machines for 12 nodes
to make sure YCSB is not the bottleneck
Measured: maximum achievable throughput
12. Workload W - Throughput
Cassandra dominates
Higher throughput for Cassandra and HBase
Lower throughput for other systems
Not all web data stores scale equally well
VoltDB best single node throughput
13. Workload W – Latency Writes
Same latencies for Cassandra, Voldemort, VoltDB, Redis, and MySQL
HBase latency increased
14. Workload W – Latency Reads
Same latency as in Workload R for Cassandra, Voldemort, Redis, VoltDB, and HBase
HBase latency in the second range
15. Workload R - Throughput
95% reads, 5% writes
On a single node, main-memory systems have the best performance
Linear scalability for web data stores
Sublinear scalability for sharded systems
Slow-down for VoltDB
16. Workload RW - Throughput
50% reads, 50% writes
VoltDB highest single node throughput
HBase and Cassandra throughput increases
MySQL and Voldemort throughput decreases
17. Workload RS – Latency Scans
HBase latency equal to reads in Workload R
MySQL latency very high due to full table scans
Similar but increased latency for Cassandra, Redis, and VoltDB
19. Bounded Throughput – Write Latency
Workload R, throughput normalized to the maximum achieved in the previous tests (= 100%)
Write latency steadily decreasing for most systems
HBase not as stable (but below 0.1 ms)
20. Bounded Throughput – Read Latency
Redis and Voldemort read latencies decrease slightly
HBase alternates between two states
Cassandra decreases linearly
MySQL decreases and then stays constant
21. Disk Usage
Raw data: 75 bytes per record, 0.7 GB / 10M records
Cassandra: 2.5 GB / 10M records
HBase: 7.5 GB / 10M records
No compression enabled
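Expressed as storage overhead over the raw data (with compression disabled in both systems):

\[
\text{Cassandra: } \frac{2.5\ \text{GB}}{0.7\ \text{GB}} \approx 3.6\times \text{ raw},
\qquad
\text{HBase: } \frac{7.5\ \text{GB}}{0.7\ \text{GB}} \approx 10.7\times \text{ raw}
\]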
22. Cluster D Results – Throughput
Disk-bound, 150M records on 8 nodes
More writes, higher throughput:
Cassandra: 26x
HBase: 15x
Voldemort: 3x
23. Cluster D Results – Latency Writes
Low write latency for all systems
Relatively stable write latency
Lowest for HBase
Equal for Voldemort and Cassandra
24. Cluster D Results – Latency Reads
Cassandra read latency decreases with more writes
Voldemort latency low (5 – 20 ms)
HBase latency high (up to 200 ms)
Cassandra in between
25. Lessons Learned
YCSB
Cassandra
Higher number of connections might lead to better results
Optimal tokens necessary for even data distribution
MySQL
Voldemort
Difficult setup, special JVM settings necessary
Redis
Jedis client-side sharding distributes data unevenly
HBase
Client-to-host ratio of 1:3 works better than 1:2
Eager scan evaluation
VoltDB
Synchronous communication slow on multiple nodes
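The VoltDB lesson is worth a sketch: its Java client can invoke stored procedures synchronously (one blocking round trip per call) or asynchronously via a callback, so with synchronous calls adding nodes mostly adds waiting. The client calls below are real VoltDB Java client API; the procedure name INSERT_METRIC is a hypothetical placeholder, not from the paper.

```java
import org.voltdb.client.Client;
import org.voltdb.client.ClientFactory;
import org.voltdb.client.ClientResponse;
import org.voltdb.client.ProcedureCallback;

// Sketch: synchronous vs. asynchronous invocation in VoltDB's Java client.
// "INSERT_METRIC" is a hypothetical stored procedure name.
public class VoltDbSyncVsAsync {
    public static void main(String[] args) throws Exception {
        Client client = ClientFactory.createClient();
        client.createConnection("localhost");

        // Synchronous: blocks for a full round trip per call --
        // this is what limited multi-node throughput in the evaluation.
        ClientResponse r = client.callProcedure("INSERT_METRIC", "node1", 42.0);
        System.out.println("sync status: " + r.getStatus());

        // Asynchronous: the callback frees the client to pipeline requests.
        client.callProcedure(new ProcedureCallback() {
            public void clientCallback(ClientResponse resp) {
                // inspect resp.getStatus() here
            }
        }, "INSERT_METRIC", "node2", 43.0);

        client.drain(); // wait for outstanding async calls
        client.close();
    }
}
```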
26. Conclusions I
Cassandra
Winner in terms of scalability and throughput
HBase
Low write latency, high read latency, lower throughput
Redis
Scales well, latency decreases at higher scales
Voldemort
Read and write latency stable at a low level
Sharded MySQL
Standalone has high throughput, sharded version does not scale well
VoltDB
High single-node throughput, does not scale with synchronous access
27. Conclusions II
Cassandra’s performance close to APM requirements
Additional improvements needed to reliably sustain the APM workload
Future work
Benchmark the impact of replication and compression
Monitor the benchmark runs using an APM tool