2. YCSB Benchmark
Yahoo! Cloud Serving Benchmark
De facto cloud benchmark
Benchmark cannot be changed
NDB is the #1 player in this realm
NDB Cluster is the Fastest Distributed, In-memory,
Transactional Database in the world!
3. Setting
• NDB Cluster 7.6.10 on YCSB benchmark
• 50/50 read/write runs, Workload A, 10 fields, 1 kB rows, uniform distribution
• 12 BM.DenseIO and 30 BM.Standard instances
“old” X5 1.36 with 36 cores, Hyper-Threading and 512 GB RAM
DenseIO instances have NVMe drives
Oracle Linux 7
Instances spread evenly across 3 ADs (4 + 10 per domain)
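The workload described on this slide maps onto YCSB's core workload properties roughly as follows. This is a minimal sketch, not the configuration file used for the runs: the property names are standard CoreWorkload settings, while the helper name `workload_properties` and the record and operation counts passed to it are illustrative assumptions (the slides do not state an operation count).

```python
def workload_properties(record_count, operation_count):
    """Return YCSB CoreWorkload properties for a Workload A style run.

    record_count and operation_count are illustrative inputs; the slides
    do not state the operation count that was used.
    """
    props = {
        "recordcount": record_count,
        "operationcount": operation_count,
        "fieldcount": 10,        # 10 fields per row
        "fieldlength": 100,      # 100 bytes each, about 1 kB per row
        "readproportion": 0.5,   # 50/50 read/update mix (Workload A)
        "updateproportion": 0.5,
        "scanproportion": 0,
        "insertproportion": 0,
        "requestdistribution": "uniform",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(workload_properties(record_count=10_000_000, operation_count=10_000_000))
```

The printed text can be saved as a properties file and passed to YCSB with `-P`.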
4. Benchmark setup
• 1 Data Node per DenseIO instance (52 cores, 8 NVMe drives)
2 and 4 Data Node clusters in the same Availability Domain
8 Data Node clusters split across 2 Availability Domains
Node Groups split across Availability Domains
• 2 MySQL Servers per BM36.Standard instance
One MySQL Server per CPU socket/NUMA node (36 CPUs / socket)
1 YCSB client co-located with each MySQL Server,
locked onto the same socket/NUMA node
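One way to lock each MySQL Server and its YCSB client onto the same NUMA node is to launch both under `numactl`. The sketch below only builds the command lines; it is an assumption about how such pinning could be done, not the scripts used for these runs. The `mysqld` defaults-file paths, the workload file and the thread count are hypothetical placeholders.

```python
import shlex


def pinned_command(numa_node, command):
    """Prefix a command so its CPUs and memory stay on one NUMA node."""
    prefix = ["numactl", f"--cpunodebind={numa_node}", f"--membind={numa_node}"]
    return prefix + list(command)


def host_commands(sockets=2):
    """One MySQL Server plus one YCSB client per socket, as on the slide."""
    commands = []
    for node in range(sockets):
        # Placeholder command lines: paths and thread count are assumptions.
        mysqld = ["mysqld", f"--defaults-file=/etc/mysql/node{node}.cnf"]
        ycsb = ["bin/ycsb", "run", "jdbc", "-P", "workloads/workloada",
                "-threads", "64"]
        commands.append(pinned_command(node, mysqld))
        commands.append(pinned_command(node, ycsb))
    return commands


if __name__ == "__main__":
    for cmd in host_commands():
        print(shlex.join(cmd))
```

Binding both CPU and memory keeps the JDBC traffic between a client and its server on one socket, which is the point of the co-location described above.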
5. Benchmark runs
• YCSB with 1kB Rows, 10 Fields with 100 Bytes each
• Most runs with 10M and 100M rows due to time constraints
Using varying number of threads and clients
Testing throughput versus latency
• Few runs with 300M and 600M rows
• Different setups with
Varying numbers of YCSB/MySQL Server pairs
Data nodes using 8, 16 and 32 data manager (LDM) threads
8 and 16 LDM running on 1 NUMA node
NUMA off (memory allocated on local NUMA node)
32 LDM on both sockets, NUMA memory interleaved in these cases
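The LDM and NUMA variations could be expressed in the data node section of the cluster's `config.ini` along the following lines. This is a hedged sketch: `ThreadConfig`, `NoOfFragmentLogParts`, `NoOfReplicas` and `Numa` are real NDB data node parameters, but the values, the one-log-part-per-LDM choice and the mapping of "NUMA off" to `Numa=0` are assumptions, and a real deployment would also size and bind the other thread types. The helper name `data_node_defaults` is made up for this example.

```python
def data_node_defaults(ldm_threads, interleave_memory):
    """Render an illustrative [ndbd default] fragment for config.ini."""
    if ldm_threads not in (8, 16, 32):
        raise ValueError("the slides only cover 8, 16 and 32 LDM threads")
    lines = [
        "[ndbd default]",
        "NoOfReplicas=2",
        # Only the LDM count is set here; other thread types are omitted.
        f"ThreadConfig=ldm={{count={ldm_threads}}}",
        # Assumption: one redo log part per LDM thread.
        f"NoOfFragmentLogParts={ldm_threads}",
        # Assumption: Numa=1 requests interleaved allocation,
        # Numa=0 leaves memory placement to the operating system.
        f"Numa={1 if interleave_memory else 0}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(data_node_defaults(ldm_threads=16, interleave_memory=False))
    print()
    print(data_node_defaults(ldm_threads=32, interleave_memory=True))
```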
6. YCSB and MySQL Cluster 7.6 set-up
MySQL Server on BM.Standard
2 Server instances per host
Data Nodes on DenseIO
full duplication of data, 2 replicas
strongly consistent across both replicas
ACID (read committed)
YCSB
JDBC driver, standard SQL used
competitors use a ClusterJ-like NoSQL API
unmodified downloaded binaries, version 0.15.0, co-located with
MySQL Server
1 kB rows, 10 columns (default config), uniform distribution
[Diagram: BM36.Standard instances, each with one YCSB + JDBC pair on NUMA0 and one on NUMA1; BM.DenseIO instances, 1 data node / instance]
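For a sense of scale, the raw payload held in memory can be estimated from the row count, the 1 kB row size and the 2 replicas. The sketch below is a back-of-the-envelope calculation only: it ignores index, row header and page overhead, and the default of 4 data nodes is an assumption taken from the 4-node runs reported later in the deck.

```python
def payload_gb(rows, row_bytes=1000, replicas=2, data_nodes=4):
    """Return (cluster-wide GB, GB per data node) of raw row payload."""
    total_bytes = rows * row_bytes * replicas
    return total_bytes / 1e9, total_bytes / data_nodes / 1e9


if __name__ == "__main__":
    for rows in (10_000_000, 100_000_000, 300_000_000, 600_000_000):
        total, per_node = payload_gb(rows)
        print(f"{rows:>11,} rows: {total:7.1f} GB total, {per_node:6.1f} GB per data node")
```

With these assumptions, 600M rows come to about 1.2 TB across the cluster, or about 300 GB of raw payload per data node on 4 nodes.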
9. Scaling load
4 Data Nodes; optimize NDB for throughput or latency by adapting the load generators
Configuration               600M rows                 600M rows
(threads per client)        64 threads x 20 clients   256 threads x 20 clients
95th %tile Read Latency     0.8 ms                    1.8 ms
99th %tile Read Latency     0.9 ms                    2.4 ms
95th %tile Update Latency   1.8 ms                    3.2 ms
99th %tile Update Latency   1.9 ms                    3.9 ms
Throughput (ops/s)          1.3M                      2.9M
[Chart: Latency vs Throughput; throughput axis in transactions per second (1M to 3M), latency axis in ms (2 ms, 4 ms)]
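The latency/throughput trade-off can be sanity-checked with Little's law: in a closed-loop benchmark, the number of requests in flight equals throughput times mean latency. The sketch below assumes every YCSB thread always has exactly one operation outstanding (no think time, no client-side stalls), which is an idealisation; the function name is made up for this example.

```python
def implied_mean_latency_ms(threads_per_client, clients, throughput_ops_per_s):
    """Little's law for a closed loop: latency = requests in flight / throughput."""
    in_flight = threads_per_client * clients
    return in_flight / throughput_ops_per_s * 1000.0


if __name__ == "__main__":
    # (threads per client, clients, reported throughput in ops/s)
    for threads, clients, ops in [(64, 20, 1.3e6), (256, 20, 2.9e6)]:
        latency = implied_mean_latency_ms(threads, clients, ops)
        print(f"{threads} x {clients}: about {latency:.2f} ms per operation")
```

Under these assumptions the two runs come out at roughly 0.98 ms and 1.77 ms per operation. This is a mean over reads and updates, so it is not directly comparable to the percentiles in the table, but it shows why quadrupling the threads raises throughput only about 2.2x: the extra concurrency is paid for in latency.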
10. Scaling number of rows
4 Data Nodes, number of rows in cluster has no
performance impact!
Configuration               300M rows                  600M rows
(threads per client)        128 threads x 10 clients   128 threads x 10 clients
95th %tile Read Latency     0.9 ms                     0.9 ms
99th %tile Read Latency     1 ms                       1 ms
95th %tile Update Latency   1.7 ms                     1.7 ms
99th %tile Update Latency   2 ms                       2 ms
Throughput (ops/s)          1.26M                      1.25M
[Chart: transactions per second (1M to 3M) and latency (2 ms, 4 ms) for both runs; annotation: Same Throughput & Latency]
11. Number of LDM - 16 LDM versus 32 LDM
• A higher number of LDM threads improves latency and scalability, but it can lower
throughput when too many clients are used
Load configuration             Low load / fewer clients   Low load / many clients   High load
(clients x threads)            10 x 128                   20 x 64                   20 x 192
LDMs                           16       32                16       32               16       32
Avg Read Latency (ms)          0.6      0.6               0.7      0.6              0.9      0.9
95th %tile Read Latency (ms)   0.9      0.9               1.3      0.8              1.6      1.3
99th %tile Read Latency (ms)   1.1      1.1               1.8      0.9              3.5      1.5
Avg Upd Latency (ms)           1.4      1.4               2.1      1.4              3.8      2.2
95th %tile Upd Latency (ms)    1.7      1.7               3.4      1.8              6.3      2.7
99th %tile Upd Latency (ms)    2        2                 4.9      1.9              71 (!)   3.1
Throughput (ops/s)             1.26M    1.25M             1.78M    1.3M             1.64M    2.5M
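The throughput row is easier to read as a ratio of 32 LDM to 16 LDM for each load configuration. The short sketch below computes it from the figures in the table; the dictionary and function names are made up for this example.

```python
# Throughput in million ops/s from the table: (16 LDM, 32 LDM).
THROUGHPUT_MOPS = {
    "10 x 128": (1.26, 1.25),
    "20 x 64": (1.78, 1.3),
    "20 x 192": (1.64, 2.5),
}


def ratio_32_vs_16(throughput=THROUGHPUT_MOPS):
    """Return the 32 LDM / 16 LDM throughput ratio per load configuration."""
    return {load: ldm32 / ldm16 for load, (ldm16, ldm32) in throughput.items()}


if __name__ == "__main__":
    for load, ratio in ratio_32_vs_16().items():
        print(f"{load}: 32 LDM delivers {ratio:.2f}x the 16 LDM throughput")
```

The ratios are roughly 0.99, 0.73 and 1.52: no difference at low load with fewer clients, a throughput loss with many lightly loaded clients, and a clear gain under high load.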
13. Outlook
• During benchmark runs and experiments, NDB’s performance
proved robust against many “misconfigurations” in
the OS (interrupts, network, NUMA, etc.)
• VARCHAR columns are 4x “faster” than BLOBs in this YCSB
variant
15. Conclusions
• Number of rows has no impact on performance
Performance instead depends on
Number of client threads
Number of Data Nodes
Number of LDM threads per Data Node
System can be optimised easily for latency versus throughput
• Cluster scales well with number of data nodes
Splitting node groups across ADs impacts performance
• NDB is the fastest transactional distributed in-memory database in the world