Adventures in RDS Load Testing
1. Adventures in RDS Load Testing
Mike Harnish, KSM Technology Partners LLC
2. Objectives
Empirical basis for evaluation
Of RDS as a platform for future development
Of performance of different configurations
Platform for future load testing
Of different configurations, schemas, and load profiles
Not strictly scientific
Did not try to isolate all possible sources of variability
Not benchmarking
Not exhaustive
Some configurations not tested
3. Why RDS? Why Oracle?
Why not DynamoDB/NoSQL?
Nothing at all against them
Testing platform design does not exclude them
Why not MySQL/SQLServer?
Ran out of time
Why not PostgreSQL?
Ran out of time, but would be my next choice
RDBMS migration path
4. How We Tested
Provision RDS servers
Generate test data
Introduce distributed load
Persistent and relentless
Rough-grained “batches” of work
For a finite number of transactions
Monitor servers
With Cloudwatch
Analyze per-batch statistics
5. RDS Server Configurations
db.m2.4xlarge
High-Memory Quadruple Extra Large DB Instance: 68 GB of memory, 26 ECUs (8 virtual cores with 3.25 ECUs each), 64-bit platform, High I/O Capacity, Provisioned IOPS Optimized: 1000Mbps
At 3000 and 1000 PIOPS
$3.14 base/hour, Oracle license included
The largest supported instance type for Oracle
db.m1.xlarge
Extra Large DB Instance: 15 GB of memory, 8 ECUs (4 virtual cores with 2 ECUs each), 64-bit platform, High I/O Capacity, Provisioned IOPS Optimized: 1000Mbps
No PIOPS
$1.13 base/hour, license included, on-demand
7. Baseline Test Data
5,037,003 accounts
353,225,005 transactions
Roughly 70 initial transactions per account
300GB provisioned storage
Mostly to get higher PIOPS
Using ~67GB of it
According to CloudWatch
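The baseline figures above are internally consistent; a quick sanity check on the quoted counts recovers the "roughly 70 transactions per account" claim:

```python
# Sanity-check the baseline data set figures quoted above.
accounts = 5_037_003
transactions = 353_225_005

tx_per_account = transactions / accounts
print(round(tx_per_account, 1))  # 70.1 -- "roughly 70 initial transactions per account"
```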
9. Processing View
Lightweight batch specs (2000 batches by 500 tx), JSON formatted:
{"targetReadRatio":3,"targetWriteRatio":1,"size":500,"run":"run01","id":13,"accountRange":{"start":10001,"count":5040800}}
Pipeline: Producer → Tx Queue → Consumers (12-24) → RDS Instances (the victims)
Consumers publish batch performance stats (also JSON formatted; tl;dr) to a Stats Queue, which a Stats Collector drains into .csv
• 1M JDBC tx/run
• 3 read : 1 write ratio
• Randomized over the known set of pre-loaded accounts
• Commit per tx (not per batch)
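The producer side of the pipeline above can be sketched in a few lines. This is a minimal, illustrative sketch: the field names come from the batch-spec JSON on the slide, while the function name and loop structure are assumptions. 2000 batches of 500 transactions each accounts for the 1M transactions per run:

```python
import json

# Minimal sketch of the producer: emit lightweight batch specs like the JSON
# shown above. Field names follow the slide; everything else is illustrative.
def make_batch_spec(batch_id, run="run01", size=500):
    return {
        "targetReadRatio": 3,     # three reads ...
        "targetWriteRatio": 1,    # ... per write
        "size": size,             # transactions per batch
        "run": run,
        "id": batch_id,
        "accountRange": {"start": 10001, "count": 5040800},
    }

# 2000 batches x 500 tx = the 1M JDBC transactions per run.
specs = [make_batch_spec(i) for i in range(2000)]
total_tx = sum(s["size"] for s in specs)
print(total_tx)  # 1000000

# Each spec serializes to one small JSON message for the Tx Queue.
message = json.dumps(specs[13])
```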
10. Transaction Specifications
Read Transaction
Query random ACCOUNT for balance
Query TX for last 10 tx by TIMESTAMP DESC
Scan the returned cursor
Write Transaction
Insert a random (+/-) amount into the TX table for a random
account
Update the ACCOUNT table by applying that amount to the
current balance
Commit (or rollback on failure)
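The two transaction types above can be sketched concretely. The deck drove Oracle over JDBC; to keep the example self-contained this sketch swaps in Python's built-in sqlite3, and the table and column names (ACCOUNT, TX, BALANCE, TIMESTAMP-style ordering) are assumed from the slide's wording:

```python
import random
import sqlite3

# Sketch of the read/write transaction specs, using SQLite in place of
# Oracle/JDBC so the example runs anywhere. Schema names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ACCOUNT (ID INTEGER PRIMARY KEY, BALANCE REAL NOT NULL);
    CREATE TABLE TX (ACCOUNT_ID INTEGER, AMOUNT REAL, TS INTEGER);
""")
conn.executemany("INSERT INTO ACCOUNT VALUES (?, 0.0)",
                 [(i,) for i in range(1, 101)])
conn.commit()

def read_tx(account_id):
    # Query a random ACCOUNT for its balance, then the last 10 tx by
    # timestamp descending, and scan the returned cursor.
    balance = conn.execute(
        "SELECT BALANCE FROM ACCOUNT WHERE ID = ?", (account_id,)).fetchone()[0]
    last10 = conn.execute(
        "SELECT AMOUNT FROM TX WHERE ACCOUNT_ID = ? ORDER BY TS DESC LIMIT 10",
        (account_id,)).fetchall()
    return balance, last10

def write_tx(account_id, ts):
    # Insert a random +/- amount into TX, apply it to the ACCOUNT balance,
    # then commit (or roll back on failure). Commit is per tx, not per batch.
    amount = random.uniform(-100, 100)
    try:
        conn.execute("INSERT INTO TX VALUES (?, ?, ?)",
                     (account_id, amount, ts))
        conn.execute("UPDATE ACCOUNT SET BALANCE = BALANCE + ? WHERE ID = ?",
                     (amount, account_id))
        conn.commit()
    except sqlite3.Error:
        conn.rollback()

for ts in range(30):
    write_tx(random.randint(1, 100), ts)
balance, last10 = read_tx(1)
```

Committing per transaction rather than per batch is what makes this workload punishing: every write forces a log flush, which is exactly the I/O pressure the PIOPS comparison is meant to expose.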
16. [3] db.m2.4xlarge, 3000 PIOPS
(4 consumers @ 6 threads ea) … third run
Run 03 Monitoring Results
Peaked @ 2500+ Write IOPS
Very curious what’s going on in this interval, from peak to end of run
Disk Queue Depth tracks Write IOPS (or vice versa)
17. [4] db.m2.4xlarge, 1000 PIOPS
(2 consumers @ 6 threads ea)
Cumulative: 2854 tps
Run 04
Dialed back concurrency, on the hunch that Oracle is resetting too many connections
[Chart: ElapsedTimeMillis and TotalTxPerSecond (TPS) per batch, plotted against Batch Received by Stats Collector (1-1901); milliseconds elapsed per batch on one axis, TPS on the other]
25. Pricing
(does not include cost of backup storage)
              Instance Type   PIOPS  Storage (GB) | Single AZ: Hourly O/D**  PIOPS/Month  Storage/GB-month*  Cost/Month | Multi-AZ: Hourly O/D**  PIOPS/Month  Storage/GB-month*  Cost/Month
Runs 1,2,3    db.m2.4xlarge   3000   300          |           $3.14          $0.10        $0.13              $2,598.30 |           $6.28         $0.20        $0.25              $5,196.60
Runs 4,5      db.m2.4xlarge   1000   300          |           $3.14          $0.10        $0.13              $2,398.30 |           $6.28         $0.20        $0.25              $4,796.60
Run 6         db.m1.xlarge    0      300          |           $1.13          $0.10        $0.10              $843.60   |           $2.26         $0.20        $0.20              $1,687.20
*Non-PIOPS storage also incurs I/O requests at $0.10/million requests
**Oracle “license-included” pricing. Significant savings for reserved instances.
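The Cost/Month column decomposes as hourly rate × hours + provisioned IOPS × per-IOP rate + storage. A quick check reproduces the Single-AZ figures, assuming a 720-hour month and the $0.125/GB-month PIOPS-storage rate implied by the totals (it displays rounded as $0.13):

```python
# Reproduce the Single-AZ Cost/Month figures, assuming a 720-hour month.
HOURS = 720
PIOPS_RATE = 0.10      # $ per provisioned IOP per month
PIOPS_STORAGE = 0.125  # $/GB-month for PIOPS storage (shown as $0.13)
STD_STORAGE = 0.10     # $/GB-month for standard (non-PIOPS) storage

def monthly_cost(hourly, piops, gb, storage_rate):
    return round(hourly * HOURS + piops * PIOPS_RATE + gb * storage_rate, 2)

print(monthly_cost(3.14, 3000, 300, PIOPS_STORAGE))  # 2598.3  (runs 1-3)
print(monthly_cost(3.14, 1000, 300, PIOPS_STORAGE))  # 2398.3  (runs 4-5)
print(monthly_cost(1.13, 0, 300, STD_STORAGE))       # 843.6   (run 6, excl. I/O requests)
```

Note that dropping from 3000 to 1000 PIOPS saves only $200/month on this configuration; the instance-hours dominate the bill.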
26. Conclusions and Takeaways
PIOPS matters
For throughput and latency
Need larger sampling periods
To mitigate the effect of warm-up of instruments and subject
Need to try different R/W ratios
And to gauge how they impact realized PIOPS
Backup and restore take time
Consider promotable read replicas, for platforms that support them
Otherwise I might have had more samples