Adventures in RDS Load Testing
Mike Harnish, KSM Technology Partners LLC
Objectives
Empirical basis for evaluation
  • Of RDS as a platform for future development
  • Of performance of different configurations

Platform for future load testing
  • Of different configurations, schemas, and load profiles

Not strictly scientific
  • Did not try to isolate all possible sources of variability

Not benchmarking
Not exhaustive
  • Some configurations not tested
Why RDS? Why Oracle?
Why not DynamoDB/NoSQL?
  • Nothing at all against them
  • Testing platform design does not exclude them

Why not MySQL/SQL Server?
  • Ran out of time

Why not PostgreSQL?
  • Ran out of time, but would be my next choice

RDBMS migration path
How We Tested
Provision RDS servers
Generate test data
Introduce distributed load
  • Persistent and relentless
  • Coarse-grained “batches” of work
  • For a finite number of transactions

Monitor servers
  • With CloudWatch

Analyze per-batch statistics
RDS Server Configurations
db.m2.4xlarge
  • High-Memory Quadruple Extra Large DB Instance: 68 GB of memory, 26 ECUs (8 virtual cores with 3.25 ECUs each), 64-bit platform, High I/O Capacity, Provisioned IOPS Optimized: 1000Mbps
  • Tested at 3000 and 1000 PIOPS (Provisioned IOPS)
  • $3.14 base/hour, Oracle license included
  • The largest supported instance type for Oracle

db.m1.xlarge
  • Extra Large DB Instance: 15 GB of memory, 8 ECUs (4 virtual cores with 2 ECUs each), 64-bit platform, High I/O Capacity, Provisioned IOPS Optimized: 1000Mbps
  • No PIOPS
  • $1.13 base/hour, license included, on-demand
Test Schema
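-- One row per account; every write transaction updates its balance
CREATE TABLE loadgen.account(
  account_id   NUMBER(9)
                 CONSTRAINT pk_account PRIMARY KEY,
  balance      NUMBER(6,2) DEFAULT 0 NOT NULL);

-- One row per transaction; read transactions fetch the 10 most recent per account
CREATE TABLE loadgen.tx(
  tx_id        NUMBER(9) CONSTRAINT pk_tx PRIMARY KEY,
  account_id   NUMBER(9) CONSTRAINT fk_tx_account
                 REFERENCES loadgen.account(account_id),
  amount       NUMBER(6,2) NOT NULL,
  description  VARCHAR2(100),
  -- SYSDATE has one-second resolution; SYSTIMESTAMP would keep fractional seconds
  tx_timestamp TIMESTAMP DEFAULT SYSDATE);

-- Supports the "last 10 tx by timestamp" lookup for a given account
CREATE INDEX loadgen.idx_tx_lookup ON loadgen.tx(account_id, tx_timestamp)
…
CREATE SEQUENCE loadgen.seq_tx_id
…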
Baseline Test Data
5,037,003 accounts
353,225,005 transactions
  • Roughly 70 initial transactions per account

300 GB provisioned storage
  • Mostly to get higher PIOPS

Using ~67 GB of it
  • According to CloudWatch
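A quick check of the arithmetic behind these figures. The numbers are copied from this slide; the variable names are only illustrative.

```python
# Figures from the "Baseline Test Data" slide
accounts = 5_037_003
transactions = 353_225_005
provisioned_gb = 300
used_gb = 67  # approximate, as reported by CloudWatch

tx_per_account = transactions / accounts
storage_used_pct = 100 * used_gb / provisioned_gb

print(f"{tx_per_account:.1f} tx per account")                     # 70.1
print(f"{storage_used_pct:.0f}% of provisioned storage in use")   # 22%
```

Most of the 300 GB is therefore headroom bought for its PIOPS allowance rather than for capacity.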
Test Environment
[Diagram: c1.xlarge load-generator instances (8 vCPU, 20 ECU, 7 GB memory, high network performance) connect to the RDS instances over JDBC; a t1.micro instance runs SQLPlus]
Processing View
Producer → Tx Queue → Consumers (12-24) → RDS Instances (the “victims”)
Consumers → Stats Queue → Stats Collector → .csv

  • Lightweight batch specs (2000 batches × 500 tx), for example:
    {"targetReadRatio":3,"targetWriteRatio":1,"size":500,"run":"run01","id":13,"accountRange":{"start":10001,"count":5040800}}
  • Batch performance stats (also JSON-formatted; tl;dr)
  • 1M JDBC tx/run
  • 3 read : 1 write ratio
  • Randomized over the known set of pre-loaded accounts
  • Commit per tx (not per batch)
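The batch-spec flow above can be sketched in a few lines of Python. This is a minimal, single-process illustration under stated assumptions, not the harness used for these runs (which drove RDS from distributed JDBC consumers): `make_batch_specs` and `plan_batch` are hypothetical names, an in-memory `queue.Queue` stands in for the Tx Queue, and account IDs are drawn uniformly from `accountRange`. Because each transaction's type is drawn at random, a single batch only approximates the 3:1 read/write mix.

```python
import json
import queue
import random

def make_batch_specs(run, batches=2000, size=500, start=10001, count=5040800):
    """Hypothetical producer: yields one lightweight JSON spec per batch."""
    for batch_id in range(1, batches + 1):
        yield json.dumps({
            "targetReadRatio": 3,
            "targetWriteRatio": 1,
            "size": size,
            "run": run,
            "id": batch_id,
            "accountRange": {"start": start, "count": count},
        })

def plan_batch(spec_json, rng):
    """Hypothetical consumer step: expand one spec into (kind, account_id) pairs."""
    spec = json.loads(spec_json)
    reads, writes = spec["targetReadRatio"], spec["targetWriteRatio"]
    p_read = reads / (reads + writes)  # 3:1 -> each tx is a read with probability 0.75
    first = spec["accountRange"]["start"]
    last = first + spec["accountRange"]["count"] - 1
    return [("read" if rng.random() < p_read else "write", rng.randint(first, last))
            for _ in range(spec["size"])]

# Producer fills the in-memory stand-in for the Tx Queue: 2000 x 500 = 1M tx
tx_queue = queue.Queue()
for spec in make_batch_specs("run01"):
    tx_queue.put(spec)

# A consumer takes one spec and plans its 500 transactions
plan = plan_batch(tx_queue.get(), random.Random(42))
print(len(plan), "tx planned;", tx_queue.qsize(), "batches still queued")
```

Keeping the specs this small means the producer never becomes the bottleneck: the consumers do all the random selection themselves.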
Transaction Specifications
Read Transaction
  • Query a random ACCOUNT for its balance
  • Query TX for the last 10 tx by TIMESTAMP DESC
  • Scan the returned cursor

Write Transaction
  • Insert a random (+/-) amount into the TX table for a random account
  • Update the ACCOUNT table by applying that amount to the current balance
  • Commit (or roll back on failure)
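A minimal sketch of the two transaction types, assuming Python's built-in `sqlite3` module as a stand-in for Oracle over JDBC so that it runs anywhere. The tables follow the test schema in simplified form: SQLite's auto-assigned `INTEGER PRIMARY KEY` takes the place of `loadgen.seq_tx_id`, and `LIMIT 10` takes the place of Oracle's `FETCH FIRST 10 ROWS ONLY` (12c and later) or a `ROWNUM` subquery. `read_tx` and `write_tx` are hypothetical names.

```python
import sqlite3

# In-memory SQLite stand-in for the Oracle loadgen schema (types simplified)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    balance    NUMERIC NOT NULL DEFAULT 0);
CREATE TABLE tx (
    tx_id        INTEGER PRIMARY KEY,  -- auto-assigned; replaces seq_tx_id
    account_id   INTEGER REFERENCES account(account_id),
    amount       NUMERIC NOT NULL,
    description  TEXT,
    tx_timestamp TEXT DEFAULT CURRENT_TIMESTAMP);
CREATE INDEX idx_tx_lookup ON tx(account_id, tx_timestamp);
""")
conn.executemany("INSERT INTO account (account_id) VALUES (?)",
                 [(i,) for i in range(1, 101)])
conn.commit()

def read_tx(conn, account_id):
    """Read transaction: fetch the balance, then the last 10 tx, newest first."""
    (balance,) = conn.execute(
        "SELECT balance FROM account WHERE account_id = ?", (account_id,)
    ).fetchone()
    cur = conn.execute(
        "SELECT tx_id, amount, tx_timestamp FROM tx WHERE account_id = ? "
        "ORDER BY tx_timestamp DESC LIMIT 10",
        (account_id,),
    )
    recent = [row for row in cur]  # scan the returned cursor
    return balance, recent

def write_tx(conn, account_id, amount):
    """Write transaction: insert into tx, apply the amount to account, commit per tx."""
    try:
        conn.execute("INSERT INTO tx (account_id, amount) VALUES (?, ?)",
                     (account_id, amount))
        conn.execute("UPDATE account SET balance = balance + ? WHERE account_id = ?",
                     (amount, account_id))
        conn.commit()
        return True
    except sqlite3.Error:
        conn.rollback()  # undo the partial write on failure
        return False

write_tx(conn, 42, 25.0)
print(read_tx(conn, 42))
```

Committing inside `write_tx`, rather than once per batch, follows the “commit per tx” rule from the processing view.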
[1] db.m2.4xlarge, 3000 PIOPS (4 consumers @ 6 threads each)
Run 01, cumulative: 5765 tps
[Chart: ElapsedTimeMillis (milliseconds elapsed per batch) and NetTPS (TPS) for each batch, in the order received by the stats collector]
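The deck does not say exactly how NetTPS (TotalTxPerSecond on later slides) or the cumulative figure was computed. One plausible reading is sketched below under stated assumptions: each consumer reports a hypothetical `BatchStat` record with the batch size, start time, and `ElapsedTimeMillis`; per-batch TPS is size divided by elapsed seconds; and the cumulative rate is all transactions divided by the wall-clock span of the run. Because 12 to 24 consumer threads run batches concurrently, the cumulative rate can exceed any single batch's rate.

```python
from dataclasses import dataclass

@dataclass
class BatchStat:
    """Hypothetical per-batch record a consumer might put on the Stats Queue."""
    batch_id: int
    size: int        # transactions in the batch (500 in these runs)
    start_ms: int    # wall-clock start time, epoch milliseconds
    elapsed_ms: int  # ElapsedTimeMillis

def batch_tps(stat):
    """Throughput of a single batch on its own consumer thread."""
    return stat.size / (stat.elapsed_ms / 1000)

def cumulative_tps(stats):
    """All transactions divided by the wall-clock span from first start to last finish."""
    total = sum(s.size for s in stats)
    first_start = min(s.start_ms for s in stats)
    last_end = max(s.start_ms + s.elapsed_ms for s in stats)
    return total / ((last_end - first_start) / 1000)

# Two batches overlap in time; a third starts when the first finishes
stats = [BatchStat(1, 500, 0, 2000),
         BatchStat(2, 500, 0, 2500),
         BatchStat(3, 500, 2000, 2000)]
print(batch_tps(stats[0]))    # 250.0
print(cumulative_tps(stats))  # 375.0
```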
[1] db.m2.4xlarge, 3000 PIOPS (4 consumers @ 6 threads each)
Run 01 Monitoring Results
Peaked @ 2200 Write IOPS
Disk Queue Depth > 100
What’s up with Read IOPS?
[2] db.m2.4xlarge, 3000 PIOPS (4 consumers @ 6 threads each) … again ???
Run 02, cumulative: 4804 tps
[Chart: ElapsedTimeMillis (milliseconds elapsed per batch) and TotalTxPerSecond (TPS) for each batch, in the order received by the stats collector]
[2] db.m2.4xlarge, 3000 PIOPS (4 consumers @ 6 threads each) … again
Run 02 Monitoring Results
Peaked @ 2500+ Write IOPS
Disk Queue Depth tracks Write IOPS (or vice versa)
[3] db.m2.4xlarge, 3000 PIOPS (4 consumers @ 6 threads each) … third run
Run 03, cumulative: 4842 tps
[Chart: ElapsedTimeMillis (milliseconds elapsed per batch) and TotalTxPerSecond (TPS) for each batch, in the order received by the stats collector]
[3] db.m2.4xlarge, 3000 PIOPS (4 consumers @ 6 threads each) … third run
Run 03 Monitoring Results
Peaked @ 2500+ Write IOPS
Very curious about what’s going on in this interval, from the peak to the end of the run
Disk Queue Depth tracks Write IOPS (or vice versa)
[4] db.m2.4xlarge, 1000 PIOPS (2 consumers @ 6 threads each)
Run 04, cumulative: 2854 tps
Dialed back concurrency, on the hunch that Oracle is resetting too many connections
[Chart: ElapsedTimeMillis (milliseconds elapsed per batch) and TotalTxPerSecond (TPS) for each batch, in the order received by the stats collector]
[4] db.m2.4xlarge, 1000 PIOPS (2 consumers @ 6 threads each)
Run 04 Monitoring Results
[5] db.m2.4xlarge, 1000 PIOPS (4 consumers @ 6 threads each)
Run 05, cumulative: 2187 tps
Dialing back up made it worse
[Chart: ElapsedTimeMillis (milliseconds elapsed per batch) and TotalTxPerSecond (TPS) for each batch, in the order received by the stats collector]
[5] db.m2.4xlarge, 1000 PIOPS (4 consumers @ 6 threads each)
Run 05 Monitoring Results
[6] db.m1.xlarge, No PIOPS (2 consumers @ 6 threads each)
Run 06, cumulative: 1061 tps
Some early flutter, but not much
[Chart: ElapsedTimeMillis (milliseconds elapsed per batch) and TotalTxPerSecond (TPS) for each batch, in the order received by the stats collector]
[6] db.m1.xlarge, No PIOPS (2 consumers @ 6 threads each)
Run 06 Monitoring Results
Different colors than on previous slides
Latency: Run 1 (3000 PIOPS)
Run 01 Batch Latencies (all milliseconds)
[Chart: AvgTxLatencyMs and MedianWriteLatency on the “AverageTx/Median Write Latency” axis, HighWriteLatency on the “High Write Latency” axis, for each batch in the order received by the stats collector]
Latency: Run 6 (No PIOPS)
Run 06 Batch Latencies (all milliseconds)
[Chart: AvgTxLatencyMs and MedianWriteLatency on the “AverageTx/Median Write Latency” axis, HighWriteLatency on the “High Write Latency” axis, for each batch in the order received by the stats collector]
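The latency charts plot three per-batch series. A sketch of how a stats collector might derive them from per-transaction timings follows; the `(kind, latency_ms)` sample format and the `latency_summary` helper are assumptions, and the output keys simply mirror the chart legend. Reporting the median next to the maximum is useful because a few slow writes can dominate the high-water mark while barely moving the median, which is presumably why high write latency gets its own axis.

```python
from statistics import mean, median

def latency_summary(samples):
    """Summarize one batch of (kind, latency_ms) samples; keys mirror the chart legend."""
    all_ms = [ms for _, ms in samples]
    write_ms = [ms for kind, ms in samples if kind == "write"]
    return {
        "AvgTxLatencyMs": mean(all_ms),
        "MedianWriteLatency": median(write_ms) if write_ms else None,
        "HighWriteLatency": max(write_ms) if write_ms else None,
    }

# One slow write (40 ms) sets the high-water mark but not the median
batch = [("read", 2.0), ("read", 3.0), ("write", 8.0),
         ("read", 1.0), ("write", 12.0), ("write", 40.0)]
print(latency_summary(batch))
# {'AvgTxLatencyMs': 11.0, 'MedianWriteLatency': 12.0, 'HighWriteLatency': 40.0}
```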
Pricing (does not include cost of backup storage)

Single-AZ
Runs     Instance Type  PIOPS  Storage (GB)  Hourly O/D**  PIOPS/Month  Storage/GB-month*  Cost/Month
1, 2, 3  db.m2.4xlarge  3000   300           $3.14         $0.10        $0.13              $2,598.30
4, 5     db.m2.4xlarge  1000   300           $3.14         $0.10        $0.13              $2,398.30
6        db.m1.xlarge   0      300           $1.13         $0.10        $0.10              $843.60

Multi-AZ
Runs     Instance Type  PIOPS  Storage (GB)  Hourly O/D**  PIOPS/Month  Storage/GB-month*  Cost/Month
1, 2, 3  db.m2.4xlarge  3000   300           $6.28         $0.20        $0.25              $5,196.60
4, 5     db.m2.4xlarge  1000   300           $6.28         $0.20        $0.25              $4,796.60
6        db.m1.xlarge   0      300           $2.26         $0.20        $0.20              $1,687.20

*Non-PIOPS storage also incurs I/O request charges of $0.10 per million requests.
**Oracle “license-included” on-demand pricing. Reserved instances offer significant savings.
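The Cost/Month column can be reproduced from the rates in the table. Two inputs are inferred from the arithmetic rather than stated on the slide: a 720-hour (30-day) month, and a PIOPS storage rate of $0.125/GB-month, which the table shows rounded to $0.13 (at exactly $0.13 the first row would come to $2,599.80). The sketch leaves out backup storage and the per-request I/O charges mentioned in the footnotes.

```python
HOURS_PER_MONTH = 720  # assumption: a 30-day month

def monthly_cost(hourly, piops, piops_rate, storage_gb, gb_rate):
    """Instance hours + provisioned IOPS + storage; excludes backups and I/O requests."""
    return hourly * HOURS_PER_MONTH + piops * piops_rate + storage_gb * gb_rate

# (hourly, PIOPS, $/PIOPS-month, GB, $/GB-month); 0.125 is an assumed unrounded rate
configs = {
    "Runs 1,2,3 single-AZ": (3.14, 3000, 0.10, 300, 0.125),
    "Runs 4,5 single-AZ":   (3.14, 1000, 0.10, 300, 0.125),
    "Run 6 single-AZ":      (1.13, 0, 0.10, 300, 0.10),
    "Runs 1,2,3 multi-AZ":  (6.28, 3000, 0.20, 300, 0.25),
    "Runs 4,5 multi-AZ":    (6.28, 1000, 0.20, 300, 0.25),
    "Run 6 multi-AZ":       (2.26, 0, 0.20, 300, 0.20),
}
for name, args in configs.items():
    print(f"{name}: ${monthly_cost(*args):,.2f}")
```

Broken out this way, the instance hours dominate: at 3000 PIOPS the provisioned IOPS add $300 to a $2,260.80 instance bill in the single-AZ case.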
Conclusions and Takeaways
PIOPS matters
  • For throughput and latency

Need larger sampling periods
  • To reduce the effect of warm-up in both the instrumentation and the system under test

Need to try different R/W ratios
  • And to gauge how they affect realized PIOPS

Backup and restore take time
  • Consider using promotable read replicas, on platforms that support them
  • Otherwise I might have had more samples
Questions?

Speaker Notes

  1. All single-AZ, all in us-east-1d because I’m a glutton for punishment