Amazon Redshift SSD
- Queries on TBs of data can
run in a few seconds
FlyData: Amazon Redshift
BENCHMARK Series 03
www.flydata.com
Amazon Redshift HDD took 33.32 seconds to run our
queries on 300GB of data
Amazon Redshift SSD took 4.32 seconds to run our
queries on 300GB of data
Amazon Redshift SSD performed about 8X faster (33.32s / 4.32s ≈ 7.7)
Takeaways:
•1.2 TB can now be handled in under
10 seconds.
•Use cases could spread to ad-delivery
optimization and financial trading
systems.
Amazon Redshift is a popular data warehouse for
big data in the cloud. AWS added the SSD instance
type on January 24, 2014.
We have run benchmarks to compare Redshift SSD
instances to Redshift HDD instances using the
following parameters:
• Data Size: 1.2TB and 300GB
• Query performance when
querying against all records in the cluster
• Loading speed
• Cost comparison
1. Query Speed for similar cluster sizes
• The SSD version is faster.
• A query against 1.2TB (the entire data set) took less than 10 seconds!
• For 1.2TB of data, comparing similar node sizes, query time was 9.22s (SSD, 12x dw2.large) vs 28.48s (HDD, 2x dw1.8xlarge).
* See Appendix for queries being used.
[Chart] Comparison of query speed on dw1.xlarge (HDD) and dw2.large (SSD) for 1.2TB of data, in order of cost.
1. Query Speed at similar pricing points
• Query performance comparison based
on similar pricing point.
• 4 nodes of dw2.large cost:
$0.25(/hour) * 4(nodes) = $1.00(/hour)
• 1 node of dw1.xlarge cost:
$0.85(/hour)
• Direct comparison is difficult, but we
can see much better query
performance for the dw2 (SSD)
Redshift.
* See Appendix for queries being used.
Comparison of query speed for cluster configurations with similar pricing for 300GB of data.
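One way to make the similar-price comparison concrete is to turn the hourly prices above and the average 300GB query times from the appendix (4.315 s and 33.3175 s) into a rough cost per query. This is only a sketch: the helper names `cluster_price` and `cost_per_query` are hypothetical, and it assumes the cluster is charged for exactly the seconds a query runs, which ignores idle time and hourly billing.

```python
# Rough cost-per-query comparison at similar price points (300GB data set).
# Prices and timings come from the slides; per-second billing is a
# simplifying assumption made for illustration only.

SECONDS_PER_HOUR = 3600


def cluster_price(node_price_per_hour: float, nodes: int) -> float:
    """Hourly on-demand price of a cluster of identical nodes."""
    return node_price_per_hour * nodes


def cost_per_query(price_per_hour: float, query_seconds: float) -> float:
    """Cost of the cluster time consumed by one run of the query."""
    return price_per_hour * query_seconds / SECONDS_PER_HOUR


dw2_price = cluster_price(0.25, 4)  # 4x dw2.large (SSD)
dw1_price = cluster_price(0.85, 1)  # 1x dw1.xlarge (HDD)

dw2_cost = cost_per_query(dw2_price, 4.315)    # average of trials 2-5
dw1_cost = cost_per_query(dw1_price, 33.3175)  # average of trials 2-5

print(f"dw2: ${dw2_price:.2f}/hour, ${dw2_cost:.5f} per query")
print(f"dw1: ${dw1_price:.2f}/hour, ${dw1_cost:.5f} per query")
print(f"dw1 / dw2 cost per query: {dw1_cost / dw2_cost:.1f}x")
```

Under these assumptions the slightly more expensive SSD cluster still comes out several times cheaper per query, because the query finishes so much sooner.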
2. Loading Time
• For similar cost (DW2: $1.00/hour vs DW1: $0.85/hour), loading was 4.6x faster on SSD.
• For similar node counts (DW2: 12 nodes vs DW1: 16 nodes), loading was 1.65x faster on SSD.
* See Appendix for queries being used.
[Charts: loading time at similar cost, and loading time at similar node count]
3. Cost
Pricing    On-demand    RI1                   RI3
           Hourly       Upfront    Hourly     Upfront    Hourly
dw1        $0.85        $2500      $0.215     $3000      $0.114
dw2        $0.25        $750       $0.075     $1325      $0.05
DW2 is cheaper when data < 0.48TB.
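To compare the reserved prices with the on-demand rate, the upfront fee can be spread over the reservation term and added to the hourly rate. The sketch below assumes that RI1 and RI3 mean 1-year and 3-year reserved terms and that the node runs continuously for the whole term; neither assumption is stated on the slide, and `effective_hourly` is a hypothetical helper name.

```python
# Effective hourly cost of a reserved node: the upfront fee amortized over
# the term plus the reserved hourly rate. The term lengths (1 and 3 years)
# and 24x7 usage are assumptions, not figures from the slides.

HOURS_PER_YEAR = 24 * 365

PRICING = {
    # node: on-demand hourly, (upfront, hourly) for RI1 and RI3
    "dw1": {"ondemand": 0.85, "ri1": (2500, 0.215), "ri3": (3000, 0.114)},
    "dw2": {"ondemand": 0.25, "ri1": (750, 0.075), "ri3": (1325, 0.05)},
}


def effective_hourly(upfront: float, hourly: float, years: int) -> float:
    """Amortized upfront fee per hour plus the reserved hourly rate."""
    return upfront / (years * HOURS_PER_YEAR) + hourly


effective = {
    node: {
        "ondemand": p["ondemand"],
        "ri1": effective_hourly(*p["ri1"], years=1),
        "ri3": effective_hourly(*p["ri3"], years=3),
    }
    for node, p in PRICING.items()
}

for node, costs in effective.items():
    print(
        f"{node}: on-demand ${costs['ondemand']:.3f}/h, "
        f"RI1 ${costs['ri1']:.3f}/h, RI3 ${costs['ri3']:.3f}/h"
    )
```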
Summary
• Consider DW2 SSD Redshift
– If query and loading performance is the primary concern and cost is secondary
– If your data is smaller than 0.48TB
• Consider DW1 HDD Redshift
– If current DW1 Redshift performance is sufficient
– If DW2 costs are too high for your use case
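The 0.48TB threshold appears to follow from the on-demand prices: it is the storage of the largest dw2.large cluster that still costs less per hour than a single dw1.xlarge. The sketch below checks this under two assumptions: each dw2.large holds 0.16TB (derived from the deck's figure of 0.64TB for 4 nodes), and on-demand pricing is used. The function name `max_cheaper_dw2_nodes` is hypothetical.

```python
# Largest dw2.large cluster that is still cheaper per hour than one
# dw1.xlarge, and the storage it provides. Per-node storage is derived from
# the deck's "4 nodes = 0.64TB" figure; on-demand prices are assumed.

DW1_XLARGE_PRICE = 0.85        # $/hour, one node (2TB)
DW2_LARGE_PRICE = 0.25         # $/hour, one node
DW2_LARGE_STORAGE_TB = 0.64 / 4


def max_cheaper_dw2_nodes(dw1_price: float, dw2_price: float) -> int:
    """Largest dw2 node count whose total hourly price stays below dw1_price."""
    nodes = 0
    while (nodes + 1) * dw2_price < dw1_price:
        nodes += 1
    return nodes


nodes = max_cheaper_dw2_nodes(DW1_XLARGE_PRICE, DW2_LARGE_PRICE)
storage_tb = nodes * DW2_LARGE_STORAGE_TB

print(f"{nodes} dw2.large nodes cost ${nodes * DW2_LARGE_PRICE:.2f}/hour "
      f"and hold {storage_tb:.2f}TB")
```

A fourth node would bring the cluster to $1.00/hour, above the $0.85/hour of one dw1.xlarge.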
About Us - FlyData
• FlyData Enterprise
– Enables continuous loading to Amazon Redshift,
with real-time data loading
– Automated ETL process with multiple supported
data formats
– Auto scaling, data integrity and high durability
– FlyData Sync feature allows real-time replication
from RDBMS to Amazon Redshift
Contact us at: info@flydata.com
We are an official data
integration partner of
Amazon Redshift
APPENDIX
Appendix: Data Loaded for Testing
TSV files, gzip compressed
imp_log: 1) 300GB / 300M records, 2) 1.2TB / 1.2B records
  date            datetime
  publisher_id    integer
  ad_campaign_id  integer
  bid_price       real
  country         varchar(30)
  attr1-4         varchar(255)
click_log: 1) 1.4GB / 1.5M records, 2) 5.6GB / 6M records
  date            datetime
  publisher_id    integer
  ad_campaign_id  integer
  country         varchar(30)
  attr1-4         varchar(255)
Data set 1) covers 1 month; data set 2) covers 4 months.
ad_campaign: 100MB / 100k records
publisher: 10MB / 10k records
advertiser: 10MB / 10k records
We used 5 tables to run a query which joins tables and creates a report.
Appendix: Sample Query
select
  ac.ad_campaign_id as ad_campaign_id,
  adv.advertiser_id as advertiser_id,
  cs.spending as spending,
  ims.imp_total as imp_total,
  cs.click_total as click_total,
  click_total/imp_total as CTR,
  spending/click_total as CPC,
  spending/(imp_total/1000) as CPM
from
  ad_campaigns ac
join
  advertisers adv
  on (ac.advertiser_id = adv.advertiser_id)
join
  (select
     il.ad_campaign_id,
     count(*) as imp_total
   from
     imp_logs il
   group by
     il.ad_campaign_id
  ) ims on (ims.ad_campaign_id = ac.ad_campaign_id)
join
  (select
     cl.ad_campaign_id,
     sum(cl.bid_price) as spending,
     count(*) as click_total
   from
     click_logs cl
   group by
     cl.ad_campaign_id
  ) cs on (cs.ad_campaign_id = ac.ad_campaign_id);
The query generates a basic ad campaign performance report: impression and click counts,
advertiser spending, CTR, CPC and CPM. The query runs against all data in the
cluster.
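The three derived columns follow the usual ad-reporting definitions, which can be sketched outside SQL as below. The class and function names are hypothetical and the example totals are made up for illustration; they are not benchmark data. One point worth checking when running the query: in PostgreSQL-style engines dividing two integer counts truncates, so a ratio such as clicks over impressions may need a cast to keep its fraction, whereas Python's `/` always returns a float.

```python
# CTR, CPC and CPM computed from per-campaign totals, mirroring the derived
# columns of the sample query. Input values below are invented examples.
from dataclasses import dataclass


@dataclass
class CampaignTotals:
    spending: float   # sum of bid_price over the campaign's clicks
    imp_total: int    # number of impressions
    click_total: int  # number of clicks


def campaign_metrics(t: CampaignTotals) -> dict:
    """Return click-through rate, cost per click and cost per mille."""
    return {
        "ctr": t.click_total / t.imp_total,       # clicks per impression
        "cpc": t.spending / t.click_total,        # spend per click
        "cpm": t.spending / (t.imp_total / 1000), # spend per 1000 impressions
    }


example = CampaignTotals(spending=150.0, imp_total=30000, click_total=150)
metrics = campaign_metrics(example)
print(metrics)
```

Campaigns with no clicks or no impressions never reach these divisions in the query, because the inner joins drop them.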
Query Performance: Data Size = 1.2 TB
Query processing time (1.2TB), sample query, in seconds
trial         12x DW2.large   1x DW1.xlarge   2x DW1.xlarge   2x DW1.8xlarge
1 (ignore)    15.3            163.85          61.44           39.11
2             8.8             148.65          52.89           26.77
3             9.71            157.65          53.76           29.9
4             9.12            155.91          53.52           27.51
5             9.24            149.04          52.22           29.75
average       9.2175          155.02          53.0975         28.4825
Note: trial 1 is marked "ignore" and is excluded from the averages, except for 1x DW1.xlarge, where 155.02 is the mean of all five trials (trials 2 to 5 alone average 152.81).
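The averages in this table can be recomputed from the trial times. The sketch below treats trial 1 as a warm-up run and leaves it out, which is an interpretation of the "ignore" label rather than something the deck spells out; `warm_average` is a hypothetical helper name.

```python
# Recompute per-cluster averages for the 1.2TB runs, skipping trial 1.
# Treating trial 1 as a warm-up run is an assumption based on the "ignore"
# label in the table.
from statistics import mean

TRIALS_1_2TB = {
    "12x DW2.large": [15.3, 8.8, 9.71, 9.12, 9.24],
    "1x DW1.xlarge": [163.85, 148.65, 157.65, 155.91, 149.04],
    "2x DW1.xlarge": [61.44, 52.89, 53.76, 53.52, 52.22],
    "2x DW1.8xlarge": [39.11, 26.77, 29.9, 27.51, 29.75],
}


def warm_average(times: list) -> float:
    """Mean of all trials except the first one."""
    return mean(times[1:])


averages = {cluster: warm_average(t) for cluster, t in TRIALS_1_2TB.items()}
baseline = averages["12x DW2.large"]

for cluster, avg in averages.items():
    # The 1x DW1.xlarge value differs from the slide's 155.02, which
    # matches the mean of all five trials instead.
    print(f"{cluster:15s} {avg:9.4f} s  ({avg / baseline:.1f}x the SSD time)")
```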
Query Performance: Data Size = 300GB
Query processing time (300GB), sample query, in seconds
trial         4x DW2.large   1x DW1.xlarge
1 (ignore)    9.05           58
2             4.31           42.69
3             4.65           30.84
4             4.13           30.14
5             4.17           29.6
average       4.315          33.3175
Appendix: Additional Information
• All resources for our benchmark are in our GitHub repository
– https://github.com/hapyrus/redshift-benchmark
– The dataset we use is open on S3, so you can reproduce the benchmark
Summary: Amazon Redshift Pricing
• DW1: Amazon Redshift (HDD)
• DW2: Amazon Redshift (SSD)
– Around 4x more expensive for the same storage
– If storage need is less than 0.48TB, then DW2 is cheaper
Cost comparison:
1x dw1.xlarge (2TB), 4x dw2.large (0.64TB) and 12x dw2.large (1.92TB)
For the same storage space, the cost of DW2 SSD can be 5.2 times higher.
Additional Comments
• SSD could be 3.5x ~ 5x more expensive than
HDD for the same amount of storage space
(SSD is really optimized for performance)
• DW1.8xlarge is exactly 8 times a DW1.xlarge,
but DW2.8xlarge is actually 16 times a
DW2.large. This is because DW2.large nodes
are not “xlarge”; a bit confusing… ;)
(as of Jan. 27, 2014)
Check us out!
-> http://flydata.com
sales@flydata.com
Toll Free: 1-855-427-9787
