© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:Invent (session ABD327)
Migrating Your Traditional Data Warehouse to a Modern Data Lake
Vidhya Srinivasan, General Manager, Amazon Redshift
Balaji Muthuramalingam, Executive Director, Data & Analytics at 21st Century Fox
November 28, 2017
Amazon’s Analytics Architecture
Collect → Store → Analyze
Services shown: Amazon Kinesis Firehose, AWS Direct Connect, Amazon Snowball, Amazon Kinesis Analytics, Amazon Kinesis Streams, Amazon S3, Amazon Glacier, Amazon CloudSearch, Amazon RDS, Amazon Aurora, Amazon DynamoDB, Amazon ES, Amazon EMR, Amazon Redshift, Amazon QuickSight, AWS Database Migration Service, AWS Glue, Amazon Athena, Amazon AI
The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
“Amazon Redshift has the largest adoption of BDW in the cloud.”
“With more than 5,000 deployments, Amazon Redshift has the largest data warehouse deployments in the cloud – some over 10 petabytes in size.”
AWS received a score of 5/5 (the highest score possible) in the customer base, market awareness, ability to execute, road map, support, and partners criteria.
Forrester Wave Big Data Warehouse Q2 2017
Amazon Redshift – Data Warehousing
Fast, powerful, simple, and fully managed data warehouse at 1/10 the cost
Massively parallel, scales from gigabytes to exabytes
• Fast at scale: columnar storage technology to improve I/O efficiency and scale query performance
• Inexpensive: as low as $1,000 per terabyte per year, 1/10th the cost of traditional data warehouse solutions; starting at $0.25 per hour
• Open file formats: analyze optimized data formats on direct-attached disks, and all open data formats in Amazon S3
• Secure: audit everything; encrypt data end-to-end; extensive certification and compliance
Redshift Spectrum
Extend the data warehouse to your S3 data lake
(Diagram: the Redshift Spectrum query engine connects Amazon Redshift data with the S3 data lake.)
• Exabyte-scale Amazon Redshift SQL queries against S3
• Join data across Amazon Redshift and S3
• Scale compute and storage separately
• Stable query performance and unlimited concurrency
• Parquet, ORC, Grok, Avro, & CSV data formats
• Pay only for the amount of data scanned
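Because Spectrum charges by data scanned, file format and compression translate directly into cost: a columnar format lets a query read only the columns it references, and compression shrinks what is read. The following is a rough estimator under stated assumptions; the price per terabyte, data size, column fraction, and compression ratio are hypothetical inputs, not figures from this talk, so check current Spectrum pricing before relying on the numbers.

```python
def spectrum_scan_estimate(raw_bytes, column_fraction, compression_ratio, price_per_tb):
    """Rough (terabytes scanned, cost) estimate for one Spectrum query.

    raw_bytes:         uncompressed size of the partitions the query touches
    column_fraction:   share of each row the query reads; use 1.0 for
                       row-oriented formats such as CSV, which are read in full
    compression_ratio: 5.0 means the files are 5x smaller than the raw data
    price_per_tb:      assumed price per TB scanned (hypothetical input)
    """
    if not 0 < column_fraction <= 1:
        raise ValueError("column_fraction must be in (0, 1]")
    if compression_ratio < 1:
        raise ValueError("compression_ratio must be >= 1")
    tb_scanned = raw_bytes * column_fraction / compression_ratio / 1e12
    return tb_scanned, tb_scanned * price_per_tb


if __name__ == "__main__":
    RAW = 10 * 10**12  # 10 TB of raw data in the touched partitions (hypothetical)
    PRICE = 5.0        # assumed price per TB scanned
    print(spectrum_scan_estimate(RAW, 1.0, 1.0, PRICE))  # uncompressed CSV
    print(spectrum_scan_estimate(RAW, 0.1, 5.0, PRICE))  # Parquet, 10% of columns, 5x compression
```

In this example the columnar, compressed layout scans 50 times less data than uncompressed CSV (10 × 5), the same compounding that the exabyte demo attributes to compression and columnar formats.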
Redshift Spectrum
Query your data lake
(Diagram: clients connect to Amazon Redshift over JDBC/ODBC; Redshift Spectrum, scale-out serverless compute with nodes 1 to N, reads Amazon S3, exabyte-scale object storage, using table definitions from the AWS Glue Data Catalog.)
Query:
SELECT COUNT(*)
FROM S3.EXT_TABLE
GROUP BY …
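The 1 to N compute nodes in the diagram suggest the usual two-phase aggregation: each worker aggregates the S3 files it is given, and the partial results are merged afterwards, so adding workers splits the scan without changing the answer. Below is a minimal in-memory model of that pattern for a `COUNT(*) ... GROUP BY` query; the round-robin file assignment and the representation of a file as a list of group keys are assumptions, not a description of how Spectrum schedules work.

```python
from collections import Counter


def assign_files(files, n_workers):
    """Round-robin assignment of S3 files to workers (an assumed policy)."""
    buckets = [[] for _ in range(n_workers)]
    for i, f in enumerate(files):
        buckets[i % n_workers].append(f)
    return buckets


def partial_counts(worker_files):
    """Phase 1, on each worker: COUNT(*) ... GROUP BY key over its own files.

    Each "file" is modeled as a list of group-by key values, one per row.
    """
    counts = Counter()
    for rows in worker_files:
        counts.update(rows)
    return counts


def merge(partials):
    """Phase 2, on the cluster: add up the per-worker partial counts."""
    total = Counter()
    for p in partials:
        total.update(p)
    return dict(total)


def scale_out_count(files, n_workers):
    return merge(partial_counts(b) for b in assign_files(files, n_workers))


if __name__ == "__main__":
    files = [["US", "EU"], ["US"], ["APAC", "US"], ["EU"]]
    for n in (1, 2, 4):
        print(n, scale_out_count(files, n))  # same totals regardless of n
```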
Spectrum: Exabyte query in less than three minutes
SELECT
P.ASIN,
P.TITLE,
R.POSTAL_CODE,
P.RELEASE_DATE,
SUM(D.QUANTITY * D.OUR_PRICE) AS SALES_sum
FROM
s3.d_customer_order_item_details D,
asin_attributes A,
products P,
regions R
WHERE
D.ASIN = P.ASIN AND
P.ASIN = A.ASIN AND
D.REGION_ID = R.REGION_ID AND
A.EDITION LIKE '%FIRST%' AND
P.TITLE LIKE '%Potter%' AND
P.AUTHOR = 'J. K. Rowling' AND
R.COUNTRY_CODE = 'US' AND
R.CITY = 'Seattle' AND
R.STATE = 'WA' AND
D.ORDER_DAY :: DATE >= P.RELEASE_DATE AND
D.ORDER_DAY :: DATE < dateadd(day, 3, P.RELEASE_DATE)
GROUP BY P.ASIN, P.TITLE, R.POSTAL_CODE, P.RELEASE_DATE
ORDER BY SALES_sum DESC
LIMIT 20;
• Roughly 140 TB of customer item order detail records for each day over the past 20 years
• 190 million files across 15,000 partitions in S3
• One partition per day for the USA and one for the rest of the world
• Total data size is over an exabyte
Optimization:
• Compression: 5X
• Columnar file format: 10X
• Scanning with 2500 nodes: 2500X
• Static partition elimination: 2X
• Dynamic partition elimination: 350X
• Amazon Redshift query optimizer: 40X
Run time: Hive (1000 nodes), 5 years; Redshift Spectrum, 155 seconds
* Estimated using 20 node Hive cluster & 1.4TB, assume linear
* Query used a 20 node DC1.8XLarge Amazon Redshift cluster
* Not actual sales data - generated for this demo based on data format used by Amazon Retail.
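The optimization list reads as a set of multiplicative reductions in work. Multiplying them gives a sense of why an exabyte-scale query becomes tractable, although the slide does not say the factors compound independently, so treat the product as an illustration rather than a measured speedup. The same few lines check the data-volume bullets.

```python
from math import prod

# Reduction factors as listed on the slide.
FACTORS = {
    "compression": 5,
    "columnar file format": 10,
    "scanning with 2500 nodes": 2500,
    "static partition elimination": 2,
    "dynamic partition elimination": 350,
    "Amazon Redshift query optimizer": 40,
}


def combined_factor(factors):
    """Total reduction if every factor compounds independently (an assumption)."""
    return prod(factors.values())


def total_data_tb(tb_per_day, years, days_per_year=365):
    """Approximate total volume: daily volume x days x years (ignores leap days)."""
    return tb_per_day * days_per_year * years


print(combined_factor(FACTORS))  # 3500000000
print(total_data_tb(140, 20))    # 1022000 TB, i.e. about 1.02 EB
```

The second result confirms the slide's "over an exabyte": 140 TB/day for 20 years is roughly 1.02 million TB.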
NUVIAD - Data Lake Analytics with Redshift Spectrum
• Seamlessly analyzing open file formats directly in Amazon S3 to provide fresh, up-to-the-minute insights
• Unlimited analytics and query concurrency with Amazon Redshift, and unlimited data capacity with Amazon S3
• Scaling compute separately from storage in Amazon S3 for flexibility, fast performance and cost-effectiveness
“Spectrum is a game changer for us. Reports that took minutes to produce are now delivered in seconds and we like the ability to scale compute on-demand to query petabytes of data in S3 in various open file formats.” – Rafi Ton, CEO, NUVIAD
NUVIAD is a mobile marketing platform providing professional marketers, agencies and local businesses with hyper-targeted analytics at petabyte scale.
(Architecture diagram: data sources, Amazon S3, AWS Glue, Amazon Redshift, Redshift Spectrum, BI tools.)
Amazon Redshift is widely available
Currently available regions: Ireland; Frankfurt; London; Beijing; Mumbai; Seoul; Singapore; Sydney; Tokyo; Sao Paulo; US East – N Virginia; US East – Ohio; US West – Oregon; US West – N California; GovCloud; Canada – Central, Montreal
Selected Amazon Redshift Customers
Selected Amazon Redshift Partners
Categories: Data Integration, Systems Integrators, Business Intelligence
Recent and upcoming launches
New Dense Compute Node - DC2
2X Performance @ Same Price as DC1
3x more I/O with 30% better storage utilization than DC1
“We saw a 9x reduction in month-end reporting time with Amazon Redshift dc2 nodes as compared to dc1” - Bradley Todd, Technical Architect, Liberty Mutual
Hardware: NVMe SSD, DDR4 memory, Intel E5-2686 v4 (Broadwell)
Result Caching - Sub-second query response times
(Diagram: BI / dashboard tools and analytics send queries to Amazon Redshift.)
1. Queries go to the leader node.
2. If the cache contains the query result, it's returned with no processing.
3. If the query is not in the cache, it's executed and the result is cached.
• In-memory leader node cache, resulting in sub-second response
• Transparent – it just works
• Skips WLM, query processing, and optimization
• Cache persists across sessions
• Caching frees up the Amazon Redshift cluster, increasing performance for other non-repetitive queries
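The three numbered steps describe a look-aside cache on the leader node. A minimal sketch of that control flow follows, assuming the cache key is the exact query text and that the caller declares which tables a query reads. Invalidating entries when a table changes reflects the fact that Amazon Redshift does not serve cached results once the underlying data has been modified; the real eligibility rules are more involved than this model.

```python
from typing import Any, Callable, Dict, Set, Tuple


class ResultCache:
    """Toy model of a leader-node result cache; not Redshift's implementation."""

    def __init__(self) -> None:
        # query text -> (cached result, tables the query read)
        self._entries: Dict[str, Tuple[Any, Set[str]]] = {}
        self.executions = 0  # how many times a query was actually run

    def run(self, sql: str, tables: Set[str], execute: Callable[[], Any]) -> Any:
        # Step 1 happens before this call: the query has reached the leader node.
        if sql in self._entries:
            # Step 2: cache hit, return the stored result with no processing.
            return self._entries[sql][0]
        # Step 3: cache miss, execute the query and cache its result.
        result = execute()
        self.executions += 1
        self._entries[sql] = (result, set(tables))
        return result

    def table_modified(self, table: str) -> None:
        # Invalidate every cached result that depended on the changed table.
        stale = [q for q, (_, deps) in self._entries.items() if table in deps]
        for q in stale:
            del self._entries[q]


if __name__ == "__main__":
    cache = ResultCache()
    q = "SELECT region, SUM(sales) FROM orders GROUP BY region"
    cache.run(q, {"orders"}, lambda: [("US", 10)])
    cache.run(q, {"orders"}, lambda: [("US", 10)])  # served from the cache
    print(cache.executions)  # 1
    cache.table_modified("orders")
    print(cache.run(q, {"orders"}, lambda: [("US", 12)]))  # re-executed: [('US', 12)]
    print(cache.executions)  # 2
```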
Result Caching: From the lab
• Higher is better! (Queries per hour)
• Read-write workload with a mix of small and large queries, Inserts, Copy and Vacuum
• 4-node ds2.8xL cluster
Query throughput (QPH) with result caching:
Workload          No Caching   Caching
Dashboard         138          2979
Heavy Reporting   8            117
Result Caching: A customer perspective
• Lower is better! (Query Latency)
• 4-node dc2.8xL cluster
• Tableau dashboard; 10-user test
(Chart: average execution time with and without caching for various dashboard queries; names removed for confidentiality.)
“That’s not a mistake...the results for average execution time on the caching test run were sub-second and so don't show up on the y-axis at this scale”
Short Query Acceleration: Express lane for short queries
(Diagram: BI / dashboard tools and analytics send queries to Amazon Redshift, where a machine learning classifier routes short queries to an express lane.)
• Short queries do not get stuck behind long running queries
• Higher throughput, less variability
• Customized for your workload
• Transparent – it just works!
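The express-lane effect can be illustrated with a tiny queueing simulation. With one first-in, first-out queue, a half-second dashboard query that arrives behind long reports waits for them; with a separate lane for queries predicted to be short, it starts almost immediately. This is a sketch under simplifying assumptions: each lane is a single execution slot, and a fixed runtime threshold (a hypothetical parameter) replaces Amazon Redshift's machine learning classifier.

```python
from typing import List, Tuple

Query = Tuple[float, float]  # (arrival time in seconds, runtime in seconds)


def fifo_waits(queries: List[Query]) -> List[float]:
    """Wait time of each query on one execution slot, served in arrival order."""
    waits, free_at = [], 0.0
    for arrival, runtime in sorted(queries):
        start = max(float(arrival), free_at)
        waits.append(start - arrival)
        free_at = start + runtime
    return waits


def sqa_waits(queries: List[Query], threshold: float) -> Tuple[List[float], List[float]]:
    """Route queries to a short lane or a long lane, each with its own slot.

    `threshold` is a hypothetical stand-in for the ML classifier: it assumes the
    predicted runtime equals the actual runtime, which no real classifier guarantees.
    """
    short = [q for q in queries if q[1] <= threshold]
    long_running = [q for q in queries if q[1] > threshold]
    return fifo_waits(short), fifo_waits(long_running)


if __name__ == "__main__":
    # Two 60 s reports arrive first, then two half-second dashboard queries.
    workload = [(0.0, 60.0), (1.0, 60.0), (2.0, 0.5), (3.0, 0.5)]
    print(fifo_waits(workload))                # [0.0, 59.0, 118.0, 117.5]
    print(sqa_waits(workload, threshold=1.0))  # ([0.0, 0.0], [0.0, 59.0])
```

Giving the short lane its own slot also adds capacity, so the simulation illustrates the queueing effect rather than a like-for-like throughput comparison.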
Short Query Acceleration: Results
(Chart: query runtimes with no SQA vs. SQA, both at concurrency 5.)
“This configuration showed a distinct improvement in short query runtimes with the SQA feature enabled. Many of the shortest queries saw a 5x or greater improvement while the longer running queries saw a corresponding increase. This is exactly how we expect the feature to work.”
• Average wait time drops from 36 seconds to 0 for queries that execute in under a second
• P90 wait time on a very busy cluster drops from 370 seconds to 32.1 seconds
Coming soon: Nested data support
• Analyze nested and semi-structured data in Amazon S3 with Spectrum
• Allows easy ETL of nested data into Amazon Redshift using CTAS
• Support for open file formats: Parquet, ORC, JSON, Ion and AVRO
• Uses dot notation to extend your existing SQL
Example: Find click frequency for links on “/home”:
Sample data in s3data.clickStream:
{ "session_time": "20171013 14:05:00",
  "clicks": [ {"page": "/home", "referrer": ""},
              {"page": "/products", "referrer": "/home"} ]
},
{ "session_time": "20171013 14:06:00",
  "clicks": [ {"page": "/contact", "referrer": "/home"} ]
}
Query:
SELECT c.page,
       COUNT(*) AS count
FROM s3data.clickStream s,
     s.clicks c
WHERE s.session_time > '2017-10-01 00:00:00'
  AND c.referrer = '/home'
GROUP BY c.page;
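The `FROM s3data.clickStream s, s.clicks c` clause pairs each session with each element of its own `clicks` array, much like an unnest, and the rest of the query is an ordinary filter and GROUP BY over those pairs. The following plain-Python model reproduces the example's semantics on the sample records; it illustrates what the query computes, not how Spectrum executes it, and it assumes the `YYYYMMDD HH:MM:SS` timestamp layout used in the sample data.

```python
from collections import Counter
from datetime import datetime

CLICK_STREAM = [
    {"session_time": "20171013 14:05:00",
     "clicks": [{"page": "/home", "referrer": ""},
                {"page": "/products", "referrer": "/home"}]},
    {"session_time": "20171013 14:06:00",
     "clicks": [{"page": "/contact", "referrer": "/home"}]},
]


def click_frequency(sessions, since, referrer):
    """COUNT(*) ... GROUP BY page over (session, click) pairs."""
    counts = Counter()
    for s in sessions:
        ts = datetime.strptime(s["session_time"], "%Y%m%d %H:%M:%S")
        if ts <= since:
            continue  # WHERE s.session_time > since
        for c in s["clicks"]:  # FROM ... s.clicks c: one row per array element
            if c["referrer"] == referrer:  # AND c.referrer = '/home'
                counts[c["page"]] += 1
    return dict(counts)


print(click_frequency(CLICK_STREAM, datetime(2017, 10, 1), "/home"))
# {'/products': 1, '/contact': 1}
```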
Coming soon: Nested data support
Improve query performance by analyzing nested data
Before: flat Orders and OrderItems tables, joined on OrderID
Orders
OrderID  CustomerID  OrderTime  ShipMode
5        23          10.00      12.50
8        32          1.00       5.60
OrderItems
OrderID  ItemID  Quantity  Price
5        23      10.00     12.50
8        32      1.00      5.60
5        16      1.00      1.99
8        24      5.00      26.50
After: the new Orders table, with each order's items nested in an OrdersWithItems column
OrderID  CustomerID  OrderTime  ShipMode  OrdersWithItems (ItemID, Quantity, Price)
5        23          10.00      12.50     (23, 10.00, 12.50), (16, 1.00, 1.99)
8        32          1.00       5.60      (32, 1.00, 5.60), (24, 5.00, 26.50)
To improve query performance, the new Orders table includes OrdersWithItems as a nested column, eliminating join processing.
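The before/after layout amounts to grouping the child rows by their parent key once, at load time, so that later queries read an order together with its items instead of joining two tables. A minimal sketch using the OrderItems rows from the slide follows; the dictionary representation is only an illustration, not Amazon Redshift's nested storage format.

```python
from collections import defaultdict

# (OrderID, ItemID, Quantity, Price) rows from the OrderItems table.
ORDER_ITEMS = [
    (5, 23, 10.00, 12.50),
    (8, 32, 1.00, 5.60),
    (5, 16, 1.00, 1.99),
    (8, 24, 5.00, 26.50),
]


def nest_items(order_ids, order_items):
    """Attach each order's items as a nested list (the OrdersWithItems column)."""
    by_order = defaultdict(list)
    for order_id, item_id, qty, price in order_items:
        by_order[order_id].append({"ItemID": item_id, "Quantity": qty, "Price": price})
    # .get avoids inserting empty lists into the defaultdict for unknown orders.
    return {oid: by_order.get(oid, []) for oid in order_ids}


def order_total(nested, order_id):
    """Per-order computation with no join: iterate the nested column directly."""
    return sum(i["Quantity"] * i["Price"] for i in nested[order_id])


if __name__ == "__main__":
    nested = nest_items([5, 8], ORDER_ITEMS)
    for oid in (5, 8):
        print(oid, [i["ItemID"] for i in nested[oid]], round(order_total(nested, oid), 2))
```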
Coming Soon: Enhanced Monitoring
• Optimize your Amazon Redshift cluster for peak performance by using query throughput metrics
• Get greater insights into your cluster performance by accessing database and workload metrics
• Get alerts and notifications via Amazon SNS
• Monitor query latency and throughput to optimize your workload
Closing thoughts
• Increase performance by 2x, at the same price, by switching to DC2
• Redshift Spectrum extends your Amazon Redshift cluster to all of your data
in S3, seamlessly, efficiently and cost-effectively
• Query Monitoring rules, along with Short Query Acceleration and Result
Caching, can significantly improve performance
Please continue to provide us feedback at redshift-pm@amazon.com
Fox Film Entertainment
21st Century Fox Data Lake on AWS
Fox Film Entertainment (21CF)
(Diagram: customer platforms and broadcaster platforms.)
Landscape
• 100+ TB of data
• 150+ billion rows
• 100+ sources
• 25,000+ user queries per day
• 35,000+ data processes per day
• 24x7, all regions
Challenges
• Faster time to market
• Variety & volume
• High availability
• Stability
• Automation
• Technology & beyond
Key Principles
Data Democratization
Cloud First
Faster Time to Market
Scale to Grow
Total Cost of Ownership
Fox–AWS Architecture
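Collect → Store → Analyze
(Diagram: data transfer and scheduled ingest feed a data lake (object storage) and an EDW/data mart tier (SQL MPP), with processing via ETL, E(L)T, and Spark, and output to visualization & analysis; a catalog, management, and security layer is also shown.)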
Services
Collect → Store → Analyze → Visualize
• Tag & transfer raw data to S3
• Amazon S3 with the AWS Glue Data Catalog
• Amazon EMR, Amazon Redshift, AWS Lambda, AWS Glue ETL
• MicroStrategy for visualization
Lessons Learned
Design Considerations
• Segregate workload/application into separate clusters
• Scale up & down as needed by each application
Analyze tables regularly
• After every load, on PREDICATE COLUMNS
• Weekly, on all columns
• Query the stats_off column of SVV_TABLE_INFO to decide when to trigger ANALYZE
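The stats_off check is easy to script: read `schema`, `table`, and `stats_off` from SVV_TABLE_INFO, then issue ANALYZE only where statistics have drifted. The sketch below builds those statements from rows you have already fetched with your Redshift client of choice; the 10% threshold is an assumed cut-off, and the PREDICATE COLUMNS / ALL COLUMNS switch mirrors the per-load and weekly passes above.

```python
from typing import Iterable, List, Optional, Tuple

# Fetch these rows with your own Redshift client, then pass them to plan_analyze().
STATS_QUERY = 'SELECT "schema", "table", stats_off FROM svv_table_info;'


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def plan_analyze(rows: Iterable[Tuple[str, str, Optional[float]]],
                 threshold: float = 10.0,
                 columns: str = "PREDICATE COLUMNS") -> List[str]:
    """Return ANALYZE statements for tables whose statistics look stale.

    stats_off is a percentage: 0 means current, 100 means out of date.
    threshold=10.0 is an assumed cut-off; use columns="ALL COLUMNS" for the
    weekly full pass and the default "PREDICATE COLUMNS" after each load.
    """
    if columns not in ("PREDICATE COLUMNS", "ALL COLUMNS"):
        raise ValueError("columns must be 'PREDICATE COLUMNS' or 'ALL COLUMNS'")
    statements = []
    for schema, table, stats_off in rows:
        if stats_off is not None and stats_off > threshold:
            statements.append(
                f"ANALYZE {quote_ident(schema)}.{quote_ident(table)} {columns};")
    return statements


if __name__ == "__main__":
    rows = [("public", "sales", 25.0), ("public", "dim_date", 0.0)]
    for stmt in plan_analyze(rows):
        print(stmt)  # ANALYZE "public"."sales" PREDICATE COLUMNS;
```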
Vacuum tables regularly
• Daily on frequently accessed or modified tables (identified from STL_SCAN and STL_DELETE)
• Weekly on all tables
• A deep copy might be faster than VACUUM for tables with a high percentage of unsorted rows
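The vacuum guidance can be folded into a small decision function: VACUUM a table whose unsorted share is modest, and rebuild a heavily unsorted one with a deep copy (create a copy with the same definition, bulk insert, swap names). SVV_TABLE_INFO's `unsorted` column (a percentage) is a natural input. The 30% cut-over and the `_deepcopy` suffix are hypothetical choices for illustration.

```python
from typing import List


def maintenance_sql(table: str, unsorted_pct: float,
                    deep_copy_threshold: float = 30.0) -> List[str]:
    """Choose VACUUM or a deep copy for one table.

    `table` is assumed to be a simple, already-valid identifier.
    deep_copy_threshold is a hypothetical cut-over, not an AWS recommendation.
    """
    if unsorted_pct < deep_copy_threshold:
        return [f"VACUUM {table};"]
    tmp = f"{table}_deepcopy"  # hypothetical naming convention
    return [
        "BEGIN;",
        # LIKE copies the column encodings, DISTKEY, and SORTKEY of the original.
        f"CREATE TABLE {tmp} (LIKE {table});",
        # The bulk INSERT ... SELECT re-sorts the data as it repopulates the copy.
        f"INSERT INTO {tmp} SELECT * FROM {table};",
        f"DROP TABLE {table};",
        f"ALTER TABLE {tmp} RENAME TO {table};",
        "COMMIT;",
    ]


if __name__ == "__main__":
    print(maintenance_sql("sales", 5.0))
    print("\n".join(maintenance_sql("sales", 60.0)))
```

A deep copy needs room for a second copy of the table and does not allow concurrent updates while it runs, which is the trade-off against VACUUM. Privileges granted on the original table are not carried over by LIKE, and DROP TABLE fails if ordinary (non-late-binding) views depend on it, so check for both first.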
Lessons Learned (cont.)
Commit Queue
• Amazon Redshift has a relatively high per-commit overhead, so many small commits add up
• Our ETL tool's SQL push-down was creating too many commits; we had to work with the vendor to optimize the commits
• Optimize ETL design to batch up commits
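Batching commits means grouping many small statements into one explicit transaction instead of letting each one commit on its own. The sketch below rewrites a list of ETL statements into BEGIN/COMMIT blocks and counts the commits that reach the commit queue; the batch size and the staging table names are hypothetical.

```python
from typing import List


def batch_transactions(statements: List[str], batch_size: int) -> List[str]:
    """Wrap statements in explicit BEGIN/COMMIT blocks of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    script: List[str] = []
    for start in range(0, len(statements), batch_size):
        script.append("BEGIN;")
        script.extend(statements[start:start + batch_size])
        script.append("COMMIT;")
    return script


def commit_count(script: List[str]) -> int:
    return sum(1 for line in script if line == "COMMIT;")


if __name__ == "__main__":
    # 1,000 small ETL steps against hypothetical staging tables.
    steps = [f"INSERT INTO stage_orders SELECT * FROM landing_orders WHERE batch_id = {i};"
             for i in range(1000)]
    print(commit_count(batch_transactions(steps, 1)))    # 1000: one commit per statement
    print(commit_count(batch_transactions(steps, 100)))  # 10
```

Larger batches mean fewer commits but more work to redo if a statement in the batch fails, so the batch size is a trade-off.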
Schema Design
• Choose the best distribution key (DISTKEY) and sort key (SORTKEY)
• Create small tables with DISTSTYLE ALL to improve join performance (a full copy on every node is affordable because the tables are small)
• Sort keys: avoid interleaved sort keys on frequently ETL'd tables
• Sort keys: avoid compressing the leading (primary) sort key column
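These rules can be captured in a small DDL helper: small tables get DISTSTYLE ALL, larger ones a distribution key, the leading sort key column is left uncompressed with ENCODE RAW, and a compound (not interleaved) sort key is used, in line with the advice for frequently loaded tables. The one-million-row cut-off and the example table and column names are hypothetical.

```python
from typing import List, Optional, Tuple

Column = Tuple[str, str]  # (name, SQL type)


def create_table_ddl(name: str, columns: List[Column], est_rows: int,
                     sort_keys: List[str], dist_key: Optional[str] = None,
                     small_table_rows: int = 1_000_000) -> str:
    """Build CREATE TABLE DDL following the schema guidance above."""
    names = {col for col, _ in columns}
    for key in sort_keys + ([dist_key] if dist_key else []):
        if key not in names:
            raise ValueError(f"unknown column: {key}")
    lead_sort = sort_keys[0] if sort_keys else None
    col_defs = []
    for col, typ in columns:
        # Leave the leading sort key column uncompressed.
        enc = " ENCODE RAW" if col == lead_sort else ""
        col_defs.append(f"  {col} {typ}{enc}")
    if est_rows <= small_table_rows:
        dist = "DISTSTYLE ALL"  # small table: full copy on every node
    elif dist_key:
        dist = f"DISTSTYLE KEY DISTKEY ({dist_key})"
    else:
        dist = "DISTSTYLE EVEN"
    sort = f" COMPOUND SORTKEY ({', '.join(sort_keys)})" if sort_keys else ""
    return f"CREATE TABLE {name} (\n" + ",\n".join(col_defs) + f"\n) {dist}{sort};"


if __name__ == "__main__":
    print(create_table_ddl("dim_region", [("region_id", "INTEGER"), ("name", "VARCHAR(64)")],
                           est_rows=500, sort_keys=["region_id"]))
```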
Lessons Learned (cont.)
Close engagement with AWS, ProServe, and APN Partners
• Close partnering with our AWS account/support team, AWS product teams, AWS
ProServe, and AWS ISV and SI partners allowed us to quickly address any issues
that came up during migration
• AWS ProServe brought deep experience and expertise to help accelerate our success
WLM:
• Dynamic WLM setting for resource prioritization and allocation (daytime vs. nighttime
WLM settings)
• Queue jobs at ETL and Reporting tool level to avoid submitting too many queries to
Amazon Redshift at once.
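One way to implement daytime vs. nighttime WLM is to keep two queue layouts and switch the cluster parameter group's `wlm_json_configuration` value on a schedule, for example from a scheduled job that calls the Redshift ModifyClusterParameterGroup API through the AWS CLI or an SDK. The sketch below only selects and serializes the layout for a given hour; the user groups, concurrency values, memory percentages, and the 07:00 to 19:00 daytime window are all hypothetical, so check the WLM documentation for which properties can change without a reboot.

```python
import json
from typing import Dict, List

# Hypothetical layouts: reporting gets priority by day, ETL by night.
# The last entry in each list is the default queue (no user groups).
DAYTIME: List[Dict] = [
    {"user_group": ["reporting"], "query_concurrency": 10, "memory_percent_to_use": 60},
    {"user_group": ["etl"], "query_concurrency": 3, "memory_percent_to_use": 30},
    {"query_concurrency": 2, "memory_percent_to_use": 10},
]
NIGHTTIME: List[Dict] = [
    {"user_group": ["reporting"], "query_concurrency": 3, "memory_percent_to_use": 20},
    {"user_group": ["etl"], "query_concurrency": 8, "memory_percent_to_use": 70},
    {"query_concurrency": 2, "memory_percent_to_use": 10},
]


def wlm_for_hour(hour: int, day_start: int = 7, day_end: int = 19) -> str:
    """Return the wlm_json_configuration value to apply at the given hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    layout = DAYTIME if day_start <= hour < day_end else NIGHTTIME
    if sum(q.get("memory_percent_to_use", 0) for q in layout) > 100:
        raise ValueError("queue memory allocations exceed 100%")
    return json.dumps(layout)


if __name__ == "__main__":
    print(wlm_for_hour(10))  # daytime layout as a JSON string
```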
Benefits of migrating to Amazon Redshift
Cost
• Saved 15-20% of annual cost
• Reduced 21CF studios data center space
Performance
• 30-35% performance gain
Business Agility
• Streamlined provisioning of new gear
• No longer have to deal with the storage “wall”
• Improved interoperability with native AWS services across multiple business units, leveraging our new data lake
Looking ahead
Summarizing New Feature Benefits
• Amazon Redshift’s new DC2 node: read/write performance is ~50% higher than dc1
• Short Query Acceleration: helped smooth out our overall performance, with gains of ~50% for the MicroStrategy (MSTR) client
• Redshift Spectrum: allows us to extend the reach of our Data Hub to ’cold’ data stored in Amazon S3
• Query result set caching (pending): expected to yield a 10x improvement on the MSTR workload for cached responses (sub-second latencies)
What’s next?
• Continue building the Data Lake (AWS Glue, Amazon Redshift Spectrum)
• Stream data processing with Amazon Kinesis
• Artificial Intelligence and Machine Learning
  • Use Amazon Redshift as a training source (Amazon Machine Learning, Spark)
  • Natural language interfaces (Amazon Lex, Amazon Polly)
THANK YOU!

Leadership Session: AWS Database and Analytics (DAT206-L) - AWS re:Invent 2018Leadership Session: AWS Database and Analytics (DAT206-L) - AWS re:Invent 2018
Leadership Session: AWS Database and Analytics (DAT206-L) - AWS re:Invent 2018
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
 
Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018
 
Technology Trends in Data Processing - DAT311 - re:Invent 2017
Technology Trends in Data Processing - DAT311 - re:Invent 2017Technology Trends in Data Processing - DAT311 - re:Invent 2017
Technology Trends in Data Processing - DAT311 - re:Invent 2017
 
What's New for AWS Purpose Built, Non-relational Databases - DAT204 - re:Inve...
What's New for AWS Purpose Built, Non-relational Databases - DAT204 - re:Inve...What's New for AWS Purpose Built, Non-relational Databases - DAT204 - re:Inve...
What's New for AWS Purpose Built, Non-relational Databases - DAT204 - re:Inve...
 
Modernise your Data Warehouse with Amazon Redshift and Amazon Redshift Spectrum
Modernise your Data Warehouse with Amazon Redshift and Amazon Redshift SpectrumModernise your Data Warehouse with Amazon Redshift and Amazon Redshift Spectrum
Modernise your Data Warehouse with Amazon Redshift and Amazon Redshift Spectrum
 
AWS Database and Analytics State of the Union
AWS Database and Analytics State of the UnionAWS Database and Analytics State of the Union
AWS Database and Analytics State of the Union
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with Zopa
 
Immersion Day - Como simplificar o acesso ao seu ambiente analítico
Immersion Day - Como simplificar o acesso ao seu ambiente analíticoImmersion Day - Como simplificar o acesso ao seu ambiente analítico
Immersion Day - Como simplificar o acesso ao seu ambiente analítico
 
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
 
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
Modernise your Data Warehouse - AWS Summit Sydney 2018
Modernise your Data Warehouse - AWS Summit Sydney 2018Modernise your Data Warehouse - AWS Summit Sydney 2018
Modernise your Data Warehouse - AWS Summit Sydney 2018
 

Mais de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mais de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

ABD327: Migrating Your Traditional Data Warehouse to a Modern Data Lake

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:Invent: Migrating Your Traditional Data Warehouse to a Modern Data Lake. Vidhya Srinivasan, General Manager, Amazon Redshift; Balaji Muthuramalingam, Executive Director, Data & Analytics, 21st Century Fox. November 28, 2017.
  • 2. Amazon’s Analytics Architecture (Collect, Store, Analyze): Amazon Kinesis Firehose, AWS Direct Connect, Amazon Snowball, Amazon Kinesis Analytics, Amazon Kinesis Streams, Amazon S3, Amazon Glacier, Amazon CloudSearch, Amazon RDS, Amazon Aurora, Amazon DynamoDB, Amazon ES, Amazon EMR, Amazon Redshift, Amazon QuickSight, AWS Database Migration Service, AWS Glue, Amazon Athena, Amazon AI.
  • 3. “Amazon Redshift has the largest adoption of BDW in the cloud.” “With more than 5,000 deployments, Amazon Redshift has the largest data warehouse deployments in the cloud – some over 10 petabytes in size.” AWS received a score of 5/5 (the highest score possible) in the customer base, market awareness, ability to execute, road map, support, and partners criteria. (Forrester Wave: Big Data Warehouse, Q2 2017.) The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
  • 4. Amazon Redshift: Data Warehousing. A fast, powerful, simple, and fully managed data warehouse at 1/10 the cost. Massively parallel; scales from gigabytes to exabytes. • Fast at scale: columnar storage technology to improve I/O efficiency and scale query performance • Inexpensive: as low as $1,000 per terabyte per year, 1/10th the cost of traditional data warehouse solutions; start at $0.25 per hour • Open file formats: analyze optimized data formats on direct-attached disks, and all open data formats in Amazon S3 • Secure: audit everything; encrypt data end-to-end; extensive certification and compliance
  • 5. Redshift Spectrum: Extend the data warehouse to your S3 data lake. [Diagram: Amazon Redshift data and the S3 data lake, connected by the Redshift Spectrum query engine.] • Exabyte-scale Amazon Redshift SQL queries against S3 • Join data across Amazon Redshift and S3 • Scale compute and storage separately • Stable query performance and unlimited concurrency • Parquet, ORC, Grok, Avro, and CSV data formats • Pay only for the amount of data scanned
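Because Spectrum bills by data scanned, the compression, columnar formats, and partitioning mentioned on these slides translate directly into cost. A minimal Python sketch of that arithmetic follows; the function name and parameters are hypothetical, and the $5-per-TB default is an assumption (it matches the launch list price in US regions, but check current regional pricing).

```python
def spectrum_scan_cost(raw_tb: float,
                       compression_ratio: float = 1.0,
                       column_fraction: float = 1.0,
                       partition_fraction: float = 1.0,
                       usd_per_tb: float = 5.0) -> tuple[float, float]:
    """Estimate terabytes scanned and the scan charge for one Spectrum query.

    raw_tb             -- uncompressed size of the table's data in S3
    compression_ratio  -- e.g. 5.0 if the files are 5x smaller than the raw data
    column_fraction    -- share of bytes in the columns the query touches
                          (only meaningful for columnar formats such as Parquet/ORC)
    partition_fraction -- share of partitions left after partition pruning
    usd_per_tb         -- assumed price per TB scanned; check current pricing
    """
    scanned_tb = raw_tb / compression_ratio * column_fraction * partition_fraction
    return scanned_tb, scanned_tb * usd_per_tb


# 10 TB of raw data scanned in full, versus the same data stored as compressed,
# partitioned Parquet where the query reads half the bytes and 1 of 10 partitions.
print(spectrum_scan_cost(10))                 # (10.0, 50.0)
print(spectrum_scan_cost(10, 5, 0.5, 0.1))    # roughly (0.1, 0.5)
```

The factors multiply because each one shrinks the bytes that the next one applies to; the estimate ignores per-query minimums and request overhead.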
  • 6. Redshift Spectrum: Query your data lake. [Diagram: JDBC/ODBC clients connect to Amazon Redshift; Redshift Spectrum's scale-out serverless compute (nodes 1, 2, 3, 4 … N) reads from Amazon S3, exabyte-scale object storage, using table definitions in the AWS Glue Data Catalog.] Example query: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY …
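Querying S3 data as `S3.EXT_TABLE` assumes an external schema backed by the AWS Glue Data Catalog and an external table defined over files in S3. The Python sketch below only assembles the two DDL statements as strings (it does not connect to a cluster). The schema name, Glue database name, IAM role ARN, bucket path, and columns are hypothetical placeholders; the statement shapes follow Redshift's CREATE EXTERNAL SCHEMA and CREATE EXTERNAL TABLE syntax, but verify them against the current documentation before running them.

```python
def external_schema_ddl(schema: str, glue_database: str, iam_role_arn: str) -> str:
    """Map a local schema name to a database in the AWS Glue Data Catalog."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{glue_database}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_ddl(schema, table, columns, location,
                       partitions=(), file_format="PARQUET"):
    """Build CREATE EXTERNAL TABLE for files under an S3 prefix.

    columns and partitions are sequences of (name, type) pairs; partition
    columns are listed separately, not in the main column list.
    """
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    ddl = f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)"
    if partitions:
        parts = ", ".join(f"{name} {dtype}" for name, dtype in partitions)
        ddl += f"\nPARTITIONED BY ({parts})"
    return ddl + f"\nSTORED AS {file_format}\nLOCATION '{location}';"


print(external_schema_ddl("s3", "datalake",
                          "arn:aws:iam::123456789012:role/ExampleSpectrumRole"))
print(external_table_ddl("s3", "ext_table",
                         [("asin", "VARCHAR(16)"), ("quantity", "INT")],
                         "s3://example-bucket/orders/",
                         partitions=[("order_day", "DATE")]))
```

For a partitioned table, each partition still has to be registered (for example with `ALTER TABLE ... ADD PARTITION (...) LOCATION '...'`, or by a Glue crawler) before Spectrum will read it.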
  • 7. Spectrum: Exabyte query in less than three minutes. SELECT P.ASIN, P.TITLE, R.POSTAL_CODE, P.RELEASE_DATE, SUM(D.QUANTITY * D.OUR_PRICE) AS SALES_sum FROM s3.d_customer_order_item_details D, asin_attributes A, products P, regions R WHERE D.ASIN = P.ASIN AND P.ASIN = A.ASIN AND D.REGION_ID = R.REGION_ID AND A.EDITION LIKE '%FIRST%' AND P.TITLE LIKE '%Potter%' AND P.AUTHOR = 'J. K. Rowling' AND R.COUNTRY_CODE = 'US' AND R.CITY = 'Seattle' AND R.STATE = 'WA' AND D.ORDER_DAY::DATE >= P.RELEASE_DATE AND D.ORDER_DAY::DATE < dateadd(day, 3, P.RELEASE_DATE) GROUP BY P.ASIN, P.TITLE, R.POSTAL_CODE, P.RELEASE_DATE ORDER BY SALES_sum DESC LIMIT 20; Data set: • Roughly 140 TB of customer item order detail records for each day over the past 20 years • 190 million files across 15,000 partitions in S3 • One partition per day for the USA and one for the rest of the world • Total data size is over an exabyte. Optimizations: • Compression: 5X • Columnar file format: 10X • Scanning with 2,500 nodes: 2,500X • Static partition elimination: 2X • Dynamic partition elimination: 350X • Amazon Redshift query optimizer: 40X. Runtime: Hive (1,000 nodes): 5 years; Redshift Spectrum: 155 seconds. * Hive time estimated using a 20-node Hive cluster and 1.4 TB, assuming linear scaling. * Query used a 20-node DC1.8XLarge Amazon Redshift cluster. * Not actual sales data; generated for this demo based on the data format used by Amazon Retail.
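The headline numbers on this slide can be cross-checked with simple arithmetic. The sketch below uses only figures from the slide; treating the optimization factors as independent and multiplicative is an assumption of this sketch (the slide does not say how they combine), and the year length ignores leap days.

```python
from math import prod

# Figures taken from the slide.
TB_PER_DAY = 140
YEARS = 20
OPTIMIZATION_FACTORS = {
    "compression": 5,
    "columnar file format": 10,
    "scanning with 2,500 nodes": 2500,
    "static partition elimination": 2,
    "dynamic partition elimination": 350,
    "Redshift query optimizer": 40,
}
HIVE_SECONDS = 5 * 365 * 24 * 3600   # "5 years", ignoring leap days
SPECTRUM_SECONDS = 155

total_tb = TB_PER_DAY * 365 * YEARS          # decimal units: 1 EB = 1,000,000 TB
compounded = prod(OPTIMIZATION_FACTORS.values())
observed_speedup = HIVE_SECONDS / SPECTRUM_SECONDS

print(f"total data: {total_tb:,} TB (~{total_tb / 1e6:.2f} EB)")
print(f"compounded optimization factor: {compounded:,}x")
print(f"Hive estimate / Spectrum runtime: ~{observed_speedup:,.0f}x")
```

The first line (1,022,000 TB, about 1.02 EB) is consistent with the "over an exabyte" claim. The product of the factors (3.5 billion) is far larger than the ratio of the two runtimes (about one million), so the factors are better read as separate illustrations of where work is saved than as terms of a single end-to-end speedup.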
  • 8. NUVIAD: Data Lake Analytics with Redshift Spectrum. NUVIAD is a mobile marketing platform providing professional marketers, agencies, and local businesses with hyper-targeted analytics at petabyte scale. • Seamlessly analyzing open file formats directly in Amazon S3 to provide fresh, up-to-the-minute insights • Unlimited analytics and query concurrency with Amazon Redshift, and unlimited data capacity with Amazon S3 • Scaling compute separately from storage in Amazon S3 for flexibility, fast performance, and cost-effectiveness. “Spectrum is a game changer for us. Reports that took minutes to produce are now delivered in seconds, and we like the ability to scale compute on demand to query petabytes of data in S3 in various open file formats.” – Rafi Ton, CEO, NUVIAD. [Diagram components: data sources, AWS Glue, Amazon S3, Amazon Redshift, Redshift Spectrum, BI tools.]
  • 9. Amazon Redshift is widely available. Currently available in: Ireland, Frankfurt, London, Beijing, Mumbai, Seoul, Singapore, Sydney, Tokyo, São Paulo, US East (N. Virginia), US East (Ohio), US West (Oregon), US West (N. California), GovCloud, and Canada (Central, Montreal).
  • 10. Selected Amazon Redshift Customers
  • 11. Selected Amazon Redshift Partners: Data Integration, Business Intelligence, Systems Integrators
  • 12. Recent and upcoming launches
  • 13. New Dense Compute Node: DC2. 2X performance at the same price as DC1; 3X more I/O with 30% better storage utilization than DC1. Hardware: NVMe SSD, DDR4 memory, Intel E5-2686 v4 (Broadwell). “We saw a 9x reduction in month-end reporting time with Amazon Redshift dc2 nodes as compared to dc1.” – Bradley Todd, Technical Architect, Liberty Mutual
  • 14. Result Caching: Sub-second query response times. How it works: (1) queries from BI/dashboard and analytics tools go to the leader node; (2) if the results cache contains the query result, it is returned with no processing; (3) if the query is not in the cache, it is executed and the result is cached. • In-memory leader node cache, resulting in sub-second response • Transparent – it just works • Skips WLM, processing, and optimization • Cache persists across sessions • Caching frees up the Amazon Redshift cluster, increasing performance for other, non-repetitive queries. [Diagram: results cache on the leader node mapping QUERY_ID to RESULT.]
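The three-step lookup on this slide can be modeled as a dictionary keyed by query. The Python sketch below is a toy model, not Redshift's implementation: it keys each entry on the exact SQL text plus a version number for every table the query reads, so a write to any of those tables makes the old entry unreachable. That rule mirrors the documented behavior that cached results are reused only while the underlying data is unchanged; the class, the version counters, and the `execute` callback are hypothetical. (In Redshift itself, caching can be turned off for a session with `SET enable_result_cache_for_session TO off;`.)

```python
from typing import Callable, Dict, Hashable, Iterable


class ResultCache:
    """Toy leader-node result cache (illustrative model only; no eviction)."""

    def __init__(self) -> None:
        self._entries: Dict[Hashable, object] = {}
        self.hits = 0
        self.misses = 0

    def run(self, sql: str, tables: Iterable[str],
            table_versions: Dict[str, int],
            execute: Callable[[str], object]) -> object:
        # Step 1: the query arrives; key it on its text and on the current
        # version of every table it reads.
        key = (sql, tuple(sorted((t, table_versions[t]) for t in tables)))
        # Step 2: cache hit -> return the stored result with no processing.
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        # Step 3: cache miss -> execute the query, then cache the result.
        self.misses += 1
        result = execute(sql)
        self._entries[key] = result
        return result


def fake_execute(sql: str) -> list:
    """Stands in for real query execution."""
    return [("WA", 100)]


versions = {"sales": 1}
cache = ResultCache()
q = "SELECT region, SUM(amount) FROM sales GROUP BY region"

cache.run(q, ["sales"], versions, fake_execute)   # miss: executed and cached
cache.run(q, ["sales"], versions, fake_execute)   # hit: served from the cache
versions["sales"] += 1                             # a load changes the table
cache.run(q, ["sales"], versions, fake_execute)   # miss again
print(cache.hits, cache.misses)                    # 1 2
```

Versioning the key, rather than deleting entries on every write, keeps the sketch short; a real cache would also bound its size.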
  • 15. Result Caching: From the lab. Query throughput in queries per hour (higher is better) on a 4-node ds2.8xlarge cluster running a read-write workload with a mix of small and large queries, inserts, COPY, and VACUUM: Dashboard: 138 without caching vs. 2,979 with caching; Heavy Reporting: 8 without caching vs. 117 with caching.
  • 16. Result Caching: A customer perspective. Query latency (lower is better) with and without caching on a 4-node dc2.8xlarge cluster, Tableau dashboard, 10-user test, across various dashboard queries (names removed for confidentiality). “That’s not a mistake... the results for average execution time on the caching test run were sub-second and so don't show up on the y-axis at this scale.”
  • 17. Short Query Acceleration: Express lane for short queries. [Diagram: queries from BI/dashboard and analytics tools pass through a machine learning classifier in Amazon Redshift.] • Short queries do not get stuck behind long-running queries • Higher throughput, less variability • Customized for your workload • Transparent – it just works!
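Why an express lane helps short queries can be seen with a small simulation. The Python sketch below is a simplification: all queries arrive together, each lane runs one query at a time, and a fixed runtime threshold stands in for SQA's machine learning classifier (the real classifier predicts runtime from query features and is not modeled here). The function name and the threshold are hypothetical.

```python
from typing import List, Optional


def wait_times(runtimes: List[float], threshold: Optional[float] = None) -> List[float]:
    """Return each query's wait before it starts, for queries submitted together in order.

    With threshold=None every query shares one FIFO slot. Otherwise, queries whose
    (predicted) runtime is below the threshold go to a separate express slot.
    """
    lane_clock = {"main": 0.0, "express": 0.0}
    waits = []
    for runtime in runtimes:
        lane = "express" if threshold is not None and runtime < threshold else "main"
        waits.append(lane_clock[lane])   # wait = time until this lane frees up
        lane_clock[lane] += runtime
    return waits


# Two long reports interleaved with three one- or two-second dashboard queries (seconds).
runtimes = [300, 1, 2, 120, 1]
print(wait_times(runtimes))               # [0.0, 300.0, 301.0, 303.0, 423.0]
print(wait_times(runtimes, threshold=5))  # [0.0, 0.0, 1.0, 300.0, 3.0]
```

The short queries stop waiting behind the 300-second report. The model ignores that an express lane shares CPU, memory, and I/O with the main lane, which is one plausible reason real clusters can see long-running queries slow down somewhat when SQA is enabled.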
  • 18. Short Query Acceleration: Results. Comparison: no SQA vs. SQA, both at a concurrency of 5. “This configuration showed a distinct improvement in short query runtimes with the SQA feature enabled. Many of the shortest queries saw a 5x or greater improvement while the longer running queries saw a corresponding increase. This is exactly how we expect the feature to work.” • Average wait time drops from 36 seconds to 0 for queries that execute in under a second • P90 wait time on a very busy cluster drops from 370 seconds to 32.1 seconds
  • 19. Coming soon: Nested data support • Analyze nested and semi-structured data in Amazon S3 with Spectrum • Allows easy ETL of nested data into Amazon Redshift using CTAS • Support for open file formats: Parquet, ORC, JSON, Ion, and Avro • Uses dot notation to extend your existing SQL. Example: find click frequency for links on "/home". Sample data, s3data.clickStream: << { "session_time": "20171013 14:05:00", "clicks": [ {"page": "/home", "referrer": ""}, {"page": "/products", "referrer": "/home"} ] }, { "session_time": "20171013 14:06:00", "clicks": [ {"page": "/contact", "referrer": "/home"} ] } >> Query: SELECT c.page, COUNT(*) AS count FROM s3data.clickStream s, s.clicks c WHERE s.session_time > '2017-10-01 00:00:00' AND c.referrer = '/home' GROUP BY c.page;
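The semantics of the `FROM s3data.clickStream s, s.clicks c` clause, in which each session row is joined to the elements of its own `clicks` array, can be emulated in plain Python on the sample data from the slide. This only illustrates what the query computes, not how Spectrum executes it; parsing `session_time` with the `%Y%m%d %H:%M:%S` pattern is an assumption based on the sample values, and `click_frequency` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime

# Sample records from the slide.
click_stream = [
    {"session_time": "20171013 14:05:00",
     "clicks": [{"page": "/home", "referrer": ""},
                {"page": "/products", "referrer": "/home"}]},
    {"session_time": "20171013 14:06:00",
     "clicks": [{"page": "/contact", "referrer": "/home"}]},
]


def click_frequency(sessions, since: datetime, referrer: str) -> dict:
    """Count clicked pages reached from `referrer` in sessions after `since`."""
    counts = Counter()
    for s in sessions:                                    # FROM s3data.clickStream s
        started = datetime.strptime(s["session_time"], "%Y%m%d %H:%M:%S")
        if started <= since:                              # WHERE s.session_time > ...
            continue
        for c in s["clicks"]:                             # , s.clicks c (unnest the array)
            if c["referrer"] == referrer:                 # AND c.referrer = '/home'
                counts[c["page"]] += 1                    # GROUP BY c.page ... COUNT(*)
    return dict(counts)


print(click_frequency(click_stream, datetime(2017, 10, 1), "/home"))
# {'/products': 1, '/contact': 1}
```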
  • 20. Coming soon: Nested data support. Improve query performance by analyzing nested data. [Figure: a flat Orders table (OrderID, CustomerID, OrderTime, ShipMode) and an OrderItems table (OrderID, ItemID, Quantity, Price; order 5 has items 23 and 16, order 8 has items 32 and 24) are combined into a single Orders table with a nested OrdersWithItems column (ItemID, Quantity, Price).] To improve query performance, the new Orders table includes OrdersWithItems as a nested column, eliminating join processing.
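The transformation in the figure, folding OrderItems rows into a nested column on Orders, amounts to grouping child rows by their parent key once, at load time, so later queries read each order with its items already attached instead of joining. A minimal Python sketch using the OrderItems values from the slide's figure (the function name and dictionary layout are hypothetical):

```python
from collections import defaultdict

# (OrderID, ItemID, Quantity, Price) rows from the slide's OrderItems table.
order_items = [
    (5, 23, 10.00, 12.50),
    (8, 32, 1.00, 5.60),
    (5, 16, 1.00, 1.99),
    (8, 24, 5.00, 26.50),
]


def nest_items(rows):
    """Group item rows under their order, i.e. build the nested OrdersWithItems column."""
    nested = defaultdict(list)
    for order_id, item_id, quantity, price in rows:
        nested[order_id].append({"ItemID": item_id, "Quantity": quantity, "Price": price})
    return dict(nested)


orders_with_items = nest_items(order_items)
# Per-order questions no longer need a join: the items travel with the order.
for order_id, items in orders_with_items.items():
    print(order_id, [item["ItemID"] for item in items])
# 5 [23, 16]
# 8 [32, 24]
```

This is ordinary denormalization: the join cost is paid once during ETL, and changing an item means rewriting its parent order's record.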
  • 21. Coming soon: Enhanced Monitoring • Optimize your Amazon Redshift cluster for peak performance by using query throughput metrics • Get greater insight into cluster performance by accessing database and workload metrics • Get alerts and notifications via Amazon SNS • Monitor query latency and throughput to optimize your workload
  • 22. Closing thoughts • Increase performance by 2x, at the same price, by switching to DC2 • Redshift Spectrum extends your Amazon Redshift cluster to all of your data in S3, seamlessly, efficiently, and cost-effectively • Query monitoring rules, along with Short Query Acceleration and result caching, can significantly improve performance. Please continue to provide us feedback at redshift-pm@amazon.com
  • 23. Fox Film Entertainment: 21st Century Fox Data Lake on AWS
  • 24. Fox Film Entertainment (21CF): customer platforms and broadcaster platforms
  • 25. Landscape • 100+ TB of data • 150+ billion rows • 100+ sources • 25,000+ user queries per day • 35,000+ data processes per day • 24x7 across all regions
  • 26. Challenges • Faster time to market • Variety & volume • High availability • Stability • Automation • Technology & beyond
  • 27. Key Principles • Data democratization • Cloud first • Faster time to market • Scale to grow • Total cost of ownership
  • 28. Fox–AWS Architecture (Collect, Store, Analyze): data transfer and scheduled ingest; data lake (object storage); EDW/DM (SQL MPP); ETL, E(L)T, and Spark; visualize & analysis; catalog, management, and security.
  • 29. Services (Collect, Store, Analyze, Visualize): tag & transfer raw data to S3; Amazon S3; AWS Glue Data Catalog; AWS Glue ETL; Amazon EMR; Amazon Redshift; AWS Lambda; MicroStrategy.
  • 30. Lessons Learned. Design considerations: • Segregate workloads/applications into separate clusters • Scale up and down as needed by each application. Analyze tables regularly: • After every load, for PREDICATE COLUMNS • Weekly, for all columns • Query SVV_TABLE_INFO (stats_off) to trigger ANALYZE. Vacuum tables regularly: • Daily vacuum on frequently accessed/modified tables (identified from STL_SCAN and STL_DELETE) • Weekly on all tables • A deep copy might be faster for tables with a high percentage of unsorted rows
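The ANALYZE and VACUUM routine on this slide can be driven by SVV_TABLE_INFO, whose `stats_off` and `unsorted` columns report, as percentages, how stale a table's statistics are and how much of it is unsorted. The Python sketch below takes rows as plain dictionaries (for example, the result of `SELECT "schema", "table", stats_off, unsorted FROM svv_table_info;` fetched by any database driver) and turns them into maintenance statements. The function name and all three thresholds are hypothetical starting points, not values from the talk.

```python
def plan_maintenance(table_info, stats_off_pct=10.0, vacuum_pct=20.0, deep_copy_pct=60.0):
    """Return SQL statements (and comments) for tables that need attention.

    table_info: iterable of dicts with keys "schema", "table", "stats_off", "unsorted".
    "unsorted" is None for tables without a sort key.
    """
    plan = []
    for row in table_info:
        name = f'{row["schema"]}.{row["table"]}'
        # Stale statistics: refresh only the columns used in predicates.
        if row["stats_off"] is not None and row["stats_off"] > stats_off_pct:
            plan.append(f"ANALYZE {name} PREDICATE COLUMNS;")
        unsorted = row["unsorted"]
        if unsorted is None:
            continue
        if unsorted > deep_copy_pct:
            # Mostly unsorted: a deep copy may be faster than VACUUM.
            plan.append(f"-- deep copy candidate: {name} ({unsorted:.0f}% unsorted)")
        elif unsorted > vacuum_pct:
            plan.append(f"VACUUM {name};")
    return plan


rows = [
    {"schema": "public", "table": "sales", "stats_off": 25.0, "unsorted": 35.0},
    {"schema": "public", "table": "events", "stats_off": 0.0, "unsorted": 80.0},
    {"schema": "public", "table": "dim_region", "stats_off": 5.0, "unsorted": None},
]
for stmt in plan_maintenance(rows):
    print(stmt)
# ANALYZE public.sales PREDICATE COLUMNS;
# VACUUM public.sales;
# -- deep copy candidate: public.events (80% unsorted)
```

The sketch only flags deep-copy candidates; carrying one out typically means creating a new table with `CREATE TABLE ... (LIKE ...)`, reloading it with `INSERT INTO ... SELECT`, and swapping the names.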
  • 31. Lessons Learned (cont.) Commit queue: • Amazon Redshift has higher commit overhead • The ETL tool's SQL push-down was creating too many commits; we had to work with the vendor to optimize the commits • Optimize the ETL design to batch up commits. Schema design: • Choose the best DISTKEY and SORTKEY • Create small tables with DISTSTYLE ALL to optimize table size and join performance • Sort keys: avoid interleaved sort keys on frequently ETL'd tables • Sort keys: avoid compressing the primary (leading) sort key column
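The commit-queue lesson comes down to grouping many statements into one transaction rather than committing after each. The Python sketch below uses the standard-library `sqlite3` module as a stand-in for a DB-API connection to Redshift, so it runs anywhere; the function name, table name, row shape, and batch size are hypothetical. With a Redshift driver such as psycopg2 the placeholder style would be `%s` instead of `?`, and large bulk loads would normally go through COPY rather than INSERT.

```python
import sqlite3


def load_with_batched_commits(conn, rows, batch_size):
    """Insert rows, committing once per batch instead of once per row.

    Returns the number of commits issued.
    """
    commits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # sqlite3 opens a transaction implicitly before the INSERTs ...
        conn.executemany("INSERT INTO staging_sales (order_id, amount) VALUES (?, ?)", batch)
        # ... and this single commit makes the whole batch durable.
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (order_id INTEGER, amount REAL)")
rows = [(i, i * 1.5) for i in range(10)]
print(load_with_batched_commits(conn, rows, batch_size=4))               # 3 commits, not 10
print(conn.execute("SELECT COUNT(*) FROM staging_sales").fetchone()[0])  # 10
```

Larger batches mean fewer trips through the commit queue, at the cost of more work to redo if a batch fails and is rolled back.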
  • 32. Lessons Learned (cont.) Close engagement with AWS, ProServe, and APN Partners: • Close partnering with our AWS account/support team, AWS product teams, AWS ProServe, and AWS ISV and SI partners allowed us to quickly address any issues that came up during migration • AWS ProServe brought deep experience and expertise to help accelerate our success. WLM: • Dynamic WLM settings for resource prioritization and allocation (daytime vs. nighttime WLM settings) • Queue jobs at the ETL and reporting tool level to avoid submitting too many queries to Amazon Redshift at once
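Switching between daytime and nighttime WLM settings can be automated by keeping two queue layouts and choosing one by time of day. The Python sketch below only builds the JSON value for Redshift's `wlm_json_configuration` parameter; the user-group names, concurrency values, memory percentages, and the 07:00/19:00 boundaries are hypothetical. Applying the result would be a separate call, for example boto3's `modify_cluster_parameter_group` with `ParameterName` set to `wlm_json_configuration`. In manual WLM, concurrency and memory percentage are dynamic properties, and the two layouts below keep the same queues and user groups, so switching between them should not require a reboot; confirm this for your own configuration.

```python
import json

# Hypothetical layouts. Both keep the same queues and user groups and differ only
# in concurrency and memory share. The last queue (no user group) is the default queue.
WLM_PROFILES = {
    "day": [    # reporting-heavy hours: favor BI/dashboard users
        {"user_group": ["reporting"], "query_concurrency": 10, "memory_percent_to_use": 60},
        {"user_group": ["etl"], "query_concurrency": 2, "memory_percent_to_use": 25},
        {"query_concurrency": 3, "memory_percent_to_use": 15},
    ],
    "night": [  # batch window: favor ETL
        {"user_group": ["reporting"], "query_concurrency": 3, "memory_percent_to_use": 20},
        {"user_group": ["etl"], "query_concurrency": 6, "memory_percent_to_use": 70},
        {"query_concurrency": 2, "memory_percent_to_use": 10},
    ],
}


def wlm_profile_for(hour: int, day_start: int = 7, night_start: int = 19) -> str:
    """Pick the profile name for an hour of the day (0-23)."""
    return "day" if day_start <= hour < night_start else "night"


def wlm_json_for(hour: int) -> str:
    """Serialize the chosen layout as the value for wlm_json_configuration."""
    return json.dumps(WLM_PROFILES[wlm_profile_for(hour)])


print(wlm_profile_for(10), wlm_profile_for(22))   # day night
print(wlm_json_for(2))
```

A scheduler (cron, or a scheduled AWS Lambda function) would call this at each boundary and push the value; client-side throttling in the ETL and reporting tools, the slide's second WLM point, is a separate control.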
  • 33. Benefits of migrating to Amazon Redshift. Cost: saved 15-20% of annual cost; reduced 21CF studios' data center space. Performance: 30-35% performance gain. Business agility: • Streamlined provisioning of new gear • No longer have to deal with the storage “wall” • Improved interoperability with native AWS services across multiple business units, leveraging our new data lake
  • 34. Looking ahead
  • 35. Summarizing New Feature Benefits. Amazon Redshift’s new DC2 node: • Read/write performance is ~50% higher than DC1. Short Query Acceleration: • Helped smooth out our overall performance; we saw gains of ~50% with the MicroStrategy (MSTR) client. Redshift Spectrum: • Allows us to extend the reach of our Data Hub to ‘cold’ data stored in Amazon S3. Query result set caching (pending): • Expected to yield a 10x improvement on the MSTR workload for cached responses (sub-second latencies)
  • 36. What’s next? • Continue building the data lake (AWS Glue, Amazon Redshift Spectrum) • Stream data processing with Amazon Kinesis • Artificial intelligence and machine learning: use Amazon Redshift as a training source (Amazon Machine Learning, Spark); natural language interfaces (Amazon Lex, Amazon Polly)
  • 37. THANK YOU!