© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Modern Cloud Data Warehousing ft.
Intuit: Optimize Analytics Practices
ANT202-R1
Maor Kleider
Principal Product Manager
Amazon Web Services
Jason Rhoades
Systems Architect
Intuit
Raise your hand if you’re using
Amazon Redshift
AWS Databases and Analytics
Broad and deep portfolio, built for builders
AWS Marketplace
Redshift
Data warehousing
EMR
Hadoop + Spark
Athena
Interactive analytics
Kinesis Analytics
Real-time
Elasticsearch service
Operational Analytics
RDS
MySQL, PostgreSQL, MariaDB,
Oracle, SQL Server
Aurora
MySQL, PostgreSQL
QuickSight SageMaker
DynamoDB
Key value, Document
ElastiCache
Redis, Memcached
Neptune
Graph
Timestream
Time Series
QLDB
Ledger Database
S3/Glacier
Glue
ETL & Data Catalog
Lake Formation
Data Lakes
Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Data Pipeline | Direct Connect
Data Movement
Analytics
Databases
Business Intelligence & Machine Learning
Data Lake
Managed
Blockchain
Blockchain
Templates
Blockchain
Comprehend Rekognition Lex Transcribe DeepLens
250+ Solutions
730+ Database
solutions
600+ Analytics
solutions
25+ Blockchain
solutions
20+ Data lake
solutions
30+ solutions
RDS on VMware
There is more data
than people think.
Data grows
every 5 years
Data platforms need to
live for years
and scale
There are more
data types than
ever before.
There are more
ways to analyze data
than ever before.
Didn’t exist, years ago:
Hadoop 11, Elasticsearch 8, Presto 5, Spark 4
What does
data warehouse
modernization
mean?
Easy to use
Extends to
your Data Lake
Don’t waste time on
menial administrative
tasks and maintenance
Directly analyze data
stored in your data lake
in open formats
Any scale of data,
workloads, and users
Dynamically scale up to
guarantee performance even
with unpredictable demands
and data volumes
Faster
time-to-insights
Consistently fast
performance, even with
thousands of concurrent
queries and users
Amazon Redshift
Fastest
Get faster time-to-insight
for all types of analytics
workloads; powered by
machine learning, columnar
storage and MPP
Unlimited
scale
Extends your
Data Lake
1/10th
the cost
Dynamically scale up to
guarantee performance
even with unpredictable
analytical demands and
data volumes
Analyze data in the Amazon
S3 Data Lake in-place and in
open formats, together with
data loaded into Redshift’s
high performance SSDs
Start at $0.25 per hour,
save costs with automated
administration tasks and
eliminate business impact
due to downtime; as low as
$1,000 per terabyte per year
Fast, simple, cost-effective data
warehouse that can extend queries to your Data Lake
Analyze data in open formats
such as Parquet, ORC, and JSON, using SQL tools
for their cloud
data warehouse
workloads than
anyone else
Amazon
Redshift
Selected Amazon Redshift Partners
Data Integration Business Intelligence Systems Integrators
Amazon Redshift
The 4 things that matter most
Speed Scale Security Simplicity
Let’s dig into what we’ve done
in the past several months
and what’s coming…
features and
enhancements
released*
Amazon Redshift is growing fast and innovating faster
Automatically enabled
short query acceleration
Support for lateral column
alias reference
New Quick Starts
New CloudWatch metrics
Customized
Recommendations
with Advisor
Current and trailing tracks
for release update
Federated authentication
with single sign-on
Improved performance
for commits
COPY from Parquet and
ORC file formats
Additional Spectrum regions
Support for Scalar JSON
and Ion data types
Late materialization for
faster query processing
Support for DATE data
type with Spectrum
Short Query
Acceleration
Utilization reports
Machine learning integration
to accelerate dashboards
and interactive analysis
Improved resource
management for
memory-intensive queries
Faster string manipulation
Support for Parquet and
ORC in Kinesis Data Firehose
Improved workload
management console
experience
Query Editor
Support for late-binding views
SQL Scalar user-defined
functions
Integration with AWS Glue
Support for Nested
Data with Spectrum
Spectrum support
for DATE data type
Improved performance
for UNION ALL queries
Free upgrade from
DC1 to DC2 RIs
Query monitoring rules (QMR)
Support for Zstandard high
compression encoding
Query processing
improvements
Support for Python
UDF logging module
Enhanced VPC routing
Automatically hopping
queries without restarts
Support for uppercase
column names
Result Caching for
Repeat Queries
Support for LISTAGG DISTINCT
Support for ORC and
Grok file formats
Integration with QuickSight
DMS support with Redshift
3.5x Improved
Throughput
Improved performance
for repeat queries
Since we last spoke…
*since re:Invent 2017
Amazon Redshift
is now >3x
faster than
6 months ago
Normalized Queries Per Hour (QPH)
As a % of Redshift 6 months ago (Redshift’s QPH 6 months ago = 100%). Higher is better.
MAY 2018: 100%
JUN 2018: 115%
JUL 2018: 181%
AUG 2018: 237%
SEP 2018: 284%
OCT 2018: 350%
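As a rough illustration, the monthly percentages on this slide can be converted into speedup factors relative to the May 2018 baseline. The pairing of months to values below is read off the chart labels and is an assumption of this sketch; the variable names are illustrative only.

```python
# Normalized queries per hour (QPH), as a % of the May 2018 baseline.
# The month-to-value pairing is inferred from the slide's chart labels.
qph_percent = {
    "MAY 2018": 100,
    "JUN 2018": 115,
    "JUL 2018": 181,
    "AUG 2018": 237,
    "SEP 2018": 284,
    "OCT 2018": 350,
}

baseline = qph_percent["MAY 2018"]

# Speedup of each month relative to the baseline month.
speedup = {month: pct / baseline for month, pct in qph_percent.items()}

# Ratio of each month to the month before it.
months = list(qph_percent)
month_over_month = {
    months[i]: qph_percent[months[i]] / qph_percent[months[i - 1]]
    for i in range(1, len(months))
}

for month in months:
    print(f"{month}: {speedup[month]:.2f}x baseline")
```

Under that pairing, October 2018 comes out at 3.5 times the baseline, which is consistent with the ">3x faster" headline.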
*Since re:Invent 2017
Compiled code cache
Support for lateral
column alias reference
Resource management for
memory-intensive queries
Late materialization
Result caching
Joins involving large numbers of
NULL values in a join key column
Queries with intermediate subquery
results that can be distributed
Cluster
resize operations
Queries that refer to stable
functions with constant expressions
Short query
acceleration
Queries operating over CHAR
and VARCHAR columns
Single-row inserts
Improvements to speed
Expressions on the partition
columns of external tables
Faster string manipulation
Complex EXCEPT
subqueries
Commit processing
enhancements
DC2 nodes
2x the number of tables
in a cluster
Hash join memory utilization
optimizations and cache line
prefetching
COPY operation when
ingesting data from Parquet
and ORC formats
Performance improvement for
queries that refer to stable functions
over constant expressions
Improvements for the COPY
operation when ingesting data
from Parquet and ORC formats
Query processing
improvements
Query rewrites that pushdown selective joins
into a subquery
Query planning
How we leverage fleet telemetry
Performance
improvements
in query speed
- Minero Aoki
Senior Data Engineer, Cookpad Inc.
Redshift query performance and
scalability has been increasing,
even though our data has
grown. In the last 10 months, we
have seen commit performance
increase by 500% without any
increase in cost.
“20 percent of our queries now
complete in less than one second.
Best of all, we didn’t have to
change anything to get this
speed-up with Redshift, which
supports our mission-critical
workloads.”
-Greg Rokita,
Executive Director of Technology, Edmunds
Amazon Redshift Elastic Resize (GA)
Adds
additional
nodes
to Redshift
cluster
Distributes
data
across new
configuration
Minimal
transition time
Quickly scale
for varying
workload
demands
Scale up and down in minutes
New!
Redshift
Cluster
Redshift Managed S3
JDBC/ODBC
Leader Node
Backup
Caching Layer
Concurrency Scaling for bursts of user activity (Preview)
Creates
more
clusters
automatically
on-demand
Consistently
fast
performance
even with
thousands of
concurrent queries
No
advance
hydration
required
Handles
unpredictable
demand variability
New!
Backup
Redshift Managed S3
For every 24 hours
that your main cluster
is in use, you accrue a
one-hour credit for
Concurrency Scaling.
Concurrency Scaling is
free for more than 97%
of Redshift customers.
Concurrency Scaling for bursts of user activity (Preview) New!
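The credit rule on this slide can be sketched as simple arithmetic. This is a minimal sketch, assuming credits accrue pro rata with main-cluster usage and are applied before any billing; the slide does not state caps, expiry, or rounding, and the function names are hypothetical.

```python
def accrued_credit_hours(main_cluster_hours: float) -> float:
    """One Concurrency Scaling credit hour per 24 hours of main-cluster use.

    Pro rata accrual is an assumption of this sketch.
    """
    return main_cluster_hours / 24.0


def billable_scaling_hours(main_cluster_hours: float, scaling_hours_used: float) -> float:
    """Concurrency Scaling hours left to pay for once credits are applied."""
    credit = accrued_credit_hours(main_cluster_hours)
    return max(0.0, scaling_hours_used - credit)


# A cluster running continuously for a 30-day month.
month_hours = 30 * 24

print(accrued_credit_hours(month_hours))
print(billable_scaling_hours(month_hours, 12.0))
print(billable_scaling_hours(month_hours, 35.0))
```

A workload whose bursts stay under roughly one hour per day of main-cluster use would owe nothing under this model, which fits the claim that most customers pay nothing extra.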
*Since re:Invent 2017
Improvements
to simplicity
CloudWatch metrics for
Workload Execution
Breakdown
Current and trailing tracks
for release updates
Lateral column
alias reference
CloudWatch metrics
for Query Duration
by WLM Queues
Cluster resize operations
CloudWatch
Query Runtime Breakdown metric
Stream real-time data in
Parquet or ORC formats
using Kinesis Data Firehose
DISTSTYLE AUTO
distribution style
Free upgrade from DC1
RIs to DC2
Query Monitoring Rules (QMR)
now support 3x more rules
Short query
acceleration is
self-optimizing
Redshift Advisor for best
practice recommendations
CloudWatch metrics
for Query Throughput
by WLM Queues
Cluster resize
Query Editor
Enhancements to
VACUUM DELETE
Manage components
of a multi-part query
in the AWS console
Automatic vacuum delete
Efficiency of backup performance
CloudWatch metrics for Query
Throughput, Query Duration
Redshift Query Editor
Query data
directly from
the AWS Console
Results are instantly visible
within the console
No need to install
an external JDBC/ODBC client
Launched in October!
Amazon Redshift intelligent maintenance
Vacuum: Auto
Analyze: Auto
WLM Concurrency Setting: Auto
Maintenance processes like
vacuum and analyze will
automatically run in the
background.
Redshift will automatically adjust the
WLM concurrency setting to deliver
optimal throughput.
Moving towards
zero-maintenance.
Coming Soon!
*Since re:Invent 2017
Improvements to scale
Integrate seamlessly with your data lake
DATE data type
Retrieving metadata for late-binding
views
Support for Enhanced VPC Routing
IN-list predicate processing
in Spectrum scans
Query external tables
during a resize operation
Specify the root of an
S3 bucket as the source
for an existing table
Spectrum queries with
aggregations on partition columns
Renaming external
table columns
Table property to specify the file
compression type for external tables
Push the LENGTH()
string function to
Spectrum
ALTER TABLE ADD/DROP
COLUMN for external tables is now
supported via standard JDBC calls
Map datatypes in
Spectrum to contain
arrays
Support for Parquet, ORC, Avro,
CSV, and other open file formats
New Spectrum
regions
Spectrum support
for JSON and ION
Spectrum support
for nested data
Arrays of arrays and
arrays of maps
Amazon Redshift Spectrum
Redshift Spectrum
query engine
Query across
Redshift and S3
Redshift
data
S3
data lake
Extend the data warehouse to exabytes of data in Amazon S3 Data Lake
No data loading required
Scale compute and storage separately
Directly query data stored in Amazon S3
Parquet, ORC, Avro, JSON, and CSV data formats
Unload to Parquet
Spectrum Request Accelerator
Coming
Soon!
Amazon Redshift is Scalable
Redshift Spectrum: Exabyte data lake query in under three minutes
Compression
Columnar file format
Scanning with 2500 nodes
Static partition elimination
Dynamic partition elimination
Amazon Redshift query optimizer
* Query used a 20 node DC1.8XLarge Amazon Redshift cluster
* Not actual sales data—generated for this demo based on data format used by Amazon Retail.
Imagine you are the manager at a Seattle book store.
An author released her 8th book in a popular series,
and you need to figure out how many copies to order.
Amazon S3
Redshift Spectrum
<3 minutes
5X
10X
2,500X
2X
350X
40X
Roughly 140 terabytes of customer
item order detail records for each
day over the past 20 years
190 million files across 15,000
partitions in S3
One partition per day for USA and
rest of world
Total data size is over an exabyte
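The data-set figures quoted on this slide can be checked with back-of-the-envelope arithmetic. This sketch assumes decimal units (1 exabyte = 1,000,000 terabytes) and ignores leap days; the constant names are illustrative only.

```python
TB_PER_DAY = 140         # "roughly 140 terabytes ... for each day"
YEARS = 20               # "over the past 20 years"
DAYS_PER_YEAR = 365      # leap days ignored (assumption)
REGIONS = 2              # one partition per day for USA and rest of world
TB_PER_EB = 1_000_000    # decimal units (assumption)

total_days = YEARS * DAYS_PER_YEAR
total_tb = TB_PER_DAY * total_days
total_eb = total_tb / TB_PER_EB

# One partition per day per region, compared with the quoted ~15,000.
partitions = total_days * REGIONS

# Average file count per partition, using the quoted totals.
files_per_partition = 190_000_000 / 15_000

print(f"total: {total_tb:,} TB = {total_eb:.3f} EB")
print(f"partitions: {partitions:,}")
print(f"files per partition: {files_per_partition:,.0f}")
```

The result is a little over one exabyte and about 14,600 partitions, in line with the slide's "over an exabyte" and "15,000 partitions".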
Security is built-in
Compliance certifications
10 GigE (HPC)
Customer
VPC
Internal
VPC
JDBC/ODBC
Compute
Nodes
Leader
Node
End-to-end encryption
Integration with AWS Key
Management Service
The power of data lakes
Most ways to bring data in
Terabyte – Exabyte scale
Security
compliance, and audit capabilities
Run any analytics
on the same data without movement
Scale
storage and compute independently
Designed for low-cost
storage and analytics
Redshift
EMR Athena
AI Services
Elasticsearch Kinesis
Snowball
Kinesis
Video Streams
Kinesis
Data
Streams
Kinesis
Data Firehose
Snowmobile
Unload
to Parquet
Amazon Redshift
New features
Speed
Scale
WLM
Concurrency
Setting
Simplicity
Amazon Lake
Formation
integration
Security
Auto Data
Distribution
Deferred
Maintenance
Snapshot
Scheduler
Spectrum
Request
Accelerator
Auto data
distribution
Elastic
resize
Concurrency
Scaling
Improving
short query
acceleration
Auto-
vacuum
Auto-
analyze
From desktop software to web-scale SaaS
From traditional datacenter to AWS
November 2014 - Intuit announces it’s going
“all-in” with AWS at re:Invent.
July 2018 – With the end of its transition in sight,
Intuit sells its major data center in Quincy, WA.
Intuit’s growing cloud business platform
Cloud cost optimization program
$100s of Thousands
Saved per day
$100s of Millions
Prepay under management
+4 Billion
Rows processed per day per node
Time
Progress
is ~70% migrated to AWS. Focus is shifting from migration speed to efficient operations and growth.
Our goals
Handle
explosive
growth in
data volume
Maximize
investment
in value-add,
not operations
Provide
deeper
insights faster,
fresher
Maintain
compliance
with SOX
regulations
Scaling challenges with our previous solution
Chart: batch size (M rows) and batch duration (minutes) per batch, 6/4/2017 to 11/2/2017, with throughput in M rows per minute (~1M steady).
Previous solution’s performance
was constant, not accommodating
increasing data volumes.
Scaling in the datacenter
took weeks and required significant
manual effort and cost.
Amazon Redshift performance scales to demand
Chart: rows per minute (millions), batch duration (minutes), and rows processed (millions) per batch, 10/1/2017 to 9/29/2018.
Intuit cloud business architecture
Data
Demarcation
Downstream
Consumers
AI/ML
Visualizations
Business
Intelligence
Stage Process Consume
Data Platform
Ingestion Processing Platform Processing Platform
Orchestration Layer
API
Intuit cloud business services
Amazon
S3
Amazon
S3
Amazon
SageMaker
Amazon
QuickSight
Amazon
CloudWatch
Amazon
RDS
AWS Step
Functions
Amazon
SNS
AWS
Lambda
Amazon
EC2
AWS
Lambda
Amazon
EC2
AWS
Lambda
Stage Process Consume
Amazon Redshift
Lessons learned – orchestration applications
Alternative warehouses
(MS SQL, Oracle) provide all-in-one database
application development platforms
AWS provides
an extensive collection of services to
supplement Amazon Redshift
The absence of system-native workflows can be intimidating at first.
However, the broad collection of low-overhead compute, storage, and application
development services provided by AWS allows for higher-performing, more scalable,
and lower-cost solutions than previously possible.
Amazon
S3
AWS
Snowball*
AWS
Batch
AWS
Lambda
Amazon
EC2
Amazon
RDS
AWS
DMS
Amazon
CloudWatch
AWS
CloudTrail
AWS
Glue
Amazon
Kinesis
Amazon
EMR
AWS Step
Functions
… and
many
more!
Lessons learned
By linking Amazon Redshift with RDS PostgreSQL, the combined feature set can power a
broader array of use cases and provide the best solution for each task.
Amazon Redshift
Fast, simple, cost-effective data
warehouse that can extend queries to
your data lake.
Redshift strengths:
High performance against large data sets
Easily scaled MPP Platform
Fast, simple ingestion from Amazon S3
PostgreSQL
Amazon RDS PostgreSQL
instances provide strong affinity
to Redshift due to common
PostgreSQL code roots.
RDS PostgreSQL strengths:
Performance for many small writes
Stored Procedure support
Additional Postgres 9.x features
Next steps – Concurrency Scaling for key workloads
Redshift Concurrency Scaling is expected to provide consistently fast
performance for our analysts, even with thousands of concurrent queries. All of
this at minimal-to-no additional cost.
Our platform’s next stage of intelligence and optimization will be
derived from AI/ML applied against our data.
Query patterns that are more complex and less predictable might increase the chances of
concurrency conflicts with our key automated jobs.
Further opening the system to internal data science
teams means increasing Redshift analyst user base
several-fold.
Next steps – scaling with Amazon Redshift
Unload historical data
from our largest tables
to Amazon S3 using
Unload to Parquet.
Transparently query
unloaded Amazon S3 data
with Redshift-resident
data using Redshift
Spectrum.
Performance excels for
infrequently accessed
data when Parquet’s
columnar format is
combined with the
Redshift Spectrum
Request Accelerator.
These three features in concert allow one to seamlessly scale data outside of Amazon Redshift,
increasing flexibility of storage and compute provisioning. Specifically, it will allow us to age
older data out to S3, while keeping its retrieval seamless and performant.
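The first step above, unloading historical rows to Amazon S3 as Parquet, might be scripted along these lines. This is only a sketch: the deck lists Unload to Parquet as upcoming, so the statement shape follows the UNLOAD ... FORMAT AS PARQUET form that Amazon Redshift documented later and should be checked against current documentation. The table name, bucket, IAM role ARN, and the build_unload_to_parquet helper are all hypothetical.

```python
def build_unload_to_parquet(select_sql: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift UNLOAD statement that writes query results as Parquet.

    Single quotes inside the SELECT are doubled because UNLOAD takes the
    query as a quoted string literal.
    """
    escaped_select = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_select}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical names, for illustration only.
statement = build_unload_to_parquet(
    select_sql="SELECT * FROM order_history WHERE order_date < '2017-01-01'",
    s3_prefix="s3://example-bucket/history/order_history/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftUnloadRole",
)
print(statement)
```

The generated statement would be run through any SQL client connected to the cluster. Querying the unloaded files alongside Redshift-resident data would also need an external schema and table defined over the S3 prefix, which this sketch does not cover.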
Amazon Redshift migration benefits
Performance
& scaling
Architecture has scaled
over 7x data volume with
no effort on our end
>20x hardware-
normalized performance
with large batches
Cost
66% reduction in
operations overhead
more than offsets slight
Opex increase
Business outcomes
>90% reduction in time-
to-insight
0 minutes of unscheduled
downtime
50% reduction in story
cycle time to implement
new features
Thank you!
Maor Kleider
Maor@amazon.com
Jason Rhoades
Jason_Rhoades@intuit.com

Mais conteúdo relacionado

Mais procurados

[CTO Night & Day 2019] AWS で構築するデータレイク基盤と amazon.com での導入事例 #ctonight
[CTO Night & Day 2019] AWS で構築するデータレイク基盤と amazon.com での導入事例 #ctonight[CTO Night & Day 2019] AWS で構築するデータレイク基盤と amazon.com での導入事例 #ctonight
[CTO Night & Day 2019] AWS で構築するデータレイク基盤と amazon.com での導入事例 #ctonightAmazon Web Services Japan
 
20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...
20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...
20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...Amazon Web Services Japan
 
Using AWS Control Tower to govern multi-account AWS environments at scale - G...
Using AWS Control Tower to govern multi-account AWS environments at scale - G...Using AWS Control Tower to govern multi-account AWS environments at scale - G...
Using AWS Control Tower to govern multi-account AWS environments at scale - G...Amazon Web Services
 
AWS Black Belt Tech シリーズ 2015 - AWS Data Pipeline
AWS Black Belt Tech シリーズ 2015 - AWS Data PipelineAWS Black Belt Tech シリーズ 2015 - AWS Data Pipeline
AWS Black Belt Tech シリーズ 2015 - AWS Data PipelineAmazon Web Services Japan
 
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Amazon Web Services
 
AWS Black Belt Online Seminar 2017 AWSにおけるアプリ認証パターンのご紹介
AWS Black Belt Online Seminar 2017 AWSにおけるアプリ認証パターンのご紹介AWS Black Belt Online Seminar 2017 AWSにおけるアプリ認証パターンのご紹介
AWS Black Belt Online Seminar 2017 AWSにおけるアプリ認証パターンのご紹介Amazon Web Services Japan
 
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)Amazon Web Services Korea
 
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch ServiceAmazon Web Services Japan
 
Disaster Recovery with the AWS Cloud
Disaster Recovery with the AWS CloudDisaster Recovery with the AWS Cloud
Disaster Recovery with the AWS CloudAmazon Web Services
 
20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier
20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier
20190220 AWS Black Belt Online Seminar Amazon S3 / GlacierAmazon Web Services Japan
 
20190919 よくご相談いただくセキュリティの質問と考え方
20190919 よくご相談いただくセキュリティの質問と考え方20190919 よくご相談いただくセキュリティの質問と考え方
20190919 よくご相談いただくセキュリティの質問と考え方Amazon Web Services Japan
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Amazon Web Services
 
Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWa...
Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWa...Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWa...
Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWa...Amazon Web Services
 

Mais procurados (20)

[CTO Night & Day 2019] AWS で構築するデータレイク基盤と amazon.com での導入事例 #ctonight
[CTO Night & Day 2019] AWS で構築するデータレイク基盤と amazon.com での導入事例 #ctonight[CTO Night & Day 2019] AWS で構築するデータレイク基盤と amazon.com での導入事例 #ctonight
[CTO Night & Day 2019] AWS で構築するデータレイク基盤と amazon.com での導入事例 #ctonight
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
 
20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...
20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...
20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...
 
Using AWS Control Tower to govern multi-account AWS environments at scale - G...
Using AWS Control Tower to govern multi-account AWS environments at scale - G...Using AWS Control Tower to govern multi-account AWS environments at scale - G...
Using AWS Control Tower to govern multi-account AWS environments at scale - G...
 
AWS Black Belt Tech シリーズ 2015 - AWS Data Pipeline
AWS Black Belt Tech シリーズ 2015 - AWS Data PipelineAWS Black Belt Tech シリーズ 2015 - AWS Data Pipeline
AWS Black Belt Tech シリーズ 2015 - AWS Data Pipeline
 
Optimize Cost Efficiency on AWS
Optimize Cost Efficiency on AWSOptimize Cost Efficiency on AWS
Optimize Cost Efficiency on AWS
 
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
 
AWS Black Belt Online Seminar 2017 AWSにおけるアプリ認証パターンのご紹介
AWS Black Belt Online Seminar 2017 AWSにおけるアプリ認証パターンのご紹介AWS Black Belt Online Seminar 2017 AWSにおけるアプリ認証パターンのご紹介
AWS Black Belt Online Seminar 2017 AWSにおけるアプリ認証パターンのご紹介
 
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
 
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
 
Disaster Recovery with the AWS Cloud
Disaster Recovery with the AWS CloudDisaster Recovery with the AWS Cloud
Disaster Recovery with the AWS Cloud
 
20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier
20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier
20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier
 
20190919 よくご相談いただくセキュリティの質問と考え方
20190919 よくご相談いただくセキュリティの質問と考え方20190919 よくご相談いただくセキュリティの質問と考え方
20190919 よくご相談いただくセキュリティの質問と考え方
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
ここから始めるAWSセキュリティ
ここから始めるAWSセキュリティここから始めるAWSセキュリティ
ここから始めるAWSセキュリティ
 
AWS Black Belt online seminar 2017 Snowball
AWS Black Belt online seminar 2017 SnowballAWS Black Belt online seminar 2017 Snowball
AWS Black Belt online seminar 2017 Snowball
 
Fundamentals of Cloud Computing & AWS
Fundamentals of Cloud Computing & AWSFundamentals of Cloud Computing & AWS
Fundamentals of Cloud Computing & AWS
 
Application Migrations
Application MigrationsApplication Migrations
Application Migrations
 
Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWa...
Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWa...Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWa...
Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWa...
 
AWS Cost Management Workshop
AWS Cost Management WorkshopAWS Cost Management Workshop
AWS Cost Management Workshop
 

Modern Cloud Data Warehousing ft. Intuit: Optimize Analytics Practices (ANT202-R1) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Modern Cloud Data Warehousing ft. Intuit: Optimize Analytics Practices (ANT202-R1). Maor Kleider, Principal Product Manager, Amazon Web Services; Jason Rhoades, Systems Architect, Intuit
  • 3. Raise your hand if you’re using Amazon Redshift
  • 4. AWS Databases and Analytics: broad and deep portfolio, built for builders. Analytics: Redshift (data warehousing), EMR (Hadoop + Spark), Athena (interactive analytics), Kinesis Analytics (real-time), Elasticsearch Service (operational analytics). Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server), Aurora (MySQL, PostgreSQL), DynamoDB (key value, document), ElastiCache (Redis, Memcached), Neptune (graph), Timestream (time series), QLDB (ledger database), RDS on VMware. Business Intelligence & Machine Learning: QuickSight, SageMaker, Comprehend, Rekognition, Lex, Transcribe, DeepLens. Data Lake: S3/Glacier, Glue (ETL & Data Catalog), Lake Formation (data lakes). Blockchain: Managed Blockchain, Blockchain Templates. Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Data Pipeline | Direct Connect. AWS Marketplace: 250+ solutions, 730+ database solutions, 600+ analytics solutions, 25+ blockchain solutions, 20+ data lake solutions, 30+ solutions.
  • 5. There is more data than people think. Data grows every 5 years. Data platforms need to live for years and scale.
  • 6. There are more data types than ever before.
  • 7. There are more ways to analyze data than ever before. Hadoop, Elasticsearch, Presto, Spark. Years ago: 11, 8, 5, 4. Didn’t exist.
  • 8. What does data warehouse modernization mean? Easy to use: don’t waste time on menial administrative tasks and maintenance. Extends to your data lake: directly analyze data stored in your data lake in open formats. Any scale of data, workloads, and users: dynamically scale up to guarantee performance even with unpredictable demands and data volumes. Faster time-to-insights: consistently fast performance, even with thousands of concurrent queries and users.
  • 9. Amazon Redshift: fast, simple, cost-effective data warehouse that can extend queries to your data lake. Fastest: get faster time-to-insight for all types of analytics workloads; powered by machine learning, columnar storage and MPP. Unlimited scale: dynamically scale up to guarantee performance even with unpredictable analytical demands and data volumes. Extends your data lake: analyze data in the Amazon S3 data lake in-place and in open formats, together with data loaded into Redshift’s high-performance SSDs; analyze data in open formats such as Parquet, ORC, and JSON, using SQL tools. 1/10th the cost: start at $0.25 per hour, save costs with automated administration tasks and eliminate business impact due to downtime; as low as $1,000 per terabyte per year.
  • 10. for their cloud data warehouse workloads than anyone else Amazon Redshift
  • 11. Selected Amazon Redshift Partners: Data Integration, Business Intelligence, Systems Integrators
  • 12. Amazon Redshift: the 4 things that matter most. Speed, Scale, Security, Simplicity
  • 13. Let’s dig into what we’ve done in the past several months and what’s coming…
  • 14. Amazon Redshift is growing fast and innovating faster. Since we last spoke… features and enhancements released* (*since re:Invent 2017): Automatically enabled short query acceleration; Support for lateral column alias reference; New Quick Starts; New CloudWatch metrics; Customized Recommendations with Advisor; Current and trailing tracks for release update; Federated authentication with single sign-on; Improved performance for commits; COPY from Parquet and ORC file formats; Additional Spectrum regions; Support for Scalar JSON and Ion data types; Late materialization for faster query processing; Support for DATE data type with Spectrum; Short Query Acceleration; Utilization reports; Machine learning integration to accelerate dashboards and interactive analysis; Improved resource management for memory-intensive queries; Faster string manipulation; Support for Parquet and ORC in Kinesis Data Firehose; Improved workload management console experience; Query Editor; Support for late-binding views; SQL Scalar user-defined functions; Integration with AWS Glue; Support for Nested Data with Spectrum; Spectrum support for DATE data type; Improved performance for UNION ALL queries; Free upgrade from DC1 to DC2 RIs; Query monitoring rules (QMR); Support for Zstandard high compression encoding; Query processing improvements; Support for Python UDF logging module; Enhanced VPC routing; Automatically hopping queries without restarts; Support for uppercase column names; Result Caching for Repeat Queries; Support for LISTAGG DISTINCT; Support for ORC and Grok file formats; Integration with QuickSight; DMS support with Redshift; 3.5x Improved Throughput; Improved performance for repeat queries.
  • 15. Amazon Redshift is now >3x faster than 6 months ago. Chart: normalized queries per hour (QPH) as a % of Redshift 6 months ago, assuming Redshift’s QPH 6 months ago = 100%; higher is better. MAY 2018: 100%; JUN 2018: 115%; JUL 2018: 181%; AUG 2018: 237%; SEP 2018: 284%; OCT 2018: 350%.
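
The chart on slide 15 normalizes monthly throughput to a May 2018 baseline of 100%. As a rough sketch of how to read those figures, the Python below recomputes the overall speed-up and the month-over-month gains. The pairing of months to values is reconstructed from the chart labels and should be treated as an assumption; the function names are illustrative only.

```python
# Normalized queries per hour (QPH) as a percentage of the May 2018 baseline.
# The month-to-value pairing is an assumption recovered from the chart labels.
qph_percent = {
    "MAY 2018": 100,
    "JUN 2018": 115,
    "JUL 2018": 181,
    "AUG 2018": 237,
    "SEP 2018": 284,
    "OCT 2018": 350,
}


def overall_speedup(series):
    """Ratio of the last reading to the first reading in the series."""
    values = list(series.values())
    return values[-1] / values[0]


def month_over_month(series):
    """Growth factor of each month relative to the month before it."""
    months = list(series)
    return {
        cur: series[cur] / series[prev]
        for prev, cur in zip(months, months[1:])
    }


speedup = overall_speedup(qph_percent)
gains = month_over_month(qph_percent)

print(f"Overall speed-up: {speedup:.2f}x")
for month, factor in gains.items():
    print(f"{month}: {factor:.2f}x vs. previous month")
```

Under this reading, the ">3x" headline and the "3.5x improved throughput" item on slide 14 both correspond to the ratio between the October and May values.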
  • 16. Improvements to speed (*since re:Invent 2017): Compiled code cache; Support for lateral column alias reference; Resource management for memory-intensive queries; Late materialization; Result caching; Joins involving large numbers of NULL values in a join key column; Queries with intermediate subquery results that can be distributed; Cluster resize operations; Queries that refer to stable functions with constant expressions; Short query acceleration; Queries operating over CHAR and VARCHAR columns; Single-row inserts; Expressions on the partition columns of external tables; Faster string manipulation; Complex EXCEPT subqueries; Commit processing enhancements; DC2 nodes; 2x the number of tables in a cluster; Hash join memory utilization optimizations and cache line prefetching; COPY operation when ingesting data from Parquet and ORC formats; Performance improvement for queries that refer to stable functions over constant expressions; Improvements for the COPY operation when ingesting data from Parquet and ORC formats; Query processing improvements; Query rewrites that push down selective joins into a subquery; Query planning.
  • 17. How we leverage fleet telemetry
  • 18. Performance improvements in query speed: “Redshift query performance and scalability has been increasing, even though our data has grown. In the last 10 months, we have seen commit performance increase by 500% without any increase in cost.” - Minero Aoki, Senior Data Engineer, Cookpad Inc.
  • 19. “20 percent of our queries now complete in less than one second. Best of all, we didn’t have to change anything to get this speed-up with Redshift, which supports our mission-critical workloads.” - Greg Rokita, Executive Director of Technology, Edmunds
  • 20. Amazon Redshift Elastic Resize (GA). New! Scale up and down in minutes. Adds additional nodes to Redshift cluster; distributes data across new configuration; minimal transition time; quickly scale for varying workload demands. Diagram labels: JDBC/ODBC, Leader Node, Redshift Cluster, Backup, Redshift Managed S3.
  • 21. Concurrency Scaling for bursts of user activity (Preview). New! Creates more clusters automatically on-demand; consistently fast performance even with thousands of concurrent queries; no advance hydration required; handles unpredictable demand variability. Diagram labels: Caching Layer, Backup, Redshift Managed S3.
  • 22. Concurrency Scaling for bursts of user activity (Preview). New! For every 24 hours that your main cluster is in use, you accrue a one-hour credit for Concurrency Scaling. Concurrency Scaling is free for more than 97% of Redshift customers.
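
The credit rule on slide 22 (one hour of Concurrency Scaling credit per 24 hours of main-cluster use) can be sketched as simple arithmetic. The Python below is a minimal illustration only: it assumes credits accrue linearly and ignores any cap, expiry, or per-second billing detail, none of which the slide states. The function names are hypothetical.

```python
def accrued_credit_hours(main_cluster_hours: float) -> float:
    """Credit hours earned: one hour per 24 hours of main-cluster use.

    Assumes linear accrual with no cap or expiry (not stated on the slide).
    """
    if main_cluster_hours < 0:
        raise ValueError("main_cluster_hours must be non-negative")
    return main_cluster_hours / 24.0


def billable_scaling_hours(main_cluster_hours: float,
                           scaling_hours_used: float) -> float:
    """Concurrency Scaling hours left to pay for after applying credits."""
    if scaling_hours_used < 0:
        raise ValueError("scaling_hours_used must be non-negative")
    credit = accrued_credit_hours(main_cluster_hours)
    return max(0.0, scaling_hours_used - credit)


# A cluster that runs around the clock for a 30-day month.
hours_in_month = 30 * 24
print(accrued_credit_hours(hours_in_month))
print(billable_scaling_hours(hours_in_month, scaling_hours_used=10))
print(billable_scaling_hours(hours_in_month, scaling_hours_used=35))
```

Under these assumptions, a workload whose bursts stay within roughly one hour per day of cluster use would not incur extra Concurrency Scaling charges, which is consistent with the slide's claim that the feature is free for most customers.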
  • 23. Improvements to simplicity (*since re:Invent 2017): CloudWatch metrics for Workload Execution Breakdown; Current and trailing tracks for release updates; Lateral column alias reference; CloudWatch metrics for Query Duration by WLM Queues; Cluster resize operations; CloudWatch Query Runtime Breakdown metric; Stream real-time data in Parquet or ORC formats using Kinesis Data Firehose; DISTSTYLE AUTO distribution style; Free upgrade for DC1 RIs to DC2; Query Monitoring Rules (QMR) now support 3x more rules; Short query acceleration is self-optimizing; Redshift Advisor for best practice recommendations; CloudWatch metrics for Query Throughput by WLM Queues; Cluster resize; Query Editor; Enhancements to VACUUM DELETE; Manage components of a multi-part query in the AWS console; Automatic vacuum delete; Efficiency of backup performance; CloudWatch metrics for Query Throughput, Query Duration.
  • 24. Redshift Query Editor. Launched in October! Query data directly from the AWS Console; results are instantly visible within the console; no need to install an external JDBC/ODBC client.
  • 25. Amazon Redshift intelligent maintenance. Coming soon! Auto Vacuum, Auto Analyze, Auto WLM Concurrency Setting. Maintenance processes like vacuum and analyze will automatically run in the background. Redshift will automatically adjust the WLM concurrency setting to deliver optimal throughput. Moving towards zero-maintenance.
  • 26. Improvements to scale: integrate seamlessly with your data lake (*since re:Invent 2017): DATE data type; Retrieving metadata for late-binding views; Support for Enhanced VPC Routing; IN-list predicate processing in Spectrum scans; Query external tables during a resize operation; Specify the root of an S3 bucket as the source for an existing table; Spectrum queries with aggregations on partition columns; Renaming external table columns; Table property to specify the file compression type for external tables; Push the LENGTH() string function to Spectrum; ALTER TABLE ADD/DROP COLUMN for external tables is now supported via standard JDBC calls; Map datatypes in Spectrum to contain arrays; Support for Parquet, ORC, Avro, CSV, and other open file formats; New Spectrum regions; Spectrum support for JSON and ION; Spectrum support for nested data; Arrays of arrays and arrays of maps.
  • 27. Amazon Redshift Spectrum: extend the data warehouse to exabytes of data in Amazon S3 Data Lake. The Redshift Spectrum query engine queries across Redshift data and the S3 data lake. No data loading required; scale compute and storage separately; directly query data stored in Amazon S3; Parquet, ORC, Avro, JSON, and CSV data formats. Coming soon: Unload to Parquet; Spectrum Request Accelerator.
  • 28. Amazon Redshift is scalable. Redshift Spectrum: exabyte data lake query in under three minutes. Scenario: imagine you are the manager at a Seattle book store. An author released her 8th book in a popular series, and you need to figure out how many copies to order. Data in Amazon S3: roughly 140 terabytes of customer item order detail records for each day over the past 20 years; 190 million files across 15,000 partitions in S3; one partition per day for USA and rest of world; total data size is over an exabyte. Speed-up factors: Compression 5X; Columnar file format 10X; Scanning with 2500 nodes 2,500X; Static partition elimination 2X; Dynamic partition elimination 350X; Amazon Redshift query optimizer 40X. Result with Redshift Spectrum: <3 minutes. * Query used a 20 node DC1.8XLarge Amazon Redshift cluster. * Not actual sales data; generated for this demo based on data format used by Amazon Retail.
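
The dataset figures quoted on slide 28 can be sanity-checked with back-of-the-envelope arithmetic. The Python sketch below assumes 365-day years, decimal units (1 EB = 1,000,000 TB), and that the 140 TB per day covers both the USA and rest-of-world partitions together; none of these assumptions is stated on the slide.

```python
# Figures quoted on the slide.
TB_PER_DAY = 140           # customer item order detail records per day
YEARS = 20
FILES = 190_000_000
QUOTED_PARTITIONS = 15_000

# Assumptions made for this sketch (not from the slide).
DAYS_PER_YEAR = 365        # leap days ignored
REGIONS = 2                # one partition per day for USA, one for rest of world
TB_PER_EB = 1_000_000      # decimal units

days = YEARS * DAYS_PER_YEAR
total_tb = TB_PER_DAY * days
total_eb = total_tb / TB_PER_EB
partitions = days * REGIONS
files_per_partition = FILES / QUOTED_PARTITIONS

print(f"Days covered:        {days:,}")
print(f"Total size:          {total_tb:,} TB (about {total_eb:.2f} EB)")
print(f"Estimated partitions: {partitions:,} (slide quotes {QUOTED_PARTITIONS:,})")
print(f"Files per partition: about {files_per_partition:,.0f}")
```

Under these assumptions the total comes out slightly above one exabyte and the partition count lands close to the quoted 15,000, so the slide's numbers are internally consistent.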
  • 29. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Security is built-in Compliance certifications 10 GigE (HPC) Customer VPC Internal VPC JDBC/ODBC Compute Nodes Leader Node End-to-end encryption Integration with AWS Key Management Service
  • 30. The power of data lakes: most ways to bring data in; terabyte to exabyte scale; security, compliance, and audit capabilities; run any analytics on the same data without movement; scale storage and compute independently; designed for low-cost storage and analytics. Services shown: Redshift, EMR, Athena, AI services, Elasticsearch, Kinesis, Snowball, Snowmobile, Kinesis Video Streams, Kinesis Data Streams, Kinesis Data Firehose.
  • 31. Amazon Redshift new features. Themes: speed, scale, simplicity, security. Features: Unload to Parquet; WLM concurrency setting; AWS Lake Formation integration; auto data distribution; deferred maintenance; snapshot scheduler; Spectrum Request Accelerator; elastic resize; Concurrency Scaling; improving short query acceleration; auto-vacuum; auto-analyze.
  • 33. From desktop software to web-scale SaaS
  • 34. From traditional datacenter to AWS. November 2014: Intuit announces it’s going “all-in” with AWS at re:Invent. July 2018: with the end of its transition in sight, Intuit sells its major data center in Quincy, WA.
  • 35. Intuit’s growing cloud business platform. Cloud cost optimization program: $100s of thousands saved per day; $100s of millions in prepay under management; 4+ billion rows processed per day per node. Migration to AWS is ~70% complete. Focus is shifting from migration speed to efficient operations and growth.
  • 36. Our goals: handle explosive growth in data volume; maximize investment in value-add, not operations; provide deeper insights faster, fresher; maintain compliance with SOX regulations.
  • 37. Scaling challenges with our previous solution. Chart (June to November 2017): batch duration (minutes) and batch size (M rows) per batch, with throughput in M rows per minute on an axis from 0 to 1.4 (~1M steady). Previous solution’s performance was constant, not accommodating increasing data volumes. Scaling in the datacenter took weeks and required significant manual effort and cost.
  • 38. Amazon Redshift performance scales to demand. Chart (October 2017 to September 2018): batch duration (minutes) and rows processed (millions) per batch, with throughput in millions of rows per minute on an axis from 0 to 140.
  • 39. Intuit cloud business architecture. Diagram of a three-stage pipeline (Stage, Process, Consume). Labels: data demarcation; data platform ingestion; processing platform (shown twice); orchestration layer; API. Downstream consumers: AI/ML, visualizations, business intelligence.
  • 40. Intuit cloud business services, shown across the Stage, Process, and Consume stages: Amazon S3, Amazon Redshift, Amazon RDS, Amazon EC2, AWS Lambda, AWS Step Functions, Amazon SNS, Amazon CloudWatch, Amazon SageMaker, Amazon QuickSight.
  • 41. Lessons learned: orchestration applications. Alternative warehouses (MS SQL, Oracle) provide all-in-one database application development platforms. AWS provides an extensive collection of services to supplement Amazon Redshift. The absence of system-native workflows can be intimidating at first. However, the broad collection of low-overhead compute, storage, and application development services provided by AWS allows for higher-performing, more scalable, and lower-cost solutions than previously possible. Services: Amazon S3, AWS Snowball*, AWS Batch, AWS Lambda, Amazon EC2, Amazon RDS, AWS DMS, Amazon CloudWatch, AWS CloudTrail, AWS Glue, Amazon Kinesis, Amazon EMR, AWS Step Functions, and many more.
  • 42. Lessons learned. By linking Amazon Redshift with RDS PostgreSQL, the combined feature set can power a broader array of use cases and provide the best solution for each task. Amazon Redshift: fast, simple, cost-effective data warehouse that can extend queries to your data lake. Redshift strengths: high performance against large data sets; easily scaled MPP platform; fast, simple ingestion from Amazon S3. PostgreSQL: Amazon RDS PostgreSQL instances provide strong affinity to Redshift due to common PostgreSQL code roots. RDS PostgreSQL strengths: performance for many small writes; stored procedure support; additional Postgres 9.x features.
  • 43. Next steps: Concurrency Scaling for key workloads. Our platform’s next stage of intelligence and optimization will be derived from AI/ML applied against our data. Further opening the system to internal data science teams means increasing the Redshift analyst user base several-fold. Query patterns that are more complex and less predictable might increase the chances of concurrency conflicts with our key automated jobs. Redshift Concurrency Scaling is expected to provide consistently fast performance for our analysts, even with thousands of concurrent queries, at minimal-to-no additional cost.
  • 44. Next steps: scaling with Amazon Redshift. Unload historical data from our largest tables to Amazon S3 using Unload to Parquet. Transparently query unloaded Amazon S3 data together with Redshift-resident data using Redshift Spectrum. Performance excels for infrequently accessed data when Parquet’s columnar format is combined with the Redshift Spectrum Request Accelerator. These three features in concert allow one to seamlessly scale data outside of Amazon Redshift, increasing flexibility of storage and compute provisioning. Specifically, this will allow us to age older data out to S3 while keeping its retrieval seamless and performant.
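The age-out pattern described on this slide can be sketched as two SQL statements, built here as Python strings. This is a minimal sketch and not Intuit's implementation: it assumes Redshift's `UNLOAD ... FORMAT AS PARQUET` syntax and a late-binding view (`WITH NO SCHEMA BINDING`) over a Spectrum external table, and every table, schema, bucket, and role name is hypothetical.

```python
def build_unload_sql(select_sql: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return an UNLOAD statement that writes a query result to S3 as Parquet."""
    # UNLOAD takes the query as a quoted string, so embedded single
    # quotes are doubled.
    escaped = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


def build_union_view_sql(view: str, local_table: str, external_table: str) -> str:
    """Return a late-binding view spanning resident and unloaded rows."""
    # Views that reference external (Spectrum) tables must be late-binding.
    return (
        f"CREATE VIEW {view} AS\n"
        f"SELECT * FROM {local_table}\n"
        "UNION ALL\n"
        f"SELECT * FROM {external_table}\n"
        "WITH NO SCHEMA BINDING;"
    )


# Hypothetical names, for illustration only.
unload_sql = build_unload_sql(
    "SELECT * FROM sales.orders WHERE order_date < '2016-01-01'",
    "s3://example-bucket/orders/history/",
    "arn:aws:iam::123456789012:role/ExampleUnloadRole",
)
view_sql = build_union_view_sql(
    "sales.orders_all", "sales.orders", "spectrum.orders_history"
)

print(unload_sql)
print(view_sql)
```

The sketch omits two steps: defining the external table over the unloaded files, and deleting the aged rows from the resident table once the unload has been verified. Analysts would then query the view rather than either table directly.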
  • 45. Amazon Redshift migration benefits. Performance and scaling: architecture has scaled over 7x data volume with no effort on our end; >20x hardware-normalized performance with large batches. Cost: 66% reduction in operations overhead more than offsets slight Opex increase. Business outcomes: >90% reduction in time-to-insight; 0 minutes of unscheduled downtime; 50% reduction in story cycle time to implement new features.
  • 46. Thank you! Maor Kleider (Maor@amazon.com), Jason Rhoades (Jason_Rhoades@intuit.com)