© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
State of the Union: Database & Analytics
Victor Chiu
Senior Business Development Manager, Database & Analytics
What do these companies have in common?
*Source: Forbes Online; New Vantage Partners - Big Data Executive Survey
85% of businesses want to be data driven
but only 37% have been successful.
1 Break free from legacy databases
• Increase scale
• Improve performance
• Lower cost
2 Move to managed
• Save time and cost
• Remove undifferentiated heavy lifting
3 Modernize your data warehouse
• Better and faster insights
• Broader access to analytics
4 Build data-driven apps
• Agility
• Global distribution
• Performance at scale
5 Turn data to insights
• Better experiences
• Deeper engagement
• Efficient processes
How do you build momentum?
The Data Flywheel
Modernize your data infrastructure
Get the most value from your data
Commercial-grade performance and reliability?
Customers are moving to open databases
Amazon Aurora
MySQL and PostgreSQL compatible relational database built for the cloud
Performance and availability of commercial-grade databases at 1/10th the cost
Performance and scalability
• 5x throughput of MySQL
• 3x throughput of PostgreSQL
• Up to 15 read replicas
• Scale out reads and writes across multiple data centers
Fully managed
• Managed by RDS: no hardware provisioning, software patching, setup, configuration, or backups
Availability and durability
• Fault-tolerant, self-healing storage
• Six copies of data across three AZs
• Continuous backup to S3
• Single global database with cross-region replication
Highly secure
• Network isolation
• Encryption at rest/transit
Challenges with integrating ML with your database
Typical steps of incorporating ML into an application
1 Select and train the model
2 Write application code to read data from the database
3 Query and format the data for the ML algorithm
4 Call an ML service to run the algorithm
5 Format the output
6 Retrieve the results back to the application
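The six steps listed above can be sketched in a few lines to show how much glue code sits between the database and the model. This is a minimal illustration, not AWS code: `sqlite3` stands in for the application database, and `fake_sentiment_service` is a hypothetical stand-in for a remote ML endpoint whose model was selected and trained beforehand (step 1).

```python
import sqlite3


def fake_sentiment_service(texts):
    """Hypothetical stand-in for a remote ML endpoint: labels text by keyword."""
    return ["NEGATIVE" if "bad" in t.lower() else "POSITIVE" for t in texts]


def score_reviews(conn):
    # Step 2: application code reads rows from the database.
    rows = conn.execute("SELECT id, body FROM reviews ORDER BY id").fetchall()
    # Step 3: format the data the way the model expects (plain strings here).
    texts = [body.strip() for _, body in rows]
    # Step 4: call the ML service.
    labels = fake_sentiment_service(texts)
    # Steps 5 and 6: format the output and return it to the application.
    return {row_id: label for (row_id, _), label in zip(rows, labels)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO reviews (body) VALUES (?)",
    [("Great product",), ("Bad battery life",)],
)
results = score_reviews(conn)
print(results)
```

Every step after the `SELECT` is code the application team has to write and maintain, which is the overhead this slide is pointing at.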
Generate predictions directly from Aurora queries
Models run in SageMaker & Comprehend
Use standard SQL, no ML expertise required
Suitable for low-latency, high-volume use cases
ML in Amazon Aurora and Athena
Bringing machine learning to data developers and data analysts
[Diagram labels: SQL (SELECT, FROM, WHERE); Aurora (database); Athena (interactive analytics); Amazon SageMaker (ML)]
>200,000 databases migrated with DMS
More in 2019 than all of 2016-2018 combined
Hardware and software installation
Configuration, patching, and backups
Cluster setup and data replication for high availability
Capacity planning, and scaling clusters for compute and storage
Managing software on-premises is time-consuming and complex
Customers moving to fully managed services
Relational databases: Aurora, RDS
Non-relational databases: DynamoDB, DocumentDB, ElastiCache, Managed Cassandra Service
Hadoop and Spark: EMR
Operational analytics: Elasticsearch Service
Real-time analytics: Managed Streaming for Kafka
Amazon RDS
Managed relational database service with a choice of popular databases
Easy to administer
• No infrastructure provisioning
• No software installation and patching
• Built-in monitoring
Performant & scalable
• Scale with an API call or a few clicks
• Read replicas for increased throughput
Available & durable
• Automatic Multi-AZ data replication
• Automated backup, snapshots, and failover
Secure and compliant
• Encryption at rest and in transit
• Network isolation and resource-level permissions
How do you scale your relational database to support
tens of thousands of connections?
Serverless applications open and close tens of thousands of connections within seconds
This leads to longer query response times that limit application scalability
Database proxy servers are difficult to deploy, patch, and manage
Amazon RDS Proxy
Fully managed, highly available database proxy
Supports new scale of serverless application connections
Pools and shares database connections
Preserve connections during database failovers
Manages DB credentials with Secrets Manager and IAM
Fully managed—No provisioning, patching, management
[Diagram labels: Applications; RDS Proxy (connection pooling); RDS database instance]
NEW (PREVIEW)
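To make "pools and shares database connections" concrete, here is a minimal in-process sketch of the pooling idea. It is an illustration only and says nothing about how RDS Proxy is implemented: the `ConnectionPool` class is hypothetical, and `sqlite3` stands in for the database.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool: many callers share a few connections."""

    def __init__(self, size, factory):
        self._idle = queue.Queue(maxsize=size)
        self.opened = 0
        for _ in range(size):
            self._idle.put(factory())
            self.opened += 1

    @contextmanager
    def connection(self, timeout=1.0):
        # Borrow an idle connection; raises queue.Empty if none frees up in time.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)


pool = ConnectionPool(size=2, factory=lambda: sqlite3.connect(":memory:"))

served = 0
for _ in range(100):  # 100 short-lived "requests"
    with pool.connection() as conn:
        served += conn.execute("SELECT 1").fetchone()[0]

print(served, pool.opened)
```

One hundred requests are served while only two database connections are ever opened, which is the effect a proxy aims for when serverless functions connect and disconnect at high rates.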
Amazon RDS on AWS Outposts
Launch RDS in your data centers with AWS Outposts
Integrate with on-premises databases and applications
Deploy secure, managed RDS in minutes
Store data without moving to cloud
Automates provisioning, patching, backup, restoring, scaling, and failover
[Diagram labels: RDS (MySQL, PostgreSQL); AWS Outposts]
NEW (PREVIEW)
Operational Analytics: Amazon Elasticsearch Service
Fully managed, scalable, secure, Elasticsearch service
Open-source Elasticsearch APIs, Kibana, and Logstash
• Open-source Elasticsearch APIs
• Managed Kibana
• Integration with Logstash
Scalable, secure, and compliant
• Scale clusters up/down via a single API call or a few clicks
• Secured network isolation with VPC, encrypt data at rest and in transit
• Compliant: HIPAA, PCI DSS, and ISO
Pay only for what you use
• Cost-optimized workloads
• No upfront fee or usage requirement
• Critical features built in: encryption, VPC support, 24x7 monitoring
Fully managed
• Deploy Elasticsearch clusters in minutes: simplified hardware provisioning, software installation/patching, failure recovery, backups, and monitoring
Challenges with analyzing high volumes of data in real time
Storing data is expensive at scale
Limits the amount of data retained for analysis
Miss out on valuable insights
UltraWarm for Amazon Elasticsearch Service
A new warm storage tier for Elasticsearch service
Seamlessly extends Elasticsearch service
Reduces cost by 90% to store the same amount of data
Scale up to 3 PB of log data per cluster
Analyze years of operational data
[Diagram labels: Kibana dashboard; application; load balancer; queries; Amazon Elasticsearch Service domain with an active master node and three UltraWarm nodes; Amazon S3]
NEW (PREVIEW)
Data warehousing: Amazon Redshift
First and most popular cloud data warehouse
Best performance, most scalable
• 3x faster with RA3*
• 10x faster with AQUA*
• Adds unlimited compute capacity on demand to meet unlimited concurrent access
Lowest cost
• Cost-optimized workloads by paying for compute and storage separately
• 1/10th the cost of a traditional DW at $1000/TB/year
• Up to 75% less than other cloud data warehouses & predictable costs
Data lake & AWS integration
• Analyze exabytes of data across data warehouse, data lakes, and operational databases
• Query data across various analytics services
Most secure & compliant
• AWS-grade security (e.g., VPC, encryption with KMS, CloudTrail)
• All major certifications such as SOC, PCI DSS, ISO, FedRAMP, HIPAA
*vs other cloud DWs
Most widely used Cloud Data Warehouse
Tens of thousands of customers use Redshift & process over 2EB of data per day
Amazon Redshift has been innovating quickly
200+ new features in the past 18 months
• Robust result set caching
• Support for a large number of tables (~20,000)
• COPY command support for ORC and Parquet
• IAM role chaining
• Elastic resize
• Groups
• Redshift Spectrum: date formats, scalar JSON and ION file format support, region expansion, predicate filtering
• Auto analyze
• Health and performance monitoring with Amazon CloudWatch
• Automatic table distribution style
• CloudWatch support for WLM queues
• Performance enhancements: hash join, vacuum, window functions, resize operations, aggregations, console, UNION ALL, efficient compile code cache
• Unload to CSV
• Auto WLM
• Support for ~25 Query Monitoring Rules (QMR)
• AQUA
• Concurrency Scaling
• DC1 migration to DC2
• Resiliency of ROLLBACK processing
• Manage multi-part query in AWS console
• Auto analyze for incremental changes on table
• Spectrum Request Accelerator
• Apply new distribution key
• Redshift Spectrum: row group filtering in Parquet and ORC, nested data support, Enhanced VPC Routing, multiple partitions
• Faster classic resize with optimized data transfer protocol
• Performance: Bloom filters in joins, complex queries that create internal tables, communication layer
• Redshift Spectrum: concurrency scaling
• AWS Lake Formation integration
• Auto-Vacuum sort, Auto-Analyze, and Auto Table Sort
• Auto WLM with query priorities
• Snapshot scheduler
• Performance: join pushdowns to subquery, mixed workloads, temporary tables, rank functions, null handling in join, single-row insert
• Advisor recommendations for distribution keys
• AZ64 compression encoding
• Console redesign
• Stored procedures
• Spatial processing
• Column-level access control with AWS Lake Formation
• RA3
• Performance of inter-Region snapshot transfers
• Federated Query
• Materialized Views
• Manual pause and resume
Amazon Redshift Materialized Views
Defined by a SQL query, precomputed results, incrementally refreshed
Orders-of-magnitude query acceleration
Recommended for predictable and repeated queries used in dashboarding and interactive analysis
[Diagram: base tables (rows R1 to R9, columns C1 to C4) combined into materialized views]
NEW (PREVIEW)
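The idea of "precomputed results, incrementally refreshed" can be illustrated with a small in-memory sketch. It is not Redshift's implementation: the `SalesByRegionView` class is hypothetical, and it assumes an append-only base table so that a refresh only has to fold in rows added since the previous refresh.

```python
from collections import defaultdict


class SalesByRegionView:
    """Hypothetical precomputed SUM(amount) GROUP BY region over an append-only table."""

    def __init__(self):
        self.totals = defaultdict(float)
        self._applied = 0  # number of base rows already reflected in totals

    def refresh(self, base_rows):
        # Incremental refresh: process only rows appended since the last call.
        new_rows = base_rows[self._applied:]
        for region, amount in new_rows:
            self.totals[region] += amount
        self._applied = len(base_rows)
        return len(new_rows)


orders = [("EU", 10.0), ("US", 5.0)]
view = SalesByRegionView()
first = view.refresh(orders)   # folds in both existing rows

orders.append(("EU", 2.5))
second = view.refresh(orders)  # folds in only the new row

print(first, second, dict(view.totals))
```

A dashboard that reads `view.totals` never rescans the base table, which is why the approach suits predictable, repeated queries.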
Amazon Redshift Data Lake Export
Export data directly to Amazon S3 in Apache Parquet
Save results of data transformation into S3 data lake
Export with the UNLOAD command and specify Parquet
Redshift formats, partitions, and moves data into S3
Analyze with Amazon SageMaker, Athena, and EMR
[Diagram labels: Redshift; S3]
NEW
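As a sketch of what "export with the UNLOAD command and specify Parquet" looks like, the helper below assembles an UNLOAD statement as text. The helper name `build_unload`, the bucket, the role ARN, and the table and column names are all hypothetical; the statement should be checked against the current Redshift documentation before use.

```python
def build_unload(query, s3_prefix, iam_role_arn, partition_by=()):
    """Assemble a Redshift UNLOAD statement that writes Parquet files to S3."""
    # UNLOAD takes the query as a quoted string, so inner single quotes are doubled.
    escaped = query.replace("'", "''")
    parts = [
        f"UNLOAD ('{escaped}')",
        f"TO '{s3_prefix}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "FORMAT AS PARQUET",
    ]
    if partition_by:
        parts.append(f"PARTITION BY ({', '.join(partition_by)})")
    return "\n".join(parts) + ";"


stmt = build_unload(
    "SELECT * FROM sales WHERE region = 'EU'",           # hypothetical table
    "s3://example-bucket/sales/",                        # hypothetical bucket
    "arn:aws:iam::123456789012:role/ExampleUnloadRole",  # hypothetical role
    partition_by=["sale_date"],
)
print(stmt)
```

The resulting text would be run through any SQL client connected to the cluster; the partition column determines the folder layout written under the S3 prefix.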
Amazon Redshift Federated Query
Analyze data across data warehouse, data lakes, and operational database
Query across multiple systems from Redshift
Combine data warehouse and transactional data
Compatible with Amazon RDS and Aurora (PostgreSQL)
[Diagram labels: SQL; Amazon Redshift; Amazon RDS; Amazon Aurora; S3 data lake]
NEW (PREVIEW)
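Combining data warehouse and transactional data in one query can be shown by analogy with two separate SQLite databases joined through `ATTACH`. This is only a stand-in for the concept; it is not Redshift syntax, and the table and column names are invented for the example.

```python
import sqlite3

# The main database plays the data warehouse; "ops" plays the operational database.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS ops")

conn.execute("CREATE TABLE main.daily_sales (customer_id INTEGER, total REAL)")
conn.execute("CREATE TABLE ops.customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.daily_sales VALUES (?, ?)",
    [(1, 40.0), (2, 15.0), (1, 10.0)],
)
conn.executemany(
    "INSERT INTO ops.customers VALUES (?, ?)",
    [(1, "Ana"), (2, "Bo")],
)

# One SQL statement spans both stores; no export or copy step is needed.
rows = conn.execute(
    """
    SELECT c.name, SUM(s.total)
    FROM main.daily_sales AS s
    JOIN ops.customers AS c ON c.id = s.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)
```

In the federated case the second store is a live RDS or Aurora PostgreSQL database rather than an attached file, but the query shape is the same: a join across schemas that live in different systems.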
Amazon Redshift on RA3 instances
Optimize your data warehouse by paying for compute and storage separately
Delivers 3x the performance of existing cloud DWs
Automatically scales your DW storage capacity
DS2 customers can migrate and get 2x performance and 2x storage for the same cost
Supports workloads up to 8 PB (compressed) for a cluster
[Diagram: four RA3 compute nodes, each with an SSD cache, on top of managed storage in S3]
Compute is priced in $/node/hour; managed storage is priced in $/TB/month
NEW (GA)
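Paying for compute in $/node/hour and managed storage in $/TB/month means the two parts of the bill can be estimated independently. The sketch below shows the arithmetic only; the function name and every rate in the example are placeholders, not AWS prices.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_cost(nodes, node_rate_per_hour, storage_tb, storage_rate_per_tb_month,
                 hours=HOURS_PER_MONTH):
    """Split a monthly estimate into compute and managed-storage components."""
    compute = nodes * node_rate_per_hour * hours
    storage = storage_tb * storage_rate_per_tb_month
    return {"compute": compute, "storage": storage, "total": compute + storage}


# Placeholder rates chosen for easy arithmetic, not real pricing.
estimate = monthly_cost(
    nodes=4,
    node_rate_per_hour=10.0,
    storage_tb=20,
    storage_rate_per_tb_month=25.0,
    hours=100,
)
print(estimate)
```

Doubling `storage_tb` changes only the storage component, which is the practical meaning of scaling storage without adding compute nodes.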
AQUA (Advanced Query Accelerator) for Amazon Redshift
An innovative new hardware-accelerated cache that delivers up to 10x better query performance than other cloud data warehouses
[Diagram labels: NVMe SSDs; custom analytics processors; AWS Nitro System]
NEW (COMING IN 2020)
AQUA: Advanced Query Accelerator
Redshift runs 10x faster than any other cloud data warehouse without increasing cost
AQUA brings compute to the storage layer so data doesn’t have to move back and forth
High-speed cache on top of S3 scales out to process data in parallel across many nodes
AWS custom-designed analytics processors accelerate data compression, encryption, and data processing
100% compatible with the current version of Redshift
[Diagram labels: RA3 compute cluster; AQUA (Advanced Query Accelerator); S3 storage]
NEW (COMING IN 2020)
Characteristics of modern applications
Internet-scale and transactional
Users: 1M+
Data volume: TB–PB–EB
Locality: Global
Performance: Milliseconds–microseconds
Request Rate: Millions
Access: Web, Mobile, IoT, devices
Scale: Up-down, Out-in
Economics: Pay for what you use
Developer access: Instant API access
Examples: social media, ride hailing, media streaming, dating
Developers are doing what they do best
Break complex apps into smaller pieces and pick the best tool to solve each problem
This ensures that the apps are well architected and scale effectively
Developers are now building highly distributed apps using purpose-built databases and microservices architecture
Common data categories and use cases
Amazon Managed (Apache) Cassandra Service
Scalable, highly available, and managed Cassandra-compatible database service
No servers to manage
• No need to provision, configure, and operate large Cassandra clusters or add and remove nodes manually
Single-digit millisecond performance
• Scale tables up and down automatically based on application traffic
• Virtually unlimited throughput and storage
• Single-digit millisecond performance at scale
Apache Cassandra-compatible
• Use the same application code, licensed drivers, and tools built on Cassandra
Simple migration
• Simple migration to Managed Cassandra Service for Cassandra databases on premises or on EC2
NEW (PREVIEW)
Customers moving to data lake architectures
Bringing together the best of both worlds
Extends or evolves DW architectures
Store any data in any format
Durable, available, and exabyte scale
Secure, compliant, auditable
Run any type of analytics, from data warehousing to predictive
[Diagram labels: data warehousing; analytics; machine learning; data lake]
Any type of analytics on the data lake
Most comprehensive analytics platform
Data lake: Amazon S3 | AWS Glue | Lake Formation
• Amazon Redshift: Data warehousing
• Amazon EMR: Big data processing
• Amazon Athena: Interactive query
• Amazon Elasticsearch Service: Operational analytics
• Amazon Kinesis, Amazon MSK: Real-time analytics
• Amazon SageMaker: Predictive analytics
• Amazon Personalize: Recommendations
• Amazon QuickSight: Visualizations
• AWS Data Exchange: Data exchange
Amazon EMR
Easily Run Spark, Hadoop, Hive, Presto, HBase, and more big data apps on AWS
Low cost
• 50–80% reduction in costs with EC2 Spot and Reserved Instances
• Per-second billing for flexibility
Use S3 storage
• Process data in S3 securely with high performance using the EMRFS connector
Latest versions
• Updated with latest open source frameworks within 30 days
Easy
• Fully managed: no cluster setup, node provisioning, or cluster tuning
Performance Improvements in Spark for Amazon EMR
Performance optimized runtime for Apache Spark, 2.6x faster performance at 1/10th the cost
Runtime total on 104 queries (seconds; lower is better)
• Spark with EMR (with runtime): 10,164
• 3rd party Managed Spark (with their runtime): 16,478
• Spark with EMR (without runtime): 26,478
Runtime optimized for Apache Spark performance
• 100% compliant with Apache Spark APIs
Best performance
• 2.6x faster than Spark with EMR without runtime
• 1.6x faster than 3rd party Managed Spark (with their runtime)
Lowest price
• 1/10th the cost of 3rd party Managed Spark (with their runtime)
*Based on TPC-DS 3 TB benchmarking running on 6-node c4.8xlarge clusters with EMR 5.28 and Spark 2.4
NEW
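The 2.6x and 1.6x figures follow directly from the three runtime totals on the chart. A short check, using only the numbers quoted on the slide (the dictionary labels are shortened for readability):

```python
# Total runtime in seconds across the 104 benchmark queries, as quoted on the slide.
runtimes_s = {
    "EMR Spark with runtime": 10_164,
    "3rd-party managed Spark": 16_478,
    "EMR Spark without runtime": 26_478,
}

baseline = runtimes_s["EMR Spark with runtime"]

# Speedup of the optimized runtime relative to each configuration.
speedups = {name: round(seconds / baseline, 1) for name, seconds in runtimes_s.items()}
print(speedups)
```

The ratios are 26,478 / 10,164 ≈ 2.6 and 16,478 / 10,164 ≈ 1.6, matching the claims. The cost comparison cannot be checked this way because the slide gives no prices.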
Amazon EMR on AWS Outposts
Launch EMR in your data centers with AWS Outposts
Integrate with existing on-premises Hadoop deployments
Deploy secure, managed EMR clusters in minutes
Process and analyze data on-premises on AWS Outposts
[Diagram labels: EMR (Hadoop + Spark); AWS Outposts; on-premises Hadoop/Spark]
NEW (GA)
Amazon Athena
Serverless, interactive query service
Query instantly
• Zero setup cost
• Point to S3 and start querying
Pay per query
• Pay only for queries run
• Save 30–90% on per-query costs through compression
• Use S3 storage
SQL
• ANSI SQL
• JDBC/ODBC drivers
• Multiple formats, compression types, and complex joins and data types
Easy
• Serverless: zero infrastructure, zero administration
• Integrated with QuickSight
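The per-query savings from compression can be reasoned about with simple arithmetic, assuming the charge is proportional to the amount of data a query scans. The price per TB and the 4:1 compression ratio below are placeholders for illustration, not quoted figures.

```python
def query_cost(tb_scanned, price_per_tb):
    """Cost of one query when billing is proportional to data scanned."""
    return tb_scanned * price_per_tb


PRICE_PER_TB = 5.0      # placeholder price, an assumption for this example
RAW_TB = 2.0            # data scanned when files are uncompressed
COMPRESSION_RATIO = 4   # assumption: compressed files are a quarter of the size

raw_cost = query_cost(RAW_TB, PRICE_PER_TB)
compressed_cost = query_cost(RAW_TB / COMPRESSION_RATIO, PRICE_PER_TB)
saving = 1 - compressed_cost / raw_cost

print(raw_cost, compressed_cost, f"{saving:.0%}")
```

With these placeholder inputs the saving is 75%, inside the 30–90% range on the slide; the actual figure depends on how well a given dataset compresses.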
Amazon Athena Federated Query
Run SQL queries on data spanning multiple data stores
Run connectors in AWS Lambda: no servers to manage
Run SQL queries on relational, non-relational, object, or custom data sources, in the cloud or on-premises
Open-source connectors for common data sources
Build connectors to custom data sources
Data sources shown:
• Redshift: Data warehousing
• ElastiCache: Redis
• Aurora: MySQL, PostgreSQL
• DynamoDB: Key value, Document
• DocumentDB: Document
• S3/Glacier
NEW (PREVIEW)
Amazon QuickSight
First BI service built for the cloud with pay-per-session pricing & ML insights for everyone
Elastic Scaling
• Auto-scale 10 to 10K+ users in minutes
• Pay-as-you-go
Serverless
• Create dashboards in minutes
• Deploy globally without provisioning a single server
Deeply integrated with AWS services
• Secure, private access to AWS data
• Integrated S3 data lake permissions through AWS IAM
API Support
• Programmatically onboard users and manage content
• Easily embed in your apps
NEW
ML predictions in Amazon QuickSight (preview)
1 Connect to any data: data lakes, SQL engines, 3rd-party applications, and on-premises databases
AWS/on-premises data sources
• Excel
• CSV
• MySQL
• PostgreSQL
• MariaDB
• Presto
• Spark
• SQL Server
• Amazon Redshift
• RDS
• S3
• Athena
• Aurora
• EMR
• Snowflake
• Teradata
• Salesforce
• Square
• Adobe Analytics
• Jira
• ServiceNow
• Twitter
• GitHub
2 Select an ML model: create models with Amazon SageMaker Autopilot, or use existing custom models and packaged models from AWS Marketplace
3 Visualize and share: analyze results, create visualizations, build dashboards and email reports, and share with business stakeholders
[Diagram labels: custom models; Amazon SageMaker Autopilot models; AWS Marketplace models; QuickSight]
NEW
Data exchange: AWS Data Exchange
Easily find and subscribe to 3rd-party data in the cloud
Efficiently access 3rd-party data
• Simplifies access to data: no need to receive physical media, manage FTP credentials, or integrate with different APIs
• Minimize legal reviews and negotiations
Quickly find diverse data in one place
• >1,000 data products
• >80 data providers, including Dow Jones, Change Healthcare, Foursquare, Dun & Bradstreet, Thomson Reuters, Pitney Bowes, LexisNexis, and Deloitte
Easily analyze data
• Download or copy data to S3
• Combine, analyze, and model with existing data
• Analyze data with EMR, Redshift, Athena, and AWS Glue
NEW (GA)
Our portfolio
Broad and deep portfolio, purpose-built for builders
Data Lake
• S3/Glacier
• Glue: ETL & Data Catalog
• Lake Formation: Data Lakes
Data Movement
• Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka
Business Intelligence & Machine Learning
• Data Exchange: Data exchange (NEW)
• QuickSight: Visualizations
• SageMaker: ML
• Comprehend: NLP
• Transcribe: Speech-to-text
• Textract: Extract text
• Personalize: Recommendation
• Forecast: Forecasts
• Translate: Translation
• CodeGuru: Code reviews (NEW)
• Kendra: Enterprise search (NEW)
Databases
• RDS: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, RDS on VMware
• Aurora: MySQL, PostgreSQL
• DynamoDB: Key value, Document
• DocumentDB: Document
• ElastiCache: Redis, Memcached
• Neptune: Graph
• Timestream: Time Series
• QLDB: Ledger Database
• Managed Apache Cassandra Service: Wide column (NEW)
• Managed Blockchain, Blockchain Templates: Blockchain
Analytics
• Redshift: Data warehousing
• EMR: Hadoop + Spark
• Kinesis Data Analytics: Real time
• Elasticsearch Service: Operational Analytics
• Athena: Interactive analytics
NEW: AQUA, EMR on Outposts, UltraWarm, RDS Proxy, RDS on Outposts
Traditional data warehousing approaches don’t scale
Data silos to
[Diagram labels: OLTP, ERP, CRM, and LOB sources; DW Silo 1 with business intelligence; devices, web, sensors, and social sources; DW Silo 2 with business intelligence; data lakes with open formats and a central catalog; machine learning; BI + analytics; data warehousing]
It’s challenging to manage large Cassandra clusters
Specialized expertise to set up, configure, and maintain infrastructure and software
Scaling clusters is time-consuming, manual, and prone to over-provisioning
Manual backups and error-prone restore process to maintain integrity
Unreliable upgrades with clunky rollback and debugging capabilities
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWS
 
AWS Serverless per startup: come innovare senza preoccuparsi dei server
AWS Serverless per startup: come innovare senza preoccuparsi dei serverAWS Serverless per startup: come innovare senza preoccuparsi dei server
AWS Serverless per startup: come innovare senza preoccuparsi dei server
 

State of the Union: Database & Analytics

  • 1. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved. State of the Union: Database & Analytics Victor Chiu Senior Business Development Manager, Database & Analytics
  • 2. What do these companies have in common?
  • 3. 85% of businesses want to be data driven, but only 37% have been successful. *Source: Forbes Online; New Vantage Partners - Big Data Executive Survey
  • 4. How do you build momentum? Data: (1) Break free from legacy databases; (2) Move to managed: save time and cost, remove undifferentiated heavy lifting; (3) Modernize your data warehouse; (4) Build data-driven apps; (5) Turn data to insights: better experiences, deeper engagement, efficient processes. Other benefits shown on the slide: agility; global distribution; performance at scale; increase scale; improve performance; lower cost; better and faster insights; broader access to analytics.
  • 5. The Data Flywheel: (1) Break free from legacy databases; (2) Move to managed; (3) Modernize your data warehouse; (4) Build data-driven apps; (5) Turn data to insights.
  • 6. The Data Flywheel: (1) Break free from legacy databases; (2) Move to managed; (3) Modernize your data warehouse; (4) Build data-driven apps; (5) Turn data to insights. Modernize your data infrastructure. Get the most value from your data.
  • 7. The Data Flywheel: (1) Break free from legacy databases; (2) Move to managed; (3) Modernize your data warehouse; (4) Build data-driven apps; (5) Turn data to insights. Modernize your data infrastructure. Get the most value from your data.
  • 8. Customers are moving to open databases. Commercial-grade performance and reliability?
  • 9. Amazon Aurora: MySQL and PostgreSQL compatible relational database built for the cloud. Performance and availability of commercial-grade databases at 1/10th the cost. Performance and scalability: 5x throughput of MySQL; 3x throughput of PostgreSQL; up to 15 read replicas; scale out reads and writes across multiple data centers. Fully managed: managed by RDS, with no hardware provisioning, software patching, setup, configuration, or backups. Availability and durability: fault-tolerant, self-healing storage; six copies of data across three AZs; continuous backup to S3; single Global Database with cross-region replication. Highly secure: network isolation; encryption at rest/transit.
  • 10. Challenges with integrating ML with your database. Typical steps of incorporating ML into an application: (1) Select and train the model; (2) Write application code to read data from the database; (3) Query and format the data for the ML algorithm; (4) Call an ML service to run the algorithm; (5) Format the output; (6) Retrieve the results back to the application.
  • 11. ML in Amazon Aurora and Athena: bringing machine learning to data developers and data analysts. Generate predictions directly from Aurora queries. Models run in SageMaker & Comprehend. Use standard SQL, no ML expertise required. Suitable for low-latency, high-volume use cases. (Diagram labels: SQL Select/From/Where; Aurora (database); Athena (interactive analytics); Amazon SageMaker (ML).)
  • 12. More than 200,000 databases migrated with DMS. More in 2019 than in all of 2016-2018 combined.
  • 13. The Data Flywheel: (1) Break free from legacy databases; (2) Move to managed; (3) Modernize your data warehouse; (4) Build data-driven apps; (5) Turn data to insights. Modernize your data infrastructure. Get the most value from your data.
  • 14. Managing software on-premises is time consuming and complex: hardware and software installation; configuration, patching, and backups; cluster setup and data replication for high availability; capacity planning, and scaling clusters for compute and storage.
  • 15. Customers moving to fully managed services. Relational databases: Aurora, RDS. Non-relational databases: DynamoDB, DocumentDB, ElastiCache, Managed Cassandra Service. Hadoop and Spark: EMR. Operational analytics: Elasticsearch Service. Real-time analytics: Managed Streaming for Kafka.
  • 16. Amazon RDS: managed relational database service with a choice of popular databases. Easy to administer: no infrastructure provisioning; no software installation and patching; built-in monitoring. Performant & scalable: scale with an API call or a few clicks; read replicas for increased throughput. Available & durable: automatic Multi-AZ data replication; automated backup, snapshots, and failover. Secure and compliant: encryption at rest and in transit; network isolation and resource-level permissions.
  • 17. How do you scale your relational database to support tens of thousands of connections? Serverless applications open and close tens of thousands of connections within seconds. This leads to longer query response times that limit application scalability. Database proxy servers are difficult to deploy, patch, and manage.
  • 18. Amazon RDS Proxy: fully managed, highly available database proxy. Supports new scale of serverless application connections. Pools and shares database connections. Preserves connections during database failovers. Manages DB credentials with Secrets Manager and IAM. Fully managed: no provisioning, patching, or management. (Diagram labels: Applications, RDS Proxy (connection pooling), RDS database instance.) [NEW, PREVIEW]
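The "pools and shares database connections" idea on this slide can be illustrated with a minimal, in-process sketch. This is not how RDS Proxy is implemented; `FakeConnection` and `ConnectionPool` are hypothetical names, and the fake connection stands in for a real database driver. The point is only that many short-lived requests can reuse a small, fixed set of open connections instead of opening and closing one each.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    opened = 0  # counts how many backend connections were ever created

    def __init__(self):
        FakeConnection.opened += 1
        self.id = FakeConnection.opened

    def execute(self, sql):
        return f"conn {self.id} ran: {sql}"


class ConnectionPool:
    """Keeps a fixed set of open connections and lends them out."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._idle.get(timeout=timeout)  # wait for a free connection
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


pool = ConnectionPool(size=3)

# 1,000 short-lived "requests" share the same three backend connections.
results = []
for i in range(1000):
    with pool.connection() as conn:
        results.append(conn.execute(f"SELECT {i}"))
```

After the loop, `FakeConnection.opened` is still 3: the database side sees three connections regardless of how many requests were served.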
  • 19. Amazon RDS on AWS Outposts: launch RDS in your data centers with AWS Outposts. Integrate with on-premises databases and applications. Deploy secure, managed RDS in minutes. Store data without moving to the cloud. Automates provisioning, patching, backup, restoring, scaling, and failover. (Diagram labels: RDS (MySQL, PostgreSQL), AWS Outposts.) [NEW, PREVIEW]
  • 20. Operational Analytics: Amazon Elasticsearch Service. Fully managed, scalable, secure Elasticsearch service. Open source Elasticsearch APIs, Kibana, and Logstash: open-source Elasticsearch APIs; managed Kibana; integration with Logstash. Scalable, secure, and compliant: scale clusters up/down via a single API call or a few clicks; secured network isolation with VPC, with data encrypted at rest and in transit; compliant with HIPAA, PCI DSS, and ISO. Cost-optimized workloads: pay only for what you use; no upfront fee or usage requirement; critical features built in (encryption, VPC support, 24x7 monitoring). Fully managed: deploy Elasticsearch clusters in minutes, with simplified hardware provisioning, software installation/patching, failure recovery, backups, and monitoring.
  • 21. Challenges with analyzing high volumes of data in real time: storing data is expensive at scale; limits the amount of data retained for analysis; miss out on valuable insights.
  • 22. UltraWarm for Amazon Elasticsearch Service: a new warm storage tier for Elasticsearch Service. Seamlessly extends Elasticsearch Service. Reduces cost by 90% to store the same amount of data. Scale up to 3 PB of log data per cluster. Analyze years of operational data. (Diagram labels: Kibana dashboard, queries, Application Load Balancer, Amazon Elasticsearch Service domain, active master node, UltraWarm nodes, Amazon S3.) [NEW, PREVIEW]
  • 23. The Data Flywheel: (1) Break free from legacy databases; (2) Move to managed; (3) Modernize your data warehouse; (4) Build data-driven apps; (5) Turn data to insights. Modernize your data infrastructure. Get the most value from your data.
  • 24. Data warehousing: Amazon Redshift. First and most popular cloud data warehouse. Best performance, most scalable: 3x faster with RA3*; 10x faster with AQUA*; adds unlimited compute capacity on demand to meet unlimited concurrent access. Lowest cost: cost-optimized workloads by paying for compute and storage separately; 1/10th the cost of a traditional DW at $1000/TB/year; up to 75% less than other cloud data warehouses, with predictable costs. Data lake & AWS integration: analyze exabytes of data across data warehouse, data lakes, and operational databases; query data across various analytics services. Most secure & compliant: AWS-grade security (e.g., VPC, encryption with KMS, CloudTrail); all major certifications such as SOC, PCI DSS, ISO, FedRAMP, HIPAA. *vs. other cloud DWs
  • 25. Most widely used Cloud Data Warehouse Tens of thousands of customers use Redshift & process over 2EB of data per day
  • 26. Amazon Redshift has been innovating quickly: 200+ new features in the past 18 months. Features shown: Robust result set caching; Large # of tables support (~20000); Copy command support for ORC, Parquet; IAM role chaining; Elastic resize Groups; Redshift Spectrum: date formats, scalar JSON and ION file formats support, region expansion, predicate filtering; Auto analyze; Health and performance monitoring with Amazon CloudWatch; Automatic table distribution style; CloudWatch support for WLM queues; Performance enhancements: hash join, vacuum, window functions, resize ops, aggregations, console, union all, efficient compile code cache; Unload to CSV; Auto WLM; ~25 Query Monitoring Rules (QMR) support; AQUA; Concurrency Scaling; DC1 migration to DC2; Resiliency of ROLLBACK processing; Manage multi-part query in AWS console; Auto analyze for incremental changes on table; Spectrum Request Accelerator; Apply new distribution key; Redshift Spectrum: Row group filtering in Parquet and ORC, Nested data support, Enhanced VPC Routing, Multiple partitions; Faster Classic resize with optimized data transfer protocol; Performance: Bloom filters in joins, complex queries that create internal table, communication layer; Redshift Spectrum: Concurrency scaling; AWS Lake Formation integration; Auto-Vacuum sort, Auto-Analyze and Auto Table Sort; Auto WLM with query priorities; Snapshot scheduler; Performance: join pushdowns to subquery, mixed workloads temporary tables, rank functions, null handling in join, single row insert; Advisor recommendations for distribution keys; AZ64 compression encoding; Console redesign; Stored procedures; Spatial Processing; Column level access control with AWS Lake Formation; RA3; Performance of Inter-Region Snapshot Transfers; Federated Query; Materialized Views; Manual Pause and Resume.
  • 27. Amazon Redshift Materialized Views: defined by a SQL query, precomputed results, incrementally refreshed. Orders-of-magnitude query acceleration. Recommended for predictable and repeated queries used in dashboarding and interactive analysis. (Diagram: Materialized Views.) [NEW, PREVIEW]
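To make "precomputed results, incrementally refreshed" concrete, here is a toy sketch in plain Python. It is not Redshift's implementation; the class name `MaterializedTotals`, the append-only base table, and the `SUM ... GROUP BY` shape are all assumptions chosen for illustration. The refresh step folds in only rows added since the previous refresh, and queries read the stored totals without rescanning the base data.

```python
from collections import defaultdict


class MaterializedTotals:
    """Precomputed SUM(amount) GROUP BY region over an append-only table."""

    def __init__(self, base_rows):
        self._base = base_rows             # shared list acting as the base table
        self._totals = defaultdict(float)  # the stored, precomputed result
        self._applied = 0                  # base rows already folded in

    def refresh(self):
        """Fold in only the rows appended since the last refresh."""
        new_rows = self._base[self._applied:]
        for region, amount in new_rows:
            self._totals[region] += amount
        self._applied = len(self._base)
        return len(new_rows)

    def query(self):
        """Answer from the precomputed result without scanning the base table."""
        return dict(self._totals)


orders = [("east", 10.0), ("west", 5.0), ("east", 2.5)]
view = MaterializedTotals(orders)
first = view.refresh()         # processes all three existing rows
orders.append(("west", 1.0))
second = view.refresh()        # processes only the one new row
totals = view.query()
```

Between refreshes the view can be stale, which is the usual trade-off: repeated dashboard queries become cheap reads, at the cost of an explicit refresh step.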
  • 28. Amazon Redshift Data Lake Export: export data directly to Amazon S3 in Apache Parquet. Save results of data transformation into an S3 data lake. Export with the UNLOAD command and specify Parquet. Redshift formats, partitions, and moves data into S3. Analyze with Amazon SageMaker, Athena, and EMR. (Diagram labels: Redshift, S3.) [NEW]
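As a rough sketch of "export with the UNLOAD command and specify Parquet", the helper below only renders the SQL text; it does not connect to Redshift. The function name `build_unload`, the table and column names, the bucket, and the IAM role ARN are all placeholders invented for the example.

```python
def build_unload(select_sql, s3_prefix, iam_role_arn, partition_by=()):
    """Render a Redshift UNLOAD statement that writes Parquet files to S3."""
    # UNLOAD takes the query as a quoted string, so embedded single
    # quotes are doubled.
    quoted_query = select_sql.replace("'", "''")
    lines = [
        f"UNLOAD ('{quoted_query}')",
        f"TO '{s3_prefix}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "FORMAT AS PARQUET",
    ]
    if partition_by:
        lines.append(f"PARTITION BY ({', '.join(partition_by)})")
    return "\n".join(lines) + ";"


# Placeholder names throughout: adjust to your own schema, bucket, and role.
statement = build_unload(
    "SELECT * FROM sales WHERE region = 'EU'",
    "s3://example-bucket/sales/",
    "arn:aws:iam::123456789012:role/ExampleUnloadRole",
    partition_by=("sale_date",),
)
print(statement)
```

The rendered statement would then be run through whatever SQL client you already use against the cluster; the partition column determines the folder layout under the S3 prefix.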
  • 29. Amazon Redshift Federated Query: analyze data across data warehouse, data lakes, and operational databases. Query across multiple systems from Redshift. Combine data warehouse and transactional data. Compatible with Amazon RDS and Aurora (PostgreSQL). (Diagram labels: SQL, Amazon RDS, Amazon Aurora, Amazon Redshift, S3 data lake.) [NEW, PREVIEW]
  • 30. Amazon Redshift on RA3 instances: optimize your data warehouse by paying for compute and storage separately. Delivers 3x the performance of existing cloud DWs. Automatically scales your DW storage capacity. DS2 customers can migrate and get 2x performance and 2x storage for the same cost. Supports workloads up to 8 PB (compressed) for a cluster. (Diagram labels: four RA3 compute nodes, each with an SSD cache, priced in $/node/hour; managed storage on S3, priced in $/TB/month.) [NEW, GA]
  • 31. AQUA (Advanced Query Accelerator) for Amazon Redshift: an innovative new hardware-accelerated cache that delivers up to 10x better query performance than other cloud data warehouses. (Diagram labels: NVMe SSDs, custom analytics processors, AWS Nitro System.) [NEW, coming in 2020]
  • 32. AQUA (Advanced Query Accelerator): Redshift runs 10x faster than any other cloud data warehouse without increasing cost. AQUA brings compute to the storage layer so data doesn’t have to move back and forth. High-speed cache on top of S3 scales out to process data in parallel across many nodes. AWS custom-designed analytics processors accelerate data compression, encryption, and data processing. 100% compatible with the current version of Redshift. (Diagram labels: S3 storage, AQUA Advanced Query Accelerator, RA3 compute cluster.) [NEW, coming in 2020]
  • 33. The Data Flywheel: (1) Break free from legacy databases; (2) Move to managed; (3) Modernize your data warehouse; (4) Build data-driven apps; (5) Turn data to insights. Modernize your data infrastructure. Get the most value from your data.
  • 34. Characteristics of modern applications: internet-scale and transactional. Users: 1M+. Data volume: TB–PB–EB. Locality: global. Performance: milliseconds–microseconds. Request rate: millions. Access: web, mobile, IoT, devices. Scale: up-down, out-in. Economics: pay for what you use. Developer access: instant API access. Examples: social media, ride hailing, media streaming, dating.
  • 35. Developers are doing what they do best: break complex apps into smaller pieces and pick the best tool to solve each problem. This ensures that the apps are well architected and scale effectively. Developers are now building highly distributed apps using purpose-built databases and microservices architecture.
  • 36. Common data categories and use cases
  • 37. Amazon Managed (Apache) Cassandra Service: scalable, highly available, and managed Cassandra-compatible database service. No servers to manage: no need to provision, configure, and operate large Cassandra clusters or add and remove nodes manually. Single-digit millisecond performance at scale: scale tables up and down automatically based on application traffic; virtually unlimited throughput and storage. Apache Cassandra-compatible: use the same application code, licensed drivers, and tools built on Cassandra. Simple migration: simple migration to Managed Cassandra Service for Cassandra databases on premises or on EC2. [NEW, PREVIEW]
  • 38. The Data Flywheel: (1) Break free from legacy databases; (2) Move to managed; (3) Modernize your data warehouse; (4) Build data-driven apps; (5) Turn data to insights. Modernize your data infrastructure. Get the most value from your data.
  • 39. Customers moving to data lake architectures: bringing together the best of both worlds. Extends or evolves DW architectures; store any data in any format; durable, available, and exabyte scale; secure, compliant, auditable; run any type of analytics, from DW to predictive. (Diagram labels: Data Warehousing, Analytics, Machine Learning, Data lake.)
  • 40. Any type of analytics on the data lake: most comprehensive analytics platform. Data lake: Amazon S3, AWS Glue, Lake Formation. Data warehousing: Amazon Redshift. Big data processing: Amazon EMR. Interactive query: Amazon Athena. Operational analytics: Amazon Elasticsearch Service. Real-time analytics: Amazon Kinesis, Amazon MSK. Predictive analytics: Amazon SageMaker. Recommendations: Amazon Personalize. Visualizations: Amazon QuickSight. Data exchange: AWS Data Exchange.
  • 41. Amazon EMR: easily run Spark, Hadoop, Hive, Presto, HBase, and more big data apps on AWS. Low cost: 50–80% reduction in costs with EC2 Spot and Reserved Instances; per-second billing for flexibility. Use S3 storage: process data in S3 securely with high performance using the EMRFS connector. Latest versions: updated with latest open source frameworks within 30 days. Easy: fully managed, with no cluster setup, node provisioning, or cluster tuning.
  • 42. Performance Improvements in Spark for Amazon EMR: performance-optimized runtime for Apache Spark, 2.6x faster performance at 1/10th the cost. Runtime optimized for Apache Spark performance: 100% compliant with Apache Spark APIs. Best performance: 2.6x faster than Spark with EMR without runtime; 1.6x faster than 3rd party Managed Spark (with their runtime). Lowest price: 1/10th the cost of 3rd party Managed Spark (with their runtime). Chart, runtime total on 104 queries (seconds, lower is better): Spark with EMR (with runtime) 10,164; 3rd party Managed Spark (with their runtime) 16,478; Spark with EMR (without runtime) 26,478. *Based on TPC-DS 3 TB benchmarking running 6-node c4.8xlarge clusters and EMR 5.28, Spark 2.4. [NEW]
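The "2.6x" and "1.6x" figures on this slide can be checked directly against the chart's runtime totals. The short calculation below uses only the three numbers shown on the slide; the dictionary keys are shortened labels, not official names.

```python
# Total runtime in seconds over 104 TPC-DS queries, as read from the slide.
runtimes = {
    "EMR Spark with optimized runtime": 10_164,
    "3rd-party managed Spark": 16_478,
    "EMR Spark without optimized runtime": 26_478,
}

baseline = runtimes["EMR Spark with optimized runtime"]

# Ratio of each configuration's total runtime to the optimized runtime's total.
speedups = {
    name: round(seconds / baseline, 1) for name, seconds in runtimes.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor}x the optimized runtime's total")
```

The ratios come out to 1.6 and 2.6, matching the slide's claims. The "1/10th the cost" claim cannot be checked the same way because the slide gives no prices.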
  • 43. Amazon EMR on AWS Outposts: launch EMR in your data centers with AWS Outposts. Integrate with existing on-premises Hadoop deployments. Deploy secure, managed EMR clusters in minutes. Process and analyze data on-premises on AWS Outposts. (Diagram labels: EMR (Hadoop + Spark), AWS Outposts, on-premises Hadoop/Spark.) [NEW, GA]
  • 44. Amazon Athena: serverless, interactive query service. Query instantly: zero setup cost; point to S3 and start querying. Pay per query: pay only for queries run; save 30–90% on per-query costs through compression; use S3 storage. SQL: ANSI SQL; JDBC/ODBC drivers; multiple formats, compression types, and complex joins and data types. Easy: serverless, with zero infrastructure and zero administration; integrated with QuickSight.
  • 45. Amazon Athena Federated Query: run SQL queries on data spanning multiple data stores. Run SQL queries on relational, non-relational, object, or custom data sources, in the cloud or on-premises. Run connectors in AWS Lambda: no servers to manage. Open source connectors for common data sources. Build connectors to custom data sources. (Diagram labels: Redshift (data warehousing); ElastiCache (Redis); Aurora (MySQL, PostgreSQL); DynamoDB (key value, document); DocumentDB (document); S3/Glacier.) [NEW, PREVIEW]
  • 46. Amazon QuickSight: first BI service built for the cloud, with pay-per-session pricing & ML insights for everyone. Elastic scaling: auto-scale 10 to 10K+ users in minutes; pay-as-you-go. Serverless: create dashboards in minutes; deploy globally without provisioning a single server. Deeply integrated with AWS services: secure, private access to AWS data; integrated S3 data lake permissions through AWS IAM. API support: programmatically onboard users and manage content; easily embed in your apps. [NEW]
  • 47. ML predictions in Amazon QuickSight (preview). (1) Connect to any data: data lakes, SQL engines, 3rd-party applications, and on-premises databases. AWS/on-premises data sources: Excel, CSV, MySQL, PostgreSQL, MariaDB, Presto, Spark, SQL Server, Amazon Redshift, RDS, S3, Athena, Aurora, EMR, Snowflake, Teradata, Salesforce, Square, Adobe Analytics, Jira, ServiceNow, Twitter, GitHub. (2) Select an ML model: create models with Amazon SageMaker Autopilot, or use existing custom models and packaged models from AWS Marketplace. (3) Visualize and share: analyze results, create visualizations, build dashboards and email reports, and share with business stakeholders. [NEW]
  • 48. Data exchange: AWS Data Exchange. Easily find and subscribe to 3rd-party data in the cloud. Quickly find diverse data in one place: >1,000 data products; >80 data providers, including Dow Jones, Change Healthcare, Foursquare, Dun & Bradstreet, Thomson Reuters, Pitney Bowes, LexisNexis, and Deloitte. Efficiently access 3rd-party data: simplifies access to data, with no need to receive physical media, manage FTP credentials, or integrate with different APIs; minimize legal reviews and negotiations. Easily analyze data: download or copy data to S3; combine, analyze, and model with existing data; analyze data with EMR, Redshift, Athena, and AWS Glue. [NEW, GA]
  • 49. Our portfolio: broad and deep portfolio, purpose-built for builders. Data lake: S3/Glacier; Glue (ETL & Data Catalog); Lake Formation (data lakes). Data movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka. Data exchange: Data Exchange (NEW). Business intelligence & machine learning: QuickSight (visualizations); SageMaker (ML); Comprehend (NLP); Transcribe (speech-to-text); Textract (extract text); Personalize (recommendation); Forecast (forecasts); Translate (translation); CodeGuru (code reviews, NEW); Kendra (enterprise search, NEW). Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, RDS on VMware); Aurora (MySQL, PostgreSQL); DynamoDB (key value, document); DocumentDB (document); ElastiCache (Redis, Memcached); Neptune (graph); Timestream (time series); QLDB (ledger database); Managed Apache Cassandra Service (wide column, NEW). Blockchain: Managed Blockchain; Blockchain Templates. Analytics: Redshift (data warehousing); EMR (Hadoop + Spark); Kinesis Data Analytics (real time); Elasticsearch Service (operational analytics); Athena (interactive analytics). Also marked NEW: AQUA; EMR on Outposts; UltraWarm; RDS Proxy; RDS on Outposts.
  • 50. Victor Chiu, Senior Business Development Manager, Database & Analytics
  • 51. Traditional data warehousing approaches don’t scale. Data silos: OLTP, ERP, CRM, and LOB sources feed DW Silo 1 (business intelligence); devices, web, sensors, and social sources feed DW Silo 2 (business intelligence). Data lakes: open formats and a central catalog, serving machine learning, BI + analytics, and data warehousing.
  • 52. It’s challenging to manage large Cassandra clusters: specialized expertise to set up, configure, and maintain infrastructure and software; scaling clusters is time-consuming, manual, and prone to over-provisioning; manual backups and error-prone restore process to maintain integrity; unreliable upgrades with clunky rollback and debugging capabilities.