Best Practices for Migrating your
Data Warehouse to Amazon
Redshift
Vikram Gangulavoipalyam, Enterprise Solutions Architect,
June 13th 2017
Migrating your Data Warehouse Overview
• Why Migrate
• Customer Success Stories
• Amazon Redshift History and Development
• Cluster Architecture
• Migration Best Practices
• Migration Tools
• Open Q&A
Are you a user?
Why Migrate
Why Migrate to Amazon Redshift?
Coming from a transactional database:
 100x faster
 Scales from GBs to PBs
 Analyze data without storage constraints
Coming from an MPP database:
 10x cheaper
 Easy to provision and operate
 Higher productivity
Coming from Hadoop:
 10x faster
 No programming
 Standard interfaces and integration to leverage BI tools, machine learning, streaming
Selected Amazon Redshift Customers
Migration from Oracle @ Boingo Wireless
2000+ Commercial Wi-Fi locations
1 million+ Hotspots
90M+ ad engagements
100+ countries
Legacy DW: Oracle 11g based DW
Before migration
Rapid data growth slowed
analytics
Mediocre IOPS, limited memory,
vertical scaling
Admin overhead
Expensive (license, h/w, support)
After migration
180x performance improvement
7x cost savings
Migration from Oracle @ Boingo Wireless
(Chart: cost comparison and latency in seconds, existing system vs. Redshift)
• Cost: Oracle Exadata $400,000; SAP HANA $300,000; Amazon Redshift $55,000 (7x cheaper than Oracle Exadata)
• Query performance, 1 year of data: 7,200 seconds on the existing system vs. 15 seconds on Redshift
• Data load performance, 1 million records: 2,700 seconds on the existing system vs. 15 seconds on Redshift (180x faster than the Oracle database)
Migration from Greenplum @ NTT Docomo
68 million customers
10s of TBs per day of data across
mobile network
6PB of total data (uncompressed)
Data science for marketing
operations, logistics etc.
Legacy DW: Greenplum on-premises
After migration:
125 node DS2.8XL cluster
4,500 vCPUs, 30TB RAM
6 PB uncompressed
10x faster analytic queries
50% reduction in time for new BI
app. deployment
Significantly less ops. overhead
Migration from SQL on Hadoop @ Yahoo
 Analytics for website/mobile events
across multiple Yahoo properties
 On an average day
2B events
25M devices
Before migration: Hive – Found it to be
slow, hard to use, share and repeat
After migration:
21 node DC1.8XL (SSD)
50TB compressed data
100x performance improvement
Real-time insights
Easier deployment and
maintenance
Migration from SQL on Hadoop @ Yahoo
(Chart: latency in seconds on a log scale from 1 to 10,000, Amazon Redshift vs. Impala, for Count Distinct Devices, Count All Events, Filter Clauses, and Joins)
Amazon Redshift Migration Process
Evaluation → Perform PoC and Validation → Design Migration Plan → Migration, Integration and Validation → Operations and Optimization
How to Migrate?
ENGINE X → Amazon Redshift: migrate ETL scripts, SQL in reports, and ad hoc queries
Schema Conversion
• Schema & data transformation: map data types; choose compression encodings, sort keys, and distribution keys; generate and apply DDL
• Convert SQL code and assess gaps: stored procedures, functions
Database Migration
• Data migration: bulk load, capture updates, transformations
If you forget everything else…
• Lift-and-shift is NOT an ideal approach; resist the temptation to reuse the existing investment in ETL/process as-is
• Redshift has unique characteristics
• AWS has a rich ecosystem of solutions; your final solution will use other AWS services
• AWS Solution Architects, ProServ, and Partners can help
Amazon Redshift History & Development
Built on columnar, MPP, OLAP technology and PostgreSQL, and integrated with AWS services: AWS IAM, Amazon VPC, Amazon SWF, Amazon S3, AWS KMS, Amazon Route 53, Amazon CloudWatch, Amazon EC2.
February 2013 – February 2017:
> 100 Significant Patches
> 140 Significant Features
Amazon Redshift Cluster Architecture
Amazon Redshift Cluster Architecture
 Massively parallel, shared nothing
 Leader node
– SQL endpoint
– Stores metadata
– Coordinates parallel SQL processing
 Compute nodes
– Local, columnar storage
– Executes queries in parallel
– Load, backup, restore
(Architecture diagram: SQL clients/BI tools connect to the leader node over JDBC/ODBC; the leader node coordinates the compute nodes, each with 128GB RAM, 16TB disk, and 16 cores, over a 10 GigE (HPC) interconnect; ingestion, backup, and restore flow between the compute nodes and S3 / EMR / DynamoDB / SSH.)
Designed for I/O Reduction
 Columnar storage
 Data compression
 Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Accessing dt with row storage:
– Need to read everything
– Unnecessary I/O
aid loc dt
CREATE TABLE loft_migration (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
Designed for I/O Reduction
 Columnar storage
 Data compression
 Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Accessing dt with columnar storage:
– Only scan blocks for relevant
column
aid loc dt
CREATE TABLE loft_migration (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
Designed for I/O Reduction
 Columnar storage
 Data compression
 Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Columns grow and shrink independently
• Effective compression ratios due to like data
• Reduces storage requirements
• Reduces I/O
aid loc dt
CREATE TABLE loft_migration (
aid INT ENCODE LZO
,loc CHAR(3) ENCODE BYTEDICT
,dt DATE ENCODE RUNLENGTH
);
Designed for I/O Reduction
 Columnar storage
 Data compression
 Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
aid loc dt
CREATE TABLE loft_migration (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
• In-memory block metadata
• Contains per-block MIN and MAX value
• Effectively prunes blocks which cannot
contain data for a given query
• Eliminates unnecessary I/O
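For illustration, a minimal sketch of the kind of range-restricted query that zone maps help with, using the loft_migration table from the slides (the date range is hypothetical):

-- Only blocks whose MIN/MAX dt range overlaps the filter need to be read
SELECT loc, COUNT(*)
FROM loft_migration
WHERE dt BETWEEN '2017-04-01' AND '2017-05-31'
GROUP BY loc;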
Migration Best Practices
Terminology and Concepts: Slices
 A slice can be thought of as a “virtual compute node”
– Unit of data partitioning
– Parallel query processing
 Facts about slices:
– Each compute node has either 2, 16, or 32 slices
– Table rows are distributed to slices
– A slice processes only its own data
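To see how many slices your own cluster has (and therefore how many files to split your loads into), a quick check against the STV_SLICES system view; a minimal sketch:

-- One row per slice; the node column shows which compute node owns it
SELECT node, COUNT(*) AS slices_per_node
FROM stv_slices
GROUP BY node
ORDER BY node;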
Terminology and Concepts: Data Distribution
• Distribution style is a table property which dictates how that table’s data is
distributed throughout the cluster:
• KEY: Value is hashed, same value goes to same location (slice)
• ALL: Full table data goes to first slice of every node
• EVEN: Round robin
• Goals:
• Distribute data evenly for parallel processing
• Minimize data movement during query processing
(Diagram: KEY, ALL, and EVEN distribution illustrated across two nodes, Node 1 and Node 2, each with two slices; a sketch of the corresponding DDL follows below.)
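As a sketch of how these distribution styles are declared in DDL (table and column names are hypothetical):

CREATE TABLE sales_fact (              -- large fact table
  customer_id INT,
  amount      DECIMAL(18,2)
)
DISTSTYLE KEY DISTKEY (customer_id);   -- co-locate rows that join on customer_id

CREATE TABLE calendar_dim (            -- small dimension table
  cal_date DATE
)
DISTSTYLE ALL;                         -- a full copy of the table on every node

CREATE TABLE staging_events (          -- no obvious join/distribution column
  event_id BIGINT
)
DISTSTYLE EVEN;                        -- round robin across slices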
Data Loading Best Practices
DS2.8XL Compute Node
 Ingestion Throughput:
– Each slice’s query processors can load one file at a time:
 Streaming decompression
 Parse
 Distribute
 Write
 Realizing only partial node usage as 6.25% of slices are active
(Diagram: one DS2.8XL compute node with 16 slices, 0–15, only one active)
Data Loading Best Practices Continued
 Use at least as many input files as there are slices in the cluster
 With 16 input files, all slices are working so you maximize throughput
 COPY continues to scale linearly as you add nodes
(Diagram: 16 input files loading in parallel into a DS2.8XL compute node, slices 0–15)
Data Preparation
 Export data from the source system
– CSV recommended (delimiter '|')
 Be aware of UTF-8 varchar columns (UTF-8 characters can take up to 4 bytes each)
 Be aware of your NULL character (e.g., \N)
– GZIP-compress the files
– Split files (1 MB – 1 GB after gzip compression)
 Useful COPY options for PoC data
– MAXERROR
– ACCEPTINVCHARS
– NULL AS
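A hedged sketch of a COPY command putting these options together (the bucket, prefix, and IAM role ARN are placeholders):

COPY loft_migration
FROM 's3://my-bucket/loft_migration/part_'   -- gzipped, pipe-delimited file parts
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
DELIMITER '|'
GZIP
NULL AS '\N'
ACCEPTINVCHARS
MAXERROR 100;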
Keep Columns as Narrow as Possible
• Buffers allocated based on declared
column width
• Wider than needed columns mean
memory is wasted
• Fewer rows fit into memory; increased
likelihood of queries spilling to disk
• Check
SVV_TABLE_INFO(max_varchar)
• SELECT max(len(col)) FROM table
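A small sketch combining the two checks above (the table and column come from the deck's example; substitute your own):

-- Declared maximum varchar width per table
SELECT "table", max_varchar
FROM svv_table_info
ORDER BY max_varchar DESC;

-- Actual widest value stored in a given column
SELECT MAX(LEN(loc)) FROM loft_migration;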
Amazon Redshift is a Data Warehouse
 Optimized for batch inserts
– The time to insert a single row in Redshift is roughly the same as inserting
100,000 rows
 Updates are delete + insert of the row
• Deletes mark rows for deletion
 Blocks are immutable
– Minimum space used is one block per column, per slice
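Because updates are delete + insert and single-row writes are expensive, changes are usually applied in bulk through a staging table. A minimal sketch of that pattern, with hypothetical table names:

BEGIN;
-- Remove the rows that are about to be replaced
DELETE FROM sales USING sales_staging s
WHERE sales.sale_id = s.sale_id;
-- Re-insert the new versions in one batch
INSERT INTO sales SELECT * FROM sales_staging;
COMMIT;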
Column Compression
 Auto Compression
– Samples data automatically on COPY into an empty table
 Samples up to 100,000 rows and picks the optimal encoding
– Turn off Auto Compression for staging tables
 Bake encodings into your DDL or use CREATE TABLE (LIKE …)
 Analyze Compression
– Run ANALYZE COMPRESSION when the data profile has changed
– Run after changing the sort key
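A short sketch of these recommendations in practice (the staging table name and S3 paths are hypothetical):

-- Let Redshift suggest encodings once the data profile is known
ANALYZE COMPRESSION loft_migration;

-- Staging table inherits the encodings already baked into the DDL
CREATE TABLE loft_migration_staging (LIKE loft_migration);

-- Skip automatic compression analysis when loading the staging table
COPY loft_migration_staging
FROM 's3://my-bucket/staging/part_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
DELIMITER '|' GZIP
COMPUPDATE OFF;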
Column (Compression) Encoding Types
Raw encoding (RAW)
Byte-dictionary (BYTEDICT)
Delta encoding (DELTA / DELTA32K)
Mostly encoding (MOSTLY8 / MOSTLY16 / MOSTLY32)
Runlength encoding (RUNLENGTH)
Text encoding (TEXT255 / TEXT32K)
LZO encoding (LZO)
Zstandard (ZSTD)
Average compression: 2-4x
Primary/Unique/Foreign Key Constraints
 Primary/Unique/Foreign Key constraints are NOT enforced
– If you load data multiple times, Amazon Redshift won’t complain
– If you declare primary keys in your DDL, the optimizer will expect the data to
be unique
 Redshift optimizer uses declared constraints to pick optimal plan
– In certain cases it can result in performance improvements
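Since declared keys are not enforced, it is worth checking for duplicates yourself after loads; a minimal sketch against the deck's example table, treating aid as the declared primary key:

SELECT aid, COUNT(*) AS copies
FROM loft_migration
GROUP BY aid
HAVING COUNT(*) > 1;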
Benchmarking Tips
• Verify tables are vacuumed and analyzed
• Check SVV_TABLE_INFO (STATS_OFF, UNSORTED)
• Vacuum & Analyze
• https://github.com/awslabs/amazon-redshift-utils/tree/master/src/AnalyzeVacuumUtility
• Verify column encodings
• Check PG_TABLE_DEF
• Re-encode tables
• https://github.com/awslabs/amazon-redshift-utils/tree/master/src/ColumnEncodingUtility
• Verify good distribution keys
• SVV_TABLE_INFO (SKEW_ROWS)
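The checks above can be rolled into one query against SVV_TABLE_INFO; a sketch:

SELECT "table",
       stats_off,    -- stale statistics (high values suggest running ANALYZE)
       unsorted,     -- percent of unsorted rows (high values suggest running VACUUM)
       skew_rows,    -- ratio of rows on the fullest vs. emptiest slice
       encoded       -- whether column encodings are applied
FROM svv_table_info
ORDER BY unsorted DESC NULLS LAST;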
Migration Tools
AWS Schema Conversion Tool (AWS SCT)
Convert schema in a few clicks
Sources include Oracle, Teradata, Greenplum and Netezza
Automatic schema optimization
Converts application SQL code
Detailed assessment report
AWS Database Migration Service (AWS DMS)
Start your first migration in a few minutes
Sources include: Aurora, Oracle, SQL Server, MySQL and PostgreSQL
Bulk load and continuous replication
Migrate a TB for $3
Fault tolerant
AWS DMS: Change Data Capture
(Diagram: a replication instance sits between source and target; transactions t1 and t2, including updates, are captured on the source and the changes are applied to the target after the bulk load.)
For Large Data Transfers
 AWS Import/Export Snowball
 AWS Snowball Edge
 AWS Snowmobile
Data Integration Partners
(Logos: data integration partners and systems integrators for Amazon Redshift)
Beyond Amazon Redshift…
Resources
 https://github.com/awslabs/amazon-redshift-utils
 https://github.com/awslabs/amazon-redshift-monitoring
 https://github.com/awslabs/amazon-redshift-udfs
 Admin scripts
Collection of utilities for running diagnostics on your cluster
 Admin views
Collection of utilities for managing your cluster, generating schema DDL, etc.
 ColumnEncodingUtility
Gives you the ability to apply optimal column encoding to an established schema
with data already loaded
 Amazon Redshift Engineering’s Advanced Table Design Playbook
https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-preamble-prerequisites-and-prioritization/
Thank you!
gangulav@amazon.com
aws.amazon.com/activate
Speaker Notes
  1. We will start with the reasons and rationale to migrate to Redshift, illustrated with some customer success stories. In fact, we will get into specific details of a couple of use cases to drive home the value of migrating to Redshift. We will then briefly touch on the genesis of Redshift as a technology and its architecture. We will not do a deep dive into the architecture itself; instead we will focus on the components that play an important role in the actual migration of data into the Redshift environment. The meat of this presentation will be migration best practices and migration tools. At the end, we will open the floor for any questions.
  2. What are some capabilities of Redshift as a DW that would justify an actual migration? How do we actually quantify the value of undertaking such an initiative?
  3. Our view of data warehousing and why we built Amazon Redshift: we want to make data warehousing fast, inexpensive, and easy to use for companies of all sizes. For enterprises, the benefits are 1/10th the cost, easy provisioning and management, and more productive DBAs. For big data companies: 10x faster, standard SQL with no need to write custom code, and use of existing BI tools. For SaaS: analysis in-line with process flows, pay as you go, grow as you need, and managed availability & DR (continuous backup to S3 and asynchronous snapshots to S3 in a different region).
  4. Nasdaq security: ingests 5B rows/trading day, analyzes orders, trades and quotes. Yelp: 10s of millions of ads/day, 250M mobile events/day, analyzes ad campaign performance and new feature usage. Desk.com: ingests 200K case-related events/hour and runs a user-facing portal on Redshift. Let's look at a couple of use cases in detail.
  5. As an example, Boingo takes advantage of multiple Amazon Kinesis Streams to ingest streaming data into Amazon Redshift. Boingo also uses Amazon Kinesis Streams to process data from thousands of wireless LAN controllers and place it into Amazon Redshift.
  6. Japan based major telecom retailer They experienced performance challenges as the amount of data grew to multiple petabytes. To address the challenges, the company sought to move some of its data analysis workload to a cloud-based solution. Elasticity to support sudden spikes in traffic significantly reduced the operational overhead that they had had previously.
  7. How many of you know Hive? Mobile events were being stored on-prem in a Hadoop cluster. Talk about the additional admin cost of commissioning and de-commissioning nodes in the cluster.
  8. Majority of the KPIs that the customer was interested in included aggregate data across multiple joins.
  9. Evaluation – Find out the characteristics of the technology, understand how the technology value maps to the business value required through KPIs and metrics Then, perform a PoC and validate the KPIs defined. Design a migration plan using the best practices we will be discussing in the next few slides, validate the plan, execute it and finally operationalize it.
  10. Secondly, the tool can automatically re-write code which is stored inside the database; views, stored procedures, triggers and functions are automatically evaluated and where possible, we’ll go ahead and convert them to their equivalent in the new database; where we can’t do this accurately, we’ll highlight it and make smart recommendations (providing links to the docs) to guide you through the process manually. So when coupled with the Database Migration Service, the Schema Conversion tool will help you go from whatever database you’re on now, to a more open database in the Cloud with remarkably low effort in terms of skilled DBA time and cost, giving you an ‘out’ for your current database relationship woes
  11. There are a few specific knobs to tune that are unique to Redshift design: compression/encoding; data distribution styles (the details of which we will get into in the next few slides); sort keys (interleaved, compound); vacuum and analyze. Constraints are defined but never enforced.
  12. Amazon Redshift is based on PostgreSQL 8.0.2 Tightly integrated with other AWS services including S3 and IAM Initial preview beta was released on November 2012 and a full release was made available on February 15, 2013.
  13. Continuous Development
  14. You can use the COPY command to load data in parallel from an Amazon EMR cluster configured to write text files to the cluster's Hadoop Distributed File System (HDFS) in the form of fixed-width files, character-delimited files, CSV files, or JSON-formatted files. Talk about COPY from s3, dynamo and ability to directly connect to master node using SQL client.
  15. CPU Trade-off
  16. You can use zone maps to skip ranges that do not matter to your queries.
  17. Node sizes: ds2.xlarge – 4 vCPU, 13 ECU, 31 GiB RAM, 2 slices per node, 2 TB HDD per node, 1–32 nodes, 64 TB total capacity. ds2.8xlarge – 36 vCPU, 119 ECU, 244 GiB RAM, 16 slices per node, 16 TB HDD per node, 2–128 nodes, 2 PB total capacity.
  18. Enables loading of data into VARCHAR columns even if the data contains invalid UTF-8 characters. When ACCEPTINVCHARS is specified, COPY replaces each invalid UTF-8 character with a string of equal length consisting of the character specified by replacement_char. For example, if the replacement character is '^', an invalid three-byte character will be replaced with '^^^'.
  19. If no compression is specified in a CREATE TABLE or ALTER TABLE statement, Amazon Redshift automatically assigns compression encoding as follows: Columns that are defined as sort keys are assigned RAW compression. Columns that are defined as BOOLEAN, REAL, or DOUBLE PRECISION data types are assigned RAW compression. All other columns are assigned LZO compression.
  20. Embedded SQL conversion; cloud-native code optimization; PL/SQL or T-SQL code converted to MySQL/Aurora or PostgreSQL stored procedures. Supported sources: Greenplum Database (version 4.3 and later), Microsoft SQL Server (version 2008 and later), Netezza (version 7.2 and later), Oracle (version 11 and later), Teradata (version 14 and later), Vertica (version 7.2.2 and later).
  21. You can set up a migration task within minutes in the AWS Management Console. This includes setting up connections to the source and target databases, as well as choosing the replication instance used to run the migration process. Sources: Oracle, SQL Server, MySQL, PostgreSQL, MongoDB, SAP ASE (Sybase) & Aurora. Targets: Redshift, DynamoDB, and S3. 6 TB cache limit (replication server). IAM for security: fine-grained access control using resource names and tags. Use a KMS key to encrypt data in the replication instance. You can encrypt connections using SSL (you can use DMS to assign a certificate to the endpoint).
  22. Change data capture prerequisites by source: Oracle supplemental logging; MySQL row-level binary logging; SQL Server bulk-logged/full recovery; PostgreSQL WAL. No agent required; uses the recovery log; native change data capture API.