Deploying your Data
Warehouse on AWS
Ian Meyers, AWS
Davinder Mundy, Informatica
Nick Holmes, KCOM
Freedom From…
You’ve Got Mail! AUDIT
Very Expensive
Proprietary Lock-In
Punitive Licensing
Petabyte scale; massively parallel
Relational data warehouse
Fully managed; zero admin
SSD & HDD platforms
As low as $1,000/TB/Year
Amazon
Redshift
Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
Fast @ exabyte scale
Elastic & highly available
On-demand, pay-per-query
• High concurrency: Multiple clusters access same data
• No ETL: Query data in-place using open file formats
• Full Amazon Redshift SQL support
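To make "query data in place" concrete, here is a minimal sketch that assembles the two SQL statements usually involved in exposing S3 data to Redshift Spectrum: an external schema and an external table. The helper names, schema, table, bucket, and IAM role are hypothetical placeholders, and the statements are only built as strings, not executed.

```python
from typing import List, Tuple


def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Build a CREATE EXTERNAL SCHEMA statement backed by the data catalog."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(schema: str, table: str,
                       columns: List[Tuple[str, str]], s3_location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement over Parquet files in S3."""
    cols = ",\n  ".join(f"{name} {sqltype}" for name, sqltype in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


# Hypothetical names, for illustration only.
schema_stmt = external_schema_sql(
    "spectrum", "clickstream_db", "arn:aws:iam::123456789012:role/ExampleSpectrumRole"
)
table_stmt = external_table_sql(
    "spectrum", "clicks",
    [("event_time", "TIMESTAMP"), ("user_id", "BIGINT"), ("url", "VARCHAR(2048)")],
    "s3://example-bucket/clicks/",
)
print(schema_stmt)
print(table_stmt)
```

Once such a table exists, it can be queried with ordinary SELECT statements alongside local Redshift tables.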
Common Redshift
Use Cases
Use Case: Traditional Data Warehousing
Business
Reporting
Advanced pipelines
and queries
Secure and
Compliant
Easy Migration – Point & Click using AWS Database Migration Service
Secure & Compliant – End-to-End Encryption. SOC 1/2/3, PCI-DSS, HIPAA and FedRAMP compliant
Large Ecosystem – Variety of cloud and on-premises BI and ETL tools
Japanese Mobile
Phone Provider
Powering 100 marketplaces
in 50 countries
World’s Largest Children’s
Book Publisher
Bulk Loads
and Updates
Use Case: Log Analysis
Log & Machine
IOT Data
Clickstream
Events Data
Time-Series
Data
Cheap – Analyze large volumes of data cost-effectively
Fast – Massively Parallel Processing (MPP) and columnar architecture for fast queries and parallel loads
Near real-time – Micro-batch loading and Amazon Kinesis Firehose for near-real-time analytics
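The micro-batch idea can be sketched in a few lines: events are buffered and handed off in small groups rather than one row at a time. This is only an illustration of the batching step; the `micro_batches` helper and the row-count threshold are assumptions, and the actual load into Redshift is not shown.

```python
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[dict], max_rows: int) -> Iterator[List[dict]]:
    """Group a stream of events into batches of at most max_rows."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) >= max_rows:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


sample_events = [{"id": i} for i in range(7)]
batch_sizes = [len(b) for b in micro_batches(sample_events, max_rows=3)]
print(batch_sizes)
```

In practice a time limit is usually combined with the row limit so that a slow stream still flushes regularly.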
Interactive data analysis and
recommendation engine
Ride analytics for pricing
and product development
Ad prediction and
on-demand analytics
Use Case: Business Applications
Multi-Tenant BI
Applications
Back-end
services
Analytics as a
Service
Fully Managed – Provisioning, backups, upgrades, security, compression all come built-in so you can
focus on your business applications
Ease of Chargeback – Pay as you go, add clusters as needed. A few big common clusters, several
data marts
Service Oriented Architecture – Integrated with other AWS services. Easy to plug into your pipeline
Infosys Information
Platform (IIP)
Analytics-as-a-
Service
Product and Consumer
Analytics
AWS Named as a Leader in The Forrester Wave™: Big Data Warehouse, Q2 2017
http://bit.ly/2w1TAEy
On June 15, Forrester published The Forrester Wave™: Big Data Warehouse, Q2 2017, in which AWS is positioned as a Leader. According to Forrester, “With more than 5,000 deployments, Amazon Redshift has the largest data warehouse deployments in the cloud.” AWS received the highest score possible, 5/5, for customer base, market awareness, ability to execute, road map, support, and partners. “AWS’s key strengths lie in its dynamic scale, automated administration, flexibility of database offerings, good security, and high availability (HA) capabilities, which make it a preferred choice for customers.”
Redshift Customers
Selected Amazon Redshift customers
NTT Docomo: Japan’s largest mobile service provider
68 million customers
Tens of TBs per day of data across a
mobile network
6 PB of total data (uncompressed)
Data science for marketing
operations, logistics, and so on
Greenplum on-premises
Scaling challenges
Performance issues
Need same level of security
Need for a hybrid environment
125 node DS2.8XL cluster
4,500 vCPUs, 30 TB RAM
2 PB compressed
10x faster analytic queries
50% reduction in time for new
BI application deployment
Significantly less operations
overhead
[Architecture diagram: Data Source, ET, AWS Direct Connect, Client, Forwarder, Loader, State Management, Sandbox, Amazon Redshift, S3]
Nasdaq: powering 100 marketplaces in 50 countries
Orders, quotes, trade executions,
market “tick” data from 7 exchanges
7 billion rows/day
Analyze market share, client activity,
surveillance, billing, and so on
Microsoft SQL Server on-premises
Expensive legacy DW
($1.16 M/yr.)
Limited capacity (1 yr. of data
online)
Needed lower TCO
Must satisfy multiple security
and regulatory requirements
Similar performance
23 node DS2.8XL cluster
828 vCPUs, 5 TB RAM
368 TB compressed
2.7 T rows, 900 B derived
8 tables with 100 B rows
7 month migration
¼ the cost, 2x storage, room to
grow
Faster performance, very
secure
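The cluster figures quoted for NTT Docomo and Nasdaq can be cross-checked with simple arithmetic. The sketch below assumes the published per-node specification of a DS2.8XL node (36 vCPUs, 244 GiB RAM, 16 TB of HDD storage); those per-node values are not stated on the slides and are an assumption here, as is the helper name.

```python
# Assumed per-node spec for DS2.8XL (not stated on the slides).
DS2_8XL = {"vcpus": 36, "ram_gib": 244, "hdd_tb": 16}


def cluster_totals(nodes: int, spec: dict = DS2_8XL) -> dict:
    """Aggregate vCPUs, RAM (TiB), and storage (TB) for a cluster."""
    return {
        "vcpus": nodes * spec["vcpus"],
        "ram_tib": round(nodes * spec["ram_gib"] / 1024, 1),
        "storage_tb": nodes * spec["hdd_tb"],
    }


docomo = cluster_totals(125)   # slide: 4,500 vCPUs, 30 TB RAM, 2 PB
nasdaq = cluster_totals(23)    # slide: 828 vCPUs, 5 TB RAM, 368 TB

# Nasdaq cost claim: legacy DW at $1.16 M/yr, new platform at one quarter.
legacy_cost_per_year = 1_160_000
redshift_cost_per_year = legacy_cost_per_year / 4

print(docomo, nasdaq, redshift_cost_per_year)
```

Under these assumptions the node counts reproduce the vCPU and storage totals on the slides exactly, and the RAM totals to within rounding.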
Redshift Architecture
Amazon Redshift Cluster Architecture
Massively parallel, shared nothing
architecture
Streaming Backup/Restore from S3
Leader node
• SQL endpoint
• Stores metadata
• Coordinates parallel SQL processing
Compute nodes
• Local, columnar storage
• Executes queries in parallel
• Load, backup, restore
• 2, 16 or 32 slices
Redshift Cluster
JDBC/ODBC
Leader Node
Compute Nodes
Efficient Data Loads
Streaming Backup/Restore
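To illustrate how a shared-nothing cluster spreads rows across slices, the sketch below hashes a distribution key and assigns each row to one slice. CRC32 is used purely as a stand-in hash; it is not the function Redshift uses internally, and the helper names and sample keys are hypothetical.

```python
import zlib
from collections import Counter
from typing import Iterable


def slice_for_key(key: str, total_slices: int) -> int:
    """Map a distribution-key value to a slice (illustrative hash only)."""
    return zlib.crc32(key.encode("utf-8")) % total_slices


def distribute(keys: Iterable[str], nodes: int, slices_per_node: int) -> Counter:
    """Count how many rows land on each slice of the cluster."""
    total = nodes * slices_per_node
    return Counter(slice_for_key(k, total) for k in keys)


customer_keys = [f"customer-{i}" for i in range(10_000)]
rows_per_slice = distribute(customer_keys, nodes=2, slices_per_node=16)
print(sorted(rows_per_slice.items())[:4])
```

Because every row with the same key hashes to the same slice, joins on that key can run locally on each slice; a skewed key, by contrast, would concentrate rows on a few slices.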
Redshift Spectrum
• Query data at rest on S3
• Ultra high scale, increased concurrency
• JSON, CSV, Parquet storage
Redshift Cluster
JDBC/ODBC
...
1 2 3 4 N
Leader Node
Compute Nodes
Spectrum Fleet
Amazon S3
When should you add Spectrum?
Your data will get bigger
• On average, data warehousing volumes grow 10x every 5 years
• The average Amazon Redshift customer doubles data each year
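The two growth figures are easier to compare once converted to the same basis. The short calculation below derives the annual growth factor implied by "10x every 5 years" and the five-year multiple implied by doubling every year; the function names and the 1 TB starting size are illustrative assumptions.

```python
def annual_factor(total_factor: float, years: float) -> float:
    """Annual growth factor that compounds to total_factor over the period."""
    return total_factor ** (1.0 / years)


def projected_size(start_tb: float, annual: float, years: int) -> float:
    """Size after compounding an annual growth factor for a number of years."""
    return start_tb * annual ** years


industry_annual = annual_factor(10, 5)              # about 1.58x per year
redshift_five_year = projected_size(1.0, 2.0, 5)    # doubling yearly from 1 TB

print(round(industry_annual, 3), redshift_five_year)
```

Doubling each year compounds to 32x over five years, roughly three times the 10x industry average quoted above.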
Amazon Redshift Spectrum makes data analysis simpler
• Access your data without ETL pipelines
• Teams using Amazon EMR, Athena & Redshift can collaborate using the same data lake
• Late binding views enable federated queries between internal & external tables
Amazon Redshift Spectrum improves availability and concurrency
• Run multiple Amazon Redshift clusters against common data
• Isolate jobs with tight SLAs from ad hoc analysis
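A late binding view that spans a local table and an external (Spectrum) table might look like the statement assembled below. This is a sketch only: the helper, view, and table names are hypothetical, and the statement is built as a string rather than run against a cluster.

```python
from typing import List


def late_binding_view_sql(view: str, local_table: str,
                          external_table: str, columns: List[str]) -> str:
    """Build a view that unions a local table with an external table.

    WITH NO SCHEMA BINDING makes the view late binding, so the
    underlying tables are resolved at query time.
    """
    col_list = ", ".join(columns)
    return (
        f"CREATE VIEW {view} AS\n"
        f"SELECT {col_list} FROM {local_table}\n"
        "UNION ALL\n"
        f"SELECT {col_list} FROM {external_table}\n"
        "WITH NO SCHEMA BINDING;"
    )


view_stmt = late_binding_view_sql(
    "analytics.all_sales",
    "public.sales_recent",        # hypothetical local Redshift table
    "spectrum.sales_history",     # hypothetical external table on S3
    ["sale_id", "sale_date", "amount"],
)
print(view_stmt)
```

A common arrangement is to keep recent data in local tables and older data in S3, with the view hiding the split from BI users.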
Deploying your Data Warehouse on AWS
Pipeline stages: Ingest → Store → Prepare for Analytics → Store → Analyze → Present
• Ingest – Batch, Firehose, Glue, SCT Migration Agent. Questions to ask: Streaming? DBs (OLTP)? DWH? Own code? Parallel? Managed?
• Store – S3
• Prepare for Analytics – AWS Lambda and/or Glue. Questions to ask: ETL? Managed? Complex? Cost?
• Store – S3, query optimized and ready for self-service
• Analyze – Athena Query Service (ad-hoc analysis), Redshift Spectrum, Redshift Data Warehouse (DWH and data marts), Predictive
• Present – BI and Visualization
Business user
Sign-in
First analysis in about 60 seconds
aws.amazon.com/quicksight
Easy exploration of AWS data
Securely discover and connect to AWS data
Quickly explore AWS data sources
Relational databases (Amazon RDS, Amazon RDS for Aurora, Amazon Redshift)
NoSQL databases (Amazon DynamoDB)
Amazon EMR, Amazon S3, files (CSV, Excel, TSV, XLF, CLF)
Streaming data sources (Amazon DynamoDB, Amazon Kinesis)
Easily import data from any table or file
Automatic detection of data types
[Amazon QuickSight architecture diagram. Access: Business User via QuickSight API and partner BI products; Business User via QuickSight UI on mobile devices and web browsers. Components: Data Prep, Metadata, Suggestions, Connectors, SPICE. Data sources: Amazon S3, Amazon Kinesis, Amazon DynamoDB, Amazon EMR, Amazon Redshift, Amazon RDS, Files, Third-party]
Partner BI solutions
Data Ingest & Migrations
ETL on Redshift
Online Software Store
aws.amazon.com/marketplace
Start your first migration in 10 minutes or less
Keep your apps running during the migration
Replicate within, to or from Amazon EC2 or RDS
Move data to the same or different database engine
Sign up for preview at aws.amazon.com/dms
AWS
Database Migration
Service
Getting data to Redshift using AWS Database
Migration Service (DMS)
• Simple to use
• Minimal Downtime
• Supports most widely used Databases
• Low Cost
• Fast & Easy to Set-up
• Reliable
Migrate off Oracle and SQL Server
Move your tables, views, stored procedures and DML to MySQL, MariaDB, and Amazon Aurora
Know exactly where manual edits are needed
AWS Schema Conversion Tool
Optimizing Amazon Redshift with the AWS
Schema Conversion Tool
amzn.to/2sTYow1
Extending your DWH (or Migrations) to Redshift
• Oracle to Redshift: http://amzn.to/2vN3UBO
• Teradata to Redshift: http://amzn.to/2wZy7OA
• Converge Silos to Redshift: http://amzn.to/2hbKwYd
Redshift Playbook
Part 1: Preamble, Prerequisites, and
Prioritization
Part 2: Distribution Styles and
Distribution Keys
Part 3: Compound and Interleaved
Sort Keys
Part 4: Compression Encodings
Part 5: Table Data Durability
amzn.to/2quChdM
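The playbook topics (distribution keys, sort keys, compression encodings) all surface in a table's DDL. The sketch below builds one such CREATE TABLE statement as a string; the helper, table, columns, and encoding choices are hypothetical and are not recommendations taken from the playbook itself.

```python
from typing import List, Tuple


def create_table_sql(table: str, columns: List[Tuple[str, str, str]],
                     distkey: str, sortkeys: List[str],
                     interleaved: bool = False) -> str:
    """Build Redshift DDL with per-column encodings, a DISTKEY, and a SORTKEY.

    columns is a list of (name, sql_type, encoding) tuples.
    """
    col_defs = ",\n  ".join(f"{n} {t} ENCODE {e}" for n, t, e in columns)
    sort_style = "INTERLEAVED" if interleaved else "COMPOUND"
    return (
        f"CREATE TABLE {table} (\n  {col_defs}\n)\n"
        f"DISTSTYLE KEY DISTKEY ({distkey})\n"
        f"{sort_style} SORTKEY ({', '.join(sortkeys)});"
    )


ddl = create_table_sql(
    "sales_fact",
    [
        ("sale_date", "DATE", "raw"),
        ("customer_id", "BIGINT", "zstd"),
        ("amount", "DECIMAL(12,2)", "zstd"),
    ],
    distkey="customer_id",
    sortkeys=["sale_date", "customer_id"],
)
print(ddl)
```

Passing `interleaved=True` switches the statement to an interleaved sort key, the alternative covered in Part 3.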
Informatica for Amazon
Redshift
Davinder Mundy - Senior Pre-Sales Consultant
Paths to Cloud Data Warehousing and Analytics
Extend
• Quickly meet business
demands
• More variety of data
formats for analysis
Migrate (‘Lift & Shift’)
• Current warehouse not
performing & need to
scale
• Reduce costs (platform &
maintenance)
Born in the cloud
• Agile Self-Service
Analytics
• Highly Scalable
• Elastic
Informatica supports both ETL and ELT Patterns
ETL (1, 2, 3)
1. Bulk Source Data Ingestion
2. Multi-part load into S3 of
compressed files
3. Copy S3 data into Amazon
Redshift Staging
ELT (4, 5, 6)
1. SQL Pushdown for Amazon
Redshift to Amazon Redshift
Table Integrations within same
cluster
[Diagram: Informatica Cloud/PowerCenter → AWS S3 → Redshift Staging (steps 1–3); Redshift Staging → Redshift Intermediate → Redshift Analytics (steps 4–6, same Redshift cluster)]
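In the ELT pattern, the transformation is expressed as SQL that runs inside the cluster, moving rows from one Redshift table to another. The sketch below shows the general shape of such a pushed-down statement; it is not Informatica's generated SQL, and the helper, table, and column names are hypothetical.

```python
from typing import List, Optional


def pushdown_insert_sql(target: str, source: str,
                        select_exprs: List[str],
                        where: Optional[str] = None) -> str:
    """Build an INSERT ... SELECT that transforms data inside the cluster."""
    sql = (
        f"INSERT INTO {target}\n"
        f"SELECT {', '.join(select_exprs)}\n"
        f"FROM {source}"
    )
    if where:
        sql += f"\nWHERE {where}"
    return sql + ";"


elt_stmt = pushdown_insert_sql(
    "analytics.daily_sales",
    "staging.sales_raw",
    ["sale_date", "UPPER(region) AS region", "amount"],
    where="amount IS NOT NULL",
)
print(elt_stmt)
```

Because no rows leave the cluster, the work is done by the compute nodes in parallel rather than by the integration server.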
Optimized Data Ingestion into Amazon Redshift
1. Source Bulk Data Loader
2. Partitions - parallel data pipelines
3. Local staging files
4. S3 Parallel Upload
5. Copy Command to Redshift
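The five steps above can be sketched with the standard library alone: partition the rows, write compressed staging parts, list them in a manifest, and issue one COPY. This is a simplified illustration, not Informatica's implementation; the helper names, bucket, prefix, table, and IAM role are hypothetical, and the S3 upload itself is omitted.

```python
import gzip
import json
from typing import List, Sequence, Tuple


def split_into_parts(rows: Sequence[Tuple], parts: int) -> List[List[Tuple]]:
    """Round-robin rows into `parts` lists, one per parallel pipeline."""
    buckets: List[List[Tuple]] = [[] for _ in range(parts)]
    for i, row in enumerate(rows):
        buckets[i % parts].append(row)
    return buckets


def stage_part(rows: Sequence[Tuple], delimiter: str = "|") -> bytes:
    """Serialize rows as delimited text and gzip them (a local staging file)."""
    text = "".join(delimiter.join(map(str, r)) + "\n" for r in rows)
    return gzip.compress(text.encode("utf-8"))


def manifest(bucket: str, prefix: str, parts: int) -> str:
    """Build a COPY manifest listing every uploaded part."""
    entries = [
        {"url": f"s3://{bucket}/{prefix}/part-{i:04d}.gz", "mandatory": True}
        for i in range(parts)
    ]
    return json.dumps({"entries": entries})


def copy_sql(table: str, manifest_url: str, iam_role_arn: str) -> str:
    """Build the COPY command that loads all parts in parallel."""
    return (
        f"COPY {table} FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST GZIP DELIMITER '|';"
    )


rows = [(i, f"n{i}") for i in range(10)]
parts = split_into_parts(rows, 4)
staged = [stage_part(p) for p in parts]
copy_stmt = copy_sql(
    "staging.events",
    "s3://example-bucket/loads/run-001/manifest.json",
    "arn:aws:iam::123456789012:role/ExampleCopyRole",
)
print(copy_stmt)
```

Splitting the load into a number of files that is a multiple of the cluster's slice count lets every slice take part in the COPY.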
Customer Case Studies
Fox Entertainment – Migrate to Amazon Redshift
Goals: Universal Data warehouse across
business units in different global regions; Scale
and provide self service analytics at lower cost;
Accelerate Journey to AWS Cloud
Needs: Migrate from On-premise MPP Data;
Benefits:
• Repoint 6000 PC ETL mappings from
Netezza to Redshift; Able to reuse existing
Informatica workflows and migrate
quickly to Redshift
• Informatica SQL Pushdown (ELT) was
able to transform and push millions of
records every hour 24 x 7.
Logs,
Click Streams
CSV,
Social Feeds
S3
Staging Tables Intermediate Tables Analysis Tables
Oracle
SaaS
Migration to AWS Cloud; Reuse PowerCenter mappings
Shaw Communications – Legacy Data Warehouse
CLOUD
ON-PREM
Shaw Communications – Hybrid Data Warehouse
CLOUD
ON-PREM
Hybrid pattern – Informatica Cloud, Amazon Redshift, Kinesis Streams
Adaptive Biotechnologies – Born in the Cloud
Goals: Acquisitions and growth
propelled the need to create a
DWH; Adhoc analytics for their
data scientists
Needs: Flexible and scalable
DWH/ETL; data models constantly
changing; easy to set up and
manage; cost effective
Benefits:
• Self service made easy with
Redshift and Informatica
• Informatica gracefully handled
HL7 and other B2B formats and
helped transport it via SFTP to
our collection partners
Born in the Cloud! Build a modern Data Warehouse (Redshift) and ETL
(Informatica Cloud)
Cloud
LIMS
Bioinformatics
Pipelines
Customer
Portal
Files
Legacy Systems
Amazon Redshift Connector Capabilities
Robust
• Error management, Notifications, & Alerts
• Auto-handle special characters
• Dynamically create targets
Comprehensive
• Pre and Post SQL
• SQL Overrides
• S3 data retention policies
• AWS Multi-Region support
High Performance
• Partitioning
• SQL Pushdown
• Optimized Lookups
• Multi-part Upload & Download
• Compression before S3 Upload
Secure
• AWS KMS Support
• IAM Roles
• Client & Server Side Encryption
• S3 VPC Endpoint
Flexible
• Secure Agent on premise
• Informatica Hosted agent
• Agent on AWS
• Configurable S3 Copy Options
• Dynamic S3 Buckets
Informatica Products on AWS
PowerCenter
Informatica Cloud
Big Data Management
Enterprise Informatica Catalog
Informatica Cloud
Intelligent Data Lake
Informatica Data Quality
Enterprise Informatica Catalog
PowerCenter
Master Data Management
Big Data Management
Informatica Data Quality
Certified
Available
Learn more…
Learn & Prepare
Get Started on AWS Marketplace
Deep-Dive
• Cloud Analytics with Informatica Cloud & Amazon Redshift
• PowerCenter on AWS
• Data Lakes on AWS
AWS and Informatica Relationship Team
Romain Roullet - AWS ISV Success Manager - EMEA
https://www.linkedin.com/in/romainroullet/
Nitin Mathur - AWS Strategy & Business Development Leader - Global
https://www.linkedin.com/in/nitmathur/
Andrew McIntyre - Informatica Strategy & Business Development Leader -
Global
https://www.linkedin.com/in/andrew-mcintyre-a6799765/
Ian Paton - Informatica UK Partnerships
https://www.linkedin.com/in/ian-paton-%E2%98%81-6256837/
AWS Data Warehouse Projects
Created by: Nick Holmes
October 2017
The KCOM Approach
Consulting background, with Architect, DBA & DevOps resources
MVP Design
MVP Implement
Test & optimise
Iterate throughout the project lifecycle
Data Management Project
• Volumes - 6 billion retail transactions and
60 million rows of customer viewing data
• Platform - Amazon Redshift Massively
Parallel Processing (MPP) architecture
• ETL - ingress of 200GB compressed data
in 15 mins (2TB uncompressed)
• Performance - data matched between
two data sets in ~90 seconds (43 million
matched rows)
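The figures on this slide translate into throughput rates with a little arithmetic. The calculation below uses only the numbers quoted above and assumes decimal units (1 GB = 1,000 MB, 1 TB = 1,000 GB); the variable names are illustrative.

```python
compressed_gb = 200          # ingested in 15 minutes
uncompressed_gb = 2_000      # 2 TB before compression
load_seconds = 15 * 60

matched_rows = 43_000_000    # matched in roughly 90 seconds
match_seconds = 90

compression_ratio = uncompressed_gb / compressed_gb
compressed_mb_per_s = compressed_gb * 1_000 / load_seconds
uncompressed_gb_per_s = uncompressed_gb / load_seconds
matched_rows_per_s = matched_rows / match_seconds

print(f"compression ratio: {compression_ratio:.0f}:1")
print(f"ingest rate: {compressed_mb_per_s:.0f} MB/s compressed, "
      f"{uncompressed_gb_per_s:.2f} GB/s uncompressed")
print(f"match rate: {matched_rows_per_s:,.0f} rows/s")
```

Put differently, the load sustained a little over 2 GB/s of uncompressed data, and the match step resolved close to half a million rows per second.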
Travel Industry Ticketing
• IaC for all components to facilitate CI/CD (5 environments) & Immutable builds
• IAM based permissions for Redshift
• Bulk Load with AWS Data Pipeline
• ETL Management with Step Functions
• Aggregation transforms within Redshift
• Schemas are generated to support each report type
• Reports are generated on a daily basis & on demand
• Encryption for data at rest within the system (KMS)
Ticketing Architecture
2.2 Million rows per day
28 Day Moving Window analysis dataset
79 Different reports produced
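A 28-day moving window can be expressed as a simple date filter, and the daily volume gives a rough size for the analysis dataset. The sketch below assumes the window is inclusive of the reporting date and that daily volume is steady; both are assumptions, as are the helper names.

```python
from datetime import date, timedelta


def in_window(day: date, as_of: date, window_days: int = 28) -> bool:
    """True if `day` falls in the window ending on `as_of` (inclusive)."""
    start = as_of - timedelta(days=window_days - 1)
    return start <= day <= as_of


def window_rows(rows_per_day: int, window_days: int = 28) -> int:
    """Approximate rows held in the window at a steady daily volume."""
    return rows_per_day * window_days


as_of = date(2017, 10, 28)
print(in_window(date(2017, 10, 1), as_of))   # first day of the window
print(in_window(date(2017, 9, 30), as_of))   # one day too old
print(window_rows(2_200_000))
```

At 2.2 million rows per day, the window holds on the order of 60 million rows at any one time.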
Upgrades & Challenges
• Impact of Single AZ nature of Redshift
• Limited processing window
• Large volumes of data
• Dynamic rulesets
• Data privacy
• Redshift Spectrum
• AWS Glue
• Continuing schema optimisation
• “AWS Toybox”
Thank You
Nick Holmes nick.holmes@kcom.com
http://www.kcom.com/connected-thinking/?filter=AWS
@KCOMbusiness
Thank you!

 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 

Deploying your Data Warehouse on AWS

  • 1. Deploying your Data Warehouse on AWS. Ian Meyers, AWS; Davinder Mundy, Informatica; Nick Holmes, KCOM
  • 2. Freedom From… Very Expensive; Proprietary Lock-In; Punitive Licensing (image text: “You’ve Got Mail!”, “AUDIT”)
  • 3. Amazon Redshift: petabyte scale, massively parallel; relational data warehouse; fully managed, zero admin; SSD & HDD platforms; as low as $1,000/TB/year.
  • 4. Amazon Redshift Spectrum: run SQL queries directly against data in S3 using thousands of nodes. Fast @ exabyte scale; elastic & highly available; on-demand, pay-per-query. High concurrency: multiple clusters access the same data. No ETL: query data in place using open file formats. Full Amazon Redshift SQL support.
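
A minimal sketch of what "query data in place" on slide 4 might look like in practice. The Python below only builds SQL strings and does not connect to anything. The schema, catalog database, table, bucket, and IAM role ARN are all hypothetical placeholders, and the statements follow the general shape of Redshift's `CREATE EXTERNAL SCHEMA` and `CREATE EXTERNAL TABLE` syntax rather than anything shown in the deck.

```python
from typing import List, Tuple


def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Build a statement that exposes a data-catalog database as an external schema."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(schema: str, table: str,
                       columns: List[Tuple[str, str]], s3_location: str) -> str:
    """Build a statement that maps Parquet files under an S3 prefix to a table."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


# Hypothetical names, for illustration only.
schema_sql = external_schema_sql(
    "spectrum", "logs_db", "arn:aws:iam::123456789012:role/ExampleSpectrumRole"
)
table_sql = external_table_sql(
    "spectrum",
    "clickstream",
    [("event_time", "TIMESTAMP"), ("user_id", "BIGINT"), ("page", "VARCHAR(256)")],
    "s3://example-bucket/clickstream/",
)
# Once the table is registered, it is queried like any other table.
query_sql = "SELECT page, COUNT(*) AS hits FROM spectrum.clickstream GROUP BY page;"

print(schema_sql, table_sql, query_sql, sep="\n\n")
```

The files stay in S3; only the table definition lives in the catalog, which is why several clusters can point at the same data.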
  • 6. Use Case: Traditional Data Warehousing. Business reporting; advanced pipelines and queries; secure and compliant; bulk loads and updates. Easy Migration – Point & Click using AWS Database Migration Service. Secure & Compliant – End-to-End Encryption. SOC 1/2/3, PCI-DSS, HIPAA and FedRAMP compliant. Large Ecosystem – Variety of cloud and on-premises BI and ETL tools. Customer examples: Japanese mobile phone provider; powering 100 marketplaces in 50 countries; world’s largest children’s book publisher.
  • 7. Use Case: Log Analysis. Log & machine IoT data; clickstream events data; time-series data. Cheap – Analyze large volumes of data cost-effectively. Fast – Massively Parallel Processing (MPP) and columnar architecture for fast queries and parallel loads. Near real-time – Micro-batch loading and Amazon Kinesis Firehose for near-real-time analytics. Customer examples: interactive data analysis and recommendation engine; ride analytics for pricing and product development; ad prediction and on-demand analytics.
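
To make "micro-batch loading" on slide 7 concrete, here is a small sketch that groups an event stream into fixed-size batches, each of which would become one staged file and one load. It batches by record count only; a delivery service such as Kinesis Firehose buffers by size and time interval instead, and the event fields used here are invented for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def micro_batches(records: Iterable[Dict], max_records: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `max_records` records, preserving arrival order."""
    batch: List[Dict] = []
    for record in records:
        batch.append(record)
        if len(batch) >= max_records:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly short, batch.
        yield batch


# Hypothetical clickstream events.
events = [{"event_id": i, "page": f"/item/{i % 3}"} for i in range(7)]
batches = list(micro_batches(events, max_records=3))

for number, batch in enumerate(batches):
    print(f"batch {number}: {len(batch)} events")
```

Smaller batches lower latency at the cost of more, smaller loads; the right size depends on the workload.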
  • 8. Use Case: Business Applications. Multi-tenant BI applications; back-end services; analytics as a service. Fully Managed – Provisioning, backups, upgrades, security, compression all come built-in so you can focus on your business applications. Ease of Chargeback – Pay as you go, add clusters as needed. A few big common clusters, several data marts. Service Oriented Architecture – Integrated with other AWS services. Easy to plug into your pipeline. Customer examples: Infosys Information Platform (IIP); Analytics-as-a-Service; Product and Consumer Analytics.
  • 9. AWS Named as a Leader in The Forrester Wave™: Big Data Warehouse, Q2 2017 (http://bit.ly/2w1TAEy). On June 15, Forrester published the Big Data Warehouse, Q2 2017 report, in which AWS is positioned as a Leader. According to Forrester, “With more than 5,000 deployments, Amazon Redshift has the largest data warehouse deployments in the cloud.” AWS received the highest score possible, 5/5, for customer base, market awareness, ability to execute, road map, support, and partners. “AWS’s key strengths lie in its dynamic scale, automated administration, flexibility of database offerings, good security, and high availability (HA) capabilities, which make it a preferred choice for customers.”
  • 12. NTT Docomo: Japan’s largest mobile service provider. 68 million customers; tens of TBs per day of data across a mobile network; 6 PB of total data (uncompressed); data science for marketing operations, logistics, and so on. Greenplum on-premises: scaling challenges; performance issues; need same level of security; need for a hybrid environment.
  • 13. NTT Docomo: Japan’s largest mobile service provider. 125-node DS2.8XL cluster; 4,500 vCPUs, 30 TB RAM; 2 PB compressed; 10x faster analytic queries; 50% reduction in time for new BI application deployment; significantly less operations overhead. Diagram labels: Data Source; ET; AWS Direct Connect; Client; Forwarder; Loader; State Management; Sandbox; Amazon Redshift; S3.
  • 14. Nasdaq: powering 100 marketplaces in 50 countries. Orders, quotes, trade executions, market “tick” data from 7 exchanges; 7 billion rows/day; analyze market share, client activity, surveillance, billing, and so on. Microsoft SQL Server on-premises: expensive legacy DW ($1.16 M/yr.); limited capacity (1 yr. of data online). Needed lower TCO; must satisfy multiple security and regulatory requirements; similar performance.
  • 15. Nasdaq: powering 100 marketplaces in 50 countries. 23-node DS2.8XL cluster; 828 vCPUs, 5 TB RAM; 368 TB compressed; 2.7 T rows, 900 B derived; 8 tables with 100 B rows; 7-month migration; ¼ the cost, 2x storage, room to grow; faster performance, very secure.
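
The cluster figures on slides 13 and 15 can be sanity-checked with a short calculation. The per-node numbers below (36 vCPUs, 244 GiB RAM, 16 TB of storage for a DS2.8XL node) are an assumption taken from the published node specification of that era, not from the deck itself.

```python
# Assumed DS2.8XL per-node specification (not stated in the slides).
DS2_8XL = {"vcpus": 36, "ram_gib": 244, "storage_tb": 16}


def cluster_totals(nodes: int, spec: dict = DS2_8XL) -> dict:
    """Multiply a per-node specification out to whole-cluster totals."""
    return {
        "vcpus": nodes * spec["vcpus"],
        "ram_tib": nodes * spec["ram_gib"] / 1024,
        "storage_tb": nodes * spec["storage_tb"],
    }


docomo = cluster_totals(125)  # slide 13: 125-node cluster
nasdaq = cluster_totals(23)   # slide 15: 23-node cluster

for name, totals in (("NTT Docomo", docomo), ("Nasdaq", nasdaq)):
    print(
        f"{name}: {totals['vcpus']} vCPUs, "
        f"{totals['ram_tib']:.1f} TiB RAM, {totals['storage_tb']} TB storage"
    )
```

Under that assumption the vCPU counts match the slides exactly (4,500 and 828), and the 2 PB and 368 TB figures line up with 16 TB of storage per node.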
  • 17. Amazon Redshift Cluster Architecture. Massively parallel, shared-nothing architecture; streaming backup/restore from S3. Leader node: SQL endpoint; stores metadata; coordinates parallel SQL processing. Compute nodes: local, columnar storage; execute queries in parallel; load, backup, restore; 2, 16 or 32 slices. Diagram labels: Redshift Cluster; JDBC/ODBC; Leader Node; Compute Nodes; Efficient Data Loads; Streaming Backup/Restore.
  • 18. Amazon Redshift Cluster Architecture. Leader node: SQL endpoint; stores metadata; coordinates parallel SQL processing. Compute nodes: local, columnar storage; execute queries in parallel; load, backup, restore; 2, 16 or 32 slices. Redshift Spectrum: query data at rest on S3; ultra high scale, increased concurrency; JSON, CSV, Parquet storage. Diagram labels: Redshift Cluster; JDBC/ODBC; Leader Node; Compute Nodes (1, 2, 3, 4 ... N); Spectrum Fleet; Amazon S3.
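
The "shared nothing" idea on slides 17 and 18 can be illustrated with a toy model: each row is assigned to exactly one slice by hashing a distribution key, so every slice can work on its own share in parallel. This is only a sketch. It uses CRC32 as a stand-in hash, which is not Redshift's actual distribution function, and the key names and cluster shape are invented.

```python
import zlib
from collections import Counter


def slice_for(key: str, total_slices: int) -> int:
    """Map a distribution-key value to a slice number (stand-in hash, not Redshift's)."""
    return zlib.crc32(key.encode("utf-8")) % total_slices


# Hypothetical cluster: 2 compute nodes with 16 slices each.
nodes, slices_per_node = 2, 16
total_slices = nodes * slices_per_node

rows = [f"customer-{i}" for i in range(10_000)]
placement = Counter(slice_for(key, total_slices) for key in rows)

print(f"{total_slices} slices, rows per slice between "
      f"{min(placement.values())} and {max(placement.values())}")
```

A key with many distinct values spreads rows roughly evenly; a skewed key would overload a few slices, which is why the choice of distribution key matters.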
  • 19. When should you add Spectrum? Your data will get bigger: on average, data warehousing volumes grow 10x every 5 years; the average Amazon Redshift customer doubles data each year. Amazon Redshift Spectrum makes data analysis simpler: access your data without ETL pipelines; teams using Amazon EMR, Athena & Redshift can collaborate using the same data lake; late binding views enable federated queries between internal & external tables. Amazon Redshift Spectrum improves availability and concurrency: run multiple Amazon Redshift clusters against common data; isolate jobs with tight SLAs from ad hoc analysis.
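
The two growth figures on slide 19 are easier to compare once they are expressed on the same basis. The sketch below converts "10x every 5 years" to an annual factor and compounds "doubles each year" over 5 years; it assumes steady compound growth, which the slide does not state.

```python
def annual_factor(total_factor: float, years: int) -> float:
    """Annual growth factor that compounds to `total_factor` over `years`."""
    return total_factor ** (1 / years)


def growth_after(annual: float, years: int) -> float:
    """Total growth factor after compounding `annual` for `years`."""
    return annual ** years


industry_annual = annual_factor(10, 5)  # 10x in 5 years
redshift_5yr = growth_after(2, 5)       # doubling every year for 5 years

print(f"10x in 5 years is about {industry_annual:.2f}x per year")
print(f"Doubling yearly is {redshift_5yr:.0f}x in 5 years")
```

So the industry average works out to roughly 58% growth per year, while yearly doubling compounds to 32x over the same five years.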
  • 20. Deploying your Data Warehouse on AWS. Diagram labels: Ingest; Prepare for Analytics; Present; Store; Analyze; Store.
  • 21. Deploying your Data Warehouse on AWS. Diagram labels: Batch; Firehose; Glue; S3; Streaming?; DBs (OLTP)?; Own code?; Parallel?; Managed?; SCT Migration Agent; DWH?; Prepare for Analytics; Present; Store; Analyze; Store.
  • 22. Deploying your Data Warehouse on AWS. Diagram labels: Batch; Firehose; Glue; S3; Streaming?; DBs (OLTP)?; Own code?; Parallel?; Managed?; SCT Migration Agent; DWH?; Present; Prepare for Analytics; Analyze; Store.
  • 23. Deploying your Data Warehouse on AWS. Diagram labels: AWS Lambda; Glue; and/or; and/or; ETL?; Managed?; Complex?; Cost?; Batch; Firehose; Glue; S3; Streaming?; DBs (OLTP)?; Own code?; Parallel?; Managed?; SCT Migration Agent; DWH?; Present; Analyze; Store.
  • 24. Deploying your Data Warehouse on AWS. Diagram labels: AWS Lambda; Glue; and/or; and/or; ETL?; Managed?; Complex?; Cost?; Batch; Firehose; Glue; S3; Streaming?; DBs (OLTP)?; Own code?; Parallel?; Managed?; SCT Migration Agent; DWH?; S3; Query optimized & ready for self-service; Present; Analyze.
  • 25. Deploying your Data Warehouse on AWS. Diagram labels: AWS Lambda; Glue; and/or; and/or; ETL?; Managed?; Complex?; Cost?; Batch; Firehose; Glue; S3; Streaming?; DBs (OLTP)?; Own code?; Parallel?; Managed?; SCT Migration Agent; DWH?; S3; Athena Query Service; Ad-hoc Analysis; Redshift Spectrum; DWH and Data Marts; Redshift Data Warehouse; Present; Predictive; Query optimized & ready for self-service.
  • 26. Deploying your Data Warehouse on AWS. Diagram labels: AWS Lambda; Glue; and/or; and/or; ETL?; Managed?; Complex?; Cost?; Batch; Firehose; Glue; S3; Streaming?; DBs (OLTP)?; Own code?; Parallel?; Managed?; SCT Migration Agent; DWH?; S3; Athena Query Service; Ad-hoc Analysis; BI & Visualization; Redshift Spectrum; DWH and Data Marts; Redshift Data Warehouse; Predictive; Query optimized & ready for self-service.
  • 28. Business user: sign-in; first analysis in about 60 seconds. aws.amazon.com/quicksight
  • 29. Easy exploration of AWS data. Securely discover and connect to AWS data; quickly explore AWS data sources: relational databases (Amazon RDS, Amazon RDS for Aurora, Amazon Redshift); NoSQL databases (Amazon DynamoDB); Amazon EMR, Amazon S3, files (CSV, Excel, TSV, XLF, CLF); streaming data sources (Amazon DynamoDB, Amazon Kinesis). Easily import data from any table or file; automatic detection of data types.
  • 30. Diagram labels: Business User; QuickSight API; Data Prep; Metadata; Suggestions; Connectors; SPICE; Business User; QuickSight UI; Mobile Devices; Web Browsers; Partner BI products; Amazon S3; Amazon Kinesis; Amazon DynamoDB; Amazon EMR; Amazon Redshift; Amazon RDS; Files; Third-party.
  • 32. Data Ingest & Migrations
  • 36. Getting data to Redshift using AWS Database Migration Service (DMS): simple to use; minimal downtime; supports most widely used databases; low cost; fast & easy to set up; reliable.
  • 38. Optimizing Amazon Redshift with the AWS Schema Conversion Tool: amzn.to/2sTYow1
  • 39. Extending your DWH (or Migrations) to Redshift: Oracle to Redshift (http://amzn.to/2vN3UBO)
  • 40. Extending your DWH (or Migrations) to Redshift: Teradata to Redshift (http://amzn.to/2wZy7OA)
  • 41. Extending your DWH (or Migrations) to Redshift: Converge Silos to Redshift (http://amzn.to/2hbKwYd)
  • 42. Redshift Playbook. Part 1: Preamble, Prerequisites, and Prioritization; Part 2: Distribution Styles and Distribution Keys; Part 3: Compound and Interleaved Sort Keys; Part 4: Compression Encodings; Part 5: Table Data Durability. amzn.to/2quChdM
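
Three of the playbook topics on slide 42 (distribution keys, sort keys, compression encodings) all surface in a table's DDL. The sketch below builds such a statement as a string. The table, columns, and the particular key and encoding choices are hypothetical and are not recommendations from the playbook; they only show where each setting goes.

```python
from typing import List, Tuple


def create_table_sql(table: str, columns: List[Tuple[str, str, str]],
                     distkey: str, sortkeys: List[str],
                     sort_style: str = "COMPOUND") -> str:
    """Build a CREATE TABLE statement with a distribution key, sort key and encodings."""
    if sort_style not in ("COMPOUND", "INTERLEAVED"):
        raise ValueError("sort_style must be COMPOUND or INTERLEAVED")
    col_lines = ",\n".join(
        f"  {name} {sql_type} ENCODE {encoding}"
        for name, sql_type, encoding in columns
    )
    return (
        f"CREATE TABLE {table} (\n{col_lines}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({distkey})\n"
        f"{sort_style} SORTKEY ({', '.join(sortkeys)});"
    )


# Hypothetical fact table.
ddl = create_table_sql(
    "sales",
    [
        ("sale_date", "DATE", "raw"),
        ("customer_id", "BIGINT", "zstd"),
        ("amount", "DECIMAL(12,2)", "zstd"),
    ],
    distkey="customer_id",
    sortkeys=["sale_date", "customer_id"],
)
print(ddl)
```

Passing `sort_style="INTERLEAVED"` produces the other sort key variant named in Part 3.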
  • 43. Informatica for Amazon Redshift. Davinder Mundy - Senior Pre-Sales Consultant
  • 44. Paths to Cloud Data Warehousing and Analytics. Extend: quickly meet business demands; more variety of data formats for analysis. Migrate (‘Lift & Shift’): current warehouse not performing & need to scale; reduce costs (platform & maintenance). Born in the cloud: agile self-service analytics; highly scalable; elastic.
  • 45. Informatica supports both ETL and ELT patterns. ETL (steps 1, 2, 3): 1. Bulk source data ingestion; 2. Multi-part load into S3 of compressed files; 3. Copy S3 data into Amazon Redshift staging. ELT (steps 4, 5, 6): SQL Pushdown for Amazon Redshift to Amazon Redshift table integrations within the same cluster. Diagram labels: Informatica Cloud/PowerCenter; AWS S3; Redshift Staging; Redshift Intermediate; Redshift Analytics; Same Redshift Cluster.
  • 46. Optimized Data Ingestion into Amazon Redshift: 1. Source bulk data loader; 2. Partitions - parallel data pipelines; 3. Local staging files; 4. S3 parallel upload; 5. Copy command to Redshift.
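
The ingestion steps on slide 46 can be sketched end to end in a few lines: partition the source rows, compress each partition into its own staging file, and issue one `COPY` over the shared prefix. This is an illustration under assumptions, not Informatica's implementation. The upload step is omitted, the data stays in memory, and the table name, bucket prefix, and IAM role ARN are hypothetical.

```python
import csv
import gzip
import io
from typing import Dict, List, Sequence


def partition(rows: Sequence[tuple], parts: int) -> List[List[tuple]]:
    """Round-robin rows into `parts` lists so each pipeline gets a similar share."""
    buckets: List[List[tuple]] = [[] for _ in range(parts)]
    for index, row in enumerate(rows):
        buckets[index % parts].append(row)
    return buckets


def to_gzip_csv(rows: Sequence[tuple]) -> bytes:
    """Serialize rows as CSV and gzip the result, as a staging file would be."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return gzip.compress(buffer.getvalue().encode("utf-8"))


def copy_sql(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a COPY statement that loads every gzip CSV file under one S3 prefix."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV\n"
        f"GZIP;"
    )


rows = [(i, f"name-{i}") for i in range(10)]
parts = partition(rows, 4)
# Each entry stands in for one staged file awaiting a parallel upload to S3.
staged: Dict[str, bytes] = {
    f"part-{n:04d}.csv.gz": to_gzip_csv(part) for n, part in enumerate(parts)
}
copy_statement = copy_sql(
    "staging.customers",
    "s3://example-bucket/staging/customers/part-",
    "arn:aws:iam::123456789012:role/ExampleLoadRole",
)
print(sorted(staged), copy_statement, sep="\n")
```

Splitting the input into several files lets the load run in parallel across the cluster rather than through a single stream.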
  • 48. Fox Entertainment – Migrate to Amazon Redshift. Goals: Universal data warehouse across business units in different global regions; scale and provide self-service analytics at lower cost; accelerate journey to AWS Cloud. Needs: Migrate from On-premise MPP Data. Benefits: repoint 6000 PC ETL mappings from Netezza to Redshift; able to reuse existing Informatica workflows and migrate quickly to Redshift; Informatica SQL Pushdown (ELT) was able to transform and push millions of records every hour, 24 x 7. Diagram labels: Logs, Click Streams; CSV, Social Feeds; S3; Staging Tables; Intermediate Tables; Analysis Tables; Oracle; SaaS. Migration to AWS Cloud; reuse PowerCenter mappings.
  • 49. Shaw Communications – Legacy Data Warehouse CLOUD ON-PREM
  • 50. Shaw Communications – Hybrid Data Warehouse CLOUD ON-PREM
  • 51. Shaw Communications – Hybrid Data Warehouse CLOUD ON-PREM Cloud
  • 52. Shaw Communications – Hybrid Data Warehouse CLOUD ON-PREM Cloud Hybrid pattern – Informatica Cloud, Amazon Redshift, Kinesis Streams
  • 53. Adaptive Biotechnologies – Born in the Cloud. Goals: Acquisitions and growth propelled the need to create a DWH; ad hoc analytics for their data scientists. Needs: Flexible and scalable DWH/ETL; data models constantly changing; easy to set up and manage; cost effective. Benefits: self-service made easy with Redshift and Informatica; Informatica gracefully handled HL7 and other B2B formats and helped transport it via SFTP to our collection partners. Born in the Cloud! Build a modern data warehouse (Redshift) and ETL (Informatica Cloud). Diagram labels: Cloud LIMS; Bioinformatics Pipelines; Customer Portal; Files; Legacy Systems.
  • 54. Amazon Redshift Connector Capabilities: Robust; Comprehensive; High Performance; Secure; Flexible. Features listed: error management, notifications, & alerts; auto-handle special characters; dynamically create targets; pre and post SQL; SQL overrides; S3 data retention policies; AWS multi-region support; partitioning; SQL Pushdown; optimized lookups; multi-part upload & download; compression before S3 upload; AWS KMS support; IAM roles; client & server side encryption; S3 VPC endpoint; Secure Agent on premise; Informatica hosted agent; agent on AWS; configurable S3 copy options; dynamic S3 buckets.
  • 55. Informatica Products on AWS: Power Center; Informatica Cloud; Big Data Management; Enterprise Informatica Catalog; Informatica Cloud; Intelligent Data Lake; Informatica Data Quality; Enterprise Informatica Catalog; Power Center; Master Data Management; Big Data Management; Informatica Data Quality. Legend: Certified; Available.
  • 56. Learn more… Learn & Prepare: Cloud Analytics with Informatica Cloud & Amazon Redshift; PowerCenter on AWS; Data Lakes on AWS. Get Started on AWS Marketplace. Deep-Dive.
  • 57. AWS and Informatica Relationship Team. Romain Roullet - AWS ISV Success Manager - EMEA (https://www.linkedin.com/in/romainroullet/); Nitin Mathur - AWS Strategy & Business Development Leader - Global (https://www.linkedin.com/in/nitmathur/); Andrew McIntyre - Informatica Strategy & Business Development Leader - Global (https://www.linkedin.com/in/andrew-mcintyre-a6799765/); Ian Paton - Informatica UK Partnerships (https://www.linkedin.com/in/ian-paton-%E2%98%81-6256837/)
  • 58. AWS Data Warehouse Projects. Created by: Nick Holmes, October 2017
  • 59. The KCOM Approach. Consulting background, with Architect, DBA & DevOps resources. MVP Design; MVP Implement; Test & optimise; iterate throughout the project lifecycle.
  • 60. Data Management Project. Volumes - 6 billion retail transactions and 60 million rows of customer viewing data. Platform - AWS Redshift Massively Parallel Processing (MPP) architecture. ETL - ingress of 200 GB compressed data in 15 mins (2 TB uncompressed). Performance - data matched between two data sets in ~90 seconds (43 million matched rows).
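
The figures on slide 60 translate into throughput rates with some simple arithmetic. The sketch below assumes decimal units (1 TB = 1,000 GB) and treats "~90 seconds" as exactly 90, so the results are approximate.

```python
# Figures quoted on slide 60.
load_window_s = 15 * 60        # 15 minutes
compressed_gb = 200
uncompressed_gb = 2_000        # 2 TB, assuming decimal units
matched_rows = 43_000_000
match_time_s = 90              # "~90 seconds", taken as exact here

compressed_rate = compressed_gb / load_window_s      # GB/s moved over the wire
uncompressed_rate = uncompressed_gb / load_window_s  # GB/s of logical data loaded
compression_ratio = uncompressed_gb / compressed_gb
match_rate = matched_rows / match_time_s             # matched rows per second

print(f"Ingest: {compressed_rate:.3f} GB/s compressed, "
      f"{uncompressed_rate:.2f} GB/s uncompressed")
print(f"Compression ratio: {compression_ratio:.0f}:1")
print(f"Matching: about {match_rate:,.0f} rows/s")
```

That is roughly 0.22 GB/s of compressed input, a 10:1 compression ratio, and close to half a million matched rows per second.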
  • 61. Travel Industry Ticketing. IaC for all components to facilitate CI/CD (5 environments) & immutable builds; IAM-based permissions for Redshift; bulk load with DataPipeline; ETL management with Step Functions; aggregation transforms within Redshift; schemas are generated to support each report type; reports are generated on a daily basis & on demand; encryption for data at rest within the system (KMS).
  • 62. Ticketing Architecture. 2.2 million rows per day; 28-day moving window analysis dataset; 79 different reports produced.
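
The 28-day moving window on slide 62 implies a dataset of fairly stable size: each new day's rows push the oldest day out. The sketch below models only the row counts, assuming a constant 2.2 million rows per day; the class name is invented and this says nothing about how the actual system implements its window.

```python
from collections import deque


class MovingWindow:
    """Track daily row counts, keeping only the most recent `days` entries."""

    def __init__(self, days: int = 28) -> None:
        # A bounded deque drops the oldest day automatically when full.
        self._daily_counts = deque(maxlen=days)

    def add_day(self, row_count: int) -> None:
        self._daily_counts.append(row_count)

    def total_rows(self) -> int:
        return sum(self._daily_counts)


window = MovingWindow(days=28)
for _ in range(40):  # 40 days of loads; only the last 28 are retained
    window.add_day(2_200_000)

print(f"Rows in the analysis window: {window.total_rows():,}")
```

At the quoted daily volume the window settles at about 61.6 million rows, however long the system runs.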
  • 63. Upgrades & Challenges: impact of single-AZ nature of Redshift; limited processing window; large volumes of data; dynamic rulesets; data privacy; Redshift Spectrum; AWS Glue; continuing schema optimisation; “AWS Toybox”.
  • 64. Thank You. Nick Holmes, nick.holmes@kcom.com; http://www.kcom.com/connected-thinking/?filter=AWS; @KCOMbusiness