SlideShare uma empresa Scribd logo
1 de 74
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Pop-up Loft
Getting Started with Amazon Redshift
Maor Kleider
Sr. Product Manager, Amazon Redshift
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Agenda
• Introduction
• Benefits
• Use cases
• Getting started
• Q&A
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Generate
Collect	&	Store
Analyze
Collaborate	&	Act
Individual	AWS	customers
generate	over	a	PB/day
It’s never been easier to generate vast amounts of data
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Generate
Collect	&	Store
Analyze
Collaborate	&	Act
Amazon S3 lets you collect and store all this data
Store	exabytes	of
data	in	S3
Individual	AWS	customers
generating	over	PB/day
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Generate
Collect	&	Store
Analyze
Collaborate	&	Act
Individual	AWS	customers
generating	over	PB/day
Highly
Constrained
But how do you analyze it?
Store	exabytes	of
data	in	S3
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
1990 2000 2010 2020
Generated	Data
Available	for	Analysis
Sources:	
Gartner:	User	Survey	Analysis:	Key	Trends	Shaping	the	Future	of	Data	Center	Infrastructure	Through	2011	
IDC:	Worldwide	Business	Analytics	Software	2012–2016	Forecast	and	2011	Vendor	Shares	
Data	Volume
Year
The Dark Data Problem
Most generated data is unavailable for analysis
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
The Emerging Analytics Architecture
AWS	Glue
ETL	&	Data	Catalog
Storage
Serverless
Compute
Data
Processing
Amazon	S3	
Exabyte-scale	Object	Storage
Amazon	Kinesis	Firehose
Real-Time Data	Streaming
Amazon	EMR
Managed	Hadoop	Applications
AWS	Lambda
Trigger-based	Code	Execution
AWS	Glue	Data	Catalog
Hive-compatible	Metastore
Amazon	Redshift	Spectrum
Fast	@	Exabyte	scale
Amazon	Redshift
Petabyte-scale	Data	Warehousing
Amazon	Athena
Interactive	Query
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Relational	data	warehouse
Massively	parallel
Fully	managed
HDD	and	SSD	platforms
$1,000/TB/year;	starts	at	$0.25/hour
Amazon	
Redshift
a lot	faster
a	lot	simpler
a	lot	cheaper
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Selected	Amazon Redshift	Customers
This image cannot currently be displayed.
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Use Case: Traditional Data Warehousing
Business
Reporting
Advanced pipelines
and queries
Secure and
Compliant
Easy Migration – Point & Click using AWS Database Migration Service
Secure & Compliant – End-to-End Encryption. SOC 1/2/3, PCI-DSS, HIPAA and FedRAMP compliant
Large Ecosystem – Variety of cloud and on-premises BI and ETL tools
Japanese	Mobile	Phone	
Provider
Powering	100	marketplaces	in	
50	countries
World’s	Largest	Children’s	
Book	Publisher
Bulk Loads
and Updates
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Use Case: Log Analysis
Log & Machine
IOT Data
Clickstream
Events Data
Time-Series
Data
Cheap – Analyze large volumes of data cost-effectively
Fast – Massively Parallel Processing (MPP) and columnar architecture for fast queries and parallel loads
Near real-time – Micro-batch loading and Amazon Kinesis Firehose for near-real time analytics
Interactive	data	analysis	and	
recommendation	engine
Ride	analytics	for	pricing	and	
product	development
Ad	prediction	and	on-
demand	analytics
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Use Case: Business Applications
Multi-Tenant BI
Applications
Back-end
services
Analytics as a
Service
Fully Managed – Provisioning, backups, upgrades, security, compression all come built-in so you can
focus on your business applications
Ease of Chargeback – Pay as you go, add clusters as needed. A few big common clusters & data marts
Service Oriented Architecture – Integrated with other AWS services. Easy to plug into your pipeline
Infosys	Information	Platform	
(IIP)	
Analytics-as-a-
Service
Product	and	Consumer	
Analytics
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Amazon	Redshift	architecture
• Leader	node
Simple SQL endpoint
Stores metadata
Optimizes query plan
Coordinates query execution
• Compute	nodes
Local columnar storage
Parallel/distributed execution of all queries, loads,
backups, restores, resizes
• Start at just $0.25/hour, grow to 2 PB
(compressed)
DC1: SSD; scale from 160 GB to 326 TB
DS2: HDD; scale from 2 TB to 2 PB
Leader node
Compute node
10 GigE
(HPC)
Ingestion
Backup
Restore
BI tools SQL clientsAnalytics tools
Compute node Compute node
JDBC/ODBC
Amazon S3 Amazon EMR Amazon Dynamo DB SSH
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Benefit #1: Amazon Redshift is fast
• Dramatically less I/O
Column storage
Data compression
Zone maps
Direct-attached storage
Large data block sizes
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
10	|	13	|	14	|	26	|…
…	|	100	|	245	|	324
375	|	393	|	417…
…	512	|	549	|	623
637	|	712	|	809	…
…	|	834	|	921	|	959
10
324
375
623
637
959
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Benefit #2: Amazon Redshift is inexpensive
DS2	(HDD)
Price	per	hour	for	
DS2.XL single	node
Effective	annual
price	per	TB	compressed
On-demand $	0.850 $	3,725
1	year reservation $	0.500 $	2,190
3	year reservation $	0.228 $				999
DC1	(SSD)
Price	per	hour	for	
DC1.L single	node
Effective	annual
price	per	TB	compressed
On-demand $	0.250 $	13,690
1	year reservation $	0.161 $			8,795
3	year reservation $	0.100 $			5,500
Pricing is simple
Number of nodes x price/hour
No charge for leader node
No upfront costs
Pay as you go
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Benefit #3: Amazon Redshift is fully managed
Continuous/incremental backups
Multiple copies within cluster
Continuous and incremental backups
to Amazon S3
Continuous and incremental backups
across regions
Streaming restore
Compute node
Amazon S3
Region	1
Region	2
Compute node Compute node
Amazon S3
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Leader node
Compute node
10 GigE
(HPC)
Ingestion
Backup
Restore
Customer VPC
Internal VPC
BI tools SQL clientsAnalytics tools
Compute node Compute node
JDBC/ODBC
® Load encrypted from S3
® SSL to secure data in transit
® ECDHE perfect forward secrecy
® Amazon VPC for network isolation
® Encryption to secure data at rest
® All blocks on disks and in S3 encrypted
® Block key, cluster key, master key (AES-256)
® On-premises HSM & AWS CloudHSM support
® Audit logging and AWS CloudTrail integration
® SOC 1/2/3, PCI-DSS, FedRAMP, BAA
Benefit	#4:	Security	is	built-in
Amazon S3 Amazon EMR Amazon Dynamo DB SSH
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Benefit #5: Amazon Redshift is powerful
• Approximate functions
• User defined functions
• Machine learning
• Data science
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Benefit #6: Amazon Redshift has a large ecosystem
Data integration Systems integratorsBusiness intelligence
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Amazon Redshift Spectrum
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
Fast @ exabyte scale Elastic & highly available On-demand, pay-per-query
High concurrency: Multiple
clusters access same data
No ETL: Query data in-place using
open file formats
Full Amazon Redshift
SQL support
S3
SQL
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Amazon	Redshift	
Query	S3	directly	or	join	data	
across	Redshift	and	S3
Support	for	CSV,	Parquet,	ORC,	Grok,	
Avro	and	more	formats
Scale	Redshift	compute	and
storage	separately	
Redshift	Query	
Engine
Redshift	Data	
Data	Lake
Amazon S3	
Extend Redshift Queries To Amazon S3
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Query
SELECT	COUNT(*)
FROM S3.EXT_TABLE
GROUP	BY	…
Life	of	a	query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive Metastore
1
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Life	of	a	query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive Metastore
Query	is	optimized	and	compiled	at	the	
leader	node.	Determine	what	gets	run	
locally	and	what	goes	to	Amazon	Redshift	
Spectrum
2
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Life	of	a	query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive Metastore
Query	plan	is	sent	to	all	
compute	nodes3
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Life	of	a	query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive Metastore
Compute	nodes	obtain	partition	info	from	
Data	Catalog;	dynamically	prune	partitions	4
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Life	of	a	query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive Metastore
Each	compute	node	issues	multiple	
requests	to	the	Amazon	Redshift	
Spectrum	layer
5
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Life	of	a	query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive Metastore
Amazon	Redshift	Spectrum	nodes	
scan	your	S3	data	
6
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Life	of	a	query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive Metastore
7
Amazon	Redshift	
Spectrum	projects,	
filters,	joins	and	
aggregates
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Life	of	a	query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive Metastore
Final	aggregations	and	joins	with	
local	Amazon	Redshift	tables	done	
in-cluster
8
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Life	of	a	query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive Metastore
Result	is	sent	back	to	client9
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Running an analytic query over an exabyte in S3
SELECT
P.ASIN,
P.TITLE,
R.POSTAL_CODE,
P.RELEASE_DATE,
SUM(D.QUANTITY * D.OUR_PRICE) AS SALES_sum
FROM
s3.d_customer_order_item_details D,
asin_attributes A,
products P,
regions R
WHERE
D.ASIN = P.ASIN AND
P.ASIN = A.ASIN AND
D.REGION_ID = R.REGION_ID AND
A.EDITION LIKE '%FIRST%' AND
P.TITLE LIKE '%Potter%' AND
P.AUTHOR = 'J. K. Rowling' AND
R.COUNTRY_CODE = ‘US’ AND
R.CITY = ‘Seattle’ AND
R.STATE = ‘WA’ AND
D.ORDER_DAY :: DATE >= P.RELEASE_DATE AND
D.ORDER_DAY :: DATE < dateadd(day, 3, P.RELEASE_DATE)
GROUP BY P.ASIN, P.TITLE, R.POSTAL_CODE, P.RELEASE_DATE
ORDER BY SALES_sum DESC
LIMIT 20;
• Roughly	140	TB	of	customer	item	order	
detail	records	for	each	day	over	past	20	
years;
• 190	million	files	across	15,000	partitions	in	
S3.	One	partition	per	day	for	USA	and	rest	
of	world;
• Total	data	size	over	an	exabyte
Optimization:
• Compression	……………..….……..5X
• Columnar	file	format……….......…10X
• Scanning	with	2500	nodes…....2500X
• Static	partition	elimination…............2X
• Dynamic	partition	elimination..….350X
• Redshift’s	query	optimizer……......40X
Hive	(1000	node	cluster) Redshift Spectrum
5	years 155s
*	Estimated	using	20	node	Hive	cluster	&	1.4TB,	assume	linear
*	Query	used	a	20	node	DC1.8XLarge	Amazon	Redshift	cluster
*	Not	actual	sales	data	- generated	for	this	demo	based	on	data	
format	used	by	Amazon	Retail.
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Use cases
• Extend data warehousing to data lake in Amazon S3
• Cost reduction for less frequently accessed data
• Better query performance and minimal time to insight at large scale
• Query many open formats and large data sets that can’t be loaded into cluster
• Improved concurrency with multiple Redshift clusters querying common data
• Elastically scale compute resources separately from the storage layer in S3
• Use familiar SQL to cleanse, transform and load data from S3 to Redshift
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Multiple cluster architecture
Use multiple
Redshift clusters
to query same
copy of data in s3
Query
SELECT	COUNT(*)
FROM S3.EXT_TABLE
GROUP	BY	…
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Data Catalog
Apache Hive Metastore
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Simplify and accelerate your ETL pipelines
“Dirty”
Logs
Clean
Logs
CSV
Amazon	Redshift
COPY
“Dirty”
Logs
CREATE TABLE AS
SELECT C.. FROM S3.xxx WHERE …
Amazon	Redshift
Staging	Table Final	Table
Amazon	Redshift
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Customer Use Cases
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Nasdaq: powering 100 marketplaces in 50 countries
Orders, quotes, trade executions,
market “tick” data from 7 exchanges
7 billion rows/day
Analyze market share, client activity,
surveillance, billing, and so on
Microsoft SQL Server on-premises
Expensive	legacy	DW	
($1.16 M/yr.)
Limited	capacity	(1	yr.	of	data	
online)
Needed	lower	TCO
Must	satisfy	multiple	security	and	
regulatory	requirements
Similar	performance
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
23 node DS2.8XL cluster
828 vCPUs, 5 TB RAM
368 TB compressed
2.7 T rows, 900 B derived
8 tables with 100 B rows
7 man-month migration
¼ the cost, 2x storage, room to grow
Faster performance, very secure
Nasdaq: powering 100 marketplaces in 50 countries
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Amazon.com clickstream analytics
• Web	log	analysis	for	Amazon.com
– PBs	workload,	2TB/day@67%	YoY
– Largest	table:	400	TB
• Understand	customer	behavior
• Previous	solution
– Legacy	DW	(Oracle)—query	across	1	week/hr
– Hadoop—query	across	1	month/hr
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Results with Amazon Redshift
• Query	15	months	in	14	min
• Load	5B	rows	in	10	min
• 21B	w/	10B	rows:	3 days	to	2	hrs
(Hive	à Redshift)
• Load	pipeline:	90	hrs to	8	hrs
(Oracle	à Redshift)
• 100	node	DS2.8XL	clusters	
• Easy	resizing
• Managed	backups	and	restore
• Failure	tolerance	and	recovery
• 20%	time	of	one	DBA
• Increased	productivity
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
“Redshift Spectrum	will	let	us	expand	the	universe	of	the	
data	we	analyze	to	100s	of	petabytes	over	time.	This	is	truly	
a	game	changer,	and	we	can	think	of	no	other	system	in	the	
world	that	can	get	us	there.”	
“Multiple	teams	can	now	query	the	same	Amazon	S3	
data	sets	using	both	Redshift	and	EMR.”
“Redshift	Spectrum	will	help	us	scale	yet	
further	while	also	lowering	our	costs.”	
“Redshift	Spectrum’s	fast	performance	across	
massive	data	sets	is	unprecedented.”	
“Redshift	Spectrum	enables	us	to	directly	operate	on	
our	data	in	its	native	format	in	Amazon	S3	with	no	
preprocessing	or	transformation.”
“Our	data	science	team	using	Amazon	EMR	can	now	
collaborate	with	our	marketing	and	product	teams	
using	Redshift	Spectrum	to	analyze	the	same	Amazon	
S3	data	sets.”
Customers love Amazon Redshift Spectrum
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Getting Started
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Provisioning
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Enter cluster details
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Select node configuration
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Select security settings and provision
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Point-and-click resize
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Data Modeling
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
SELECT	COUNT(*)	FROM	LOGS	WHERE	DATE	=	‘09-JUNE-2013’
MIN:	01-JUNE-2013
MAX:	20-JUNE-2013	
MIN:	08-JUNE-2013
MAX:	30-JUNE-2013	
MIN:	12-JUNE-2013
MAX:	20-JUNE-2013	
MIN:	02-JUNE-2013
MAX:	25-JUNE-2013	
Unsorted	table
MIN:	01-JUNE-2013
MAX:	06-JUNE-2013	
MIN:	07-JUNE-2013
MAX:	12-JUNE-2013	
MIN:	13-JUNE-2013
MAX:	18-JUNE-2013	
MIN:	19-JUNE-2013
MAX:	24-JUNE-2013	
Sorted	by	date
Zone maps
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
• Single	column
• Compound
• Interleaved
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Single Column
• Table is sorted by 1 column
Date Region Country
2-JUN-2015 Oceania New Zealand
2-JUN-2015 Asia Singapore
2-JUN-2015 Africa Zaire
2-JUN-2015 Asia Hong	Kong
3-JUN-2015 Europe Germany
3-JUN-2015 Asia Korea
[ SORTKEY ( date ) ]
• Best for:
• Queries that use 1st column (i.e. date) as primary filter
• Can speed up joins and group bys
• Quickest to VACUUM
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Compound
• Table is sorted by 1st column , then 2nd column etc.
Date Region Country
2-JUN-2015 Africa Zaire
2-JUN-2015 Asia Korea
2-JUN-2015 Asia Singapore
2-JUN-2015 Europe Germany
3-JUN-2015 Asia Hong Kong
3-JUN-2015 Asia Korea
[ SORTKEY COMPOUND ( date, region, country) ]
• Best for:
• Queries that use 1st column as primary filter, then other cols
• Can speed up joins and group bys
• Slower to VACUUM
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
• KEY
• ALL
• EVEN
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
ID Gender Name
101 M John Smith
292 F Jane Jones
139 M Peter	Black
446 M Pat	Partridge
658 F Sarah	Cyan
164 M Brian	Snail
209 M James	White
306 F Lisa	Green
2
3
4
ID Gender Name
101 M John Smith
306 F Lisa	Green
ID Gender Name
292 F Jane Jones
209 M James	White
ID Gender Name
139 M Peter	Black
164 M Brian	Snail
ID Gender Name
446 M Pat	Partridge
658 F Sarah	Cyan
EVEN
DISTSTYLE EVEN
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
ID Gender Name
101 M John Smith
292 F Jane Jones
139 M Peter	Black
446 M Pat	Partridge
658 F Sarah	Cyan
164 M Brian	Snail
209 M James	White
306 F Lisa	Green
KEY
ID Gender Name
101 M John Smith
306 F Lisa Green
ID Gender Name
292 F Jane Jones
209 M James	White
ID Gender Name
139 M Peter	Black
164 M Brian	Snail
ID Gender Name
446 M Pat	Partridge
658 F Sarah	Cyan
DISTSTYLE KEY
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
ID Gender Name
101 M John Smith
292 F Jane Jones
139 M Peter	Black
446 M Pat	Partridge
658 F Sarah	Cyan
164 M Brian	Snail
209 M James	White
306 F Lisa Green
KEY
ID Gender Name
101 M John Smith
139 M Peter	Black
446 M Pat	Partridge
164 M Brian	Snail
209 M James White
ID Gender Name
292 F Jane Jones
658 F Sarah	Cyan
306 F Lisa Green
DISTSTYLE KEY
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
ID Gender Name
101 M John Smith
292 F Jane Jones
139 M Peter	Black
446 M Pat	Partridge
658 F Sarah	Cyan
164 M Brian	Snail
209 M James	White
306 F Lisa	Green
101 M John Smith
292 F Jane Jones
139 M Peter	Black
446 M Pat	Partridge
658 F Sarah	Cyan
164 M Brian	Snail
209 M Lisa	Green
306 F James	White
101 M John Smith
292 F Jane Jones
139 M Peter	Black
446 M Pat	Partridge
658 F Sarah	Cyan
164 M Brian	Snail
209 M Lisa	Green
306 F James	White
101 M John Smith
292 F Jane Jones
139 M Peter	Black
446 M Pat	Partridge
658 F Sarah	Cyan
164 M Brian	Snail
209 M Lisa	Green
306 F James	White
101 M John Smith
292 F Jane Jones
139 M Peter	Black
446 M Pat	Partridge
658 F Sarah	Cyan
164 M Brian	Snail
209 M Lisa	Green
306 F James	White
ALL
DISTSTYLE ALL
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
• KEY
• Large Fact tables
• Large dimension tables
• ALL
• Medium dimension tables (1K – 2M)
• Small	dimension	tables
• EVEN
• Tables with no joins or group by
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Loading Data
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
AWS CloudCorporate Data center
Amazon S3
Amazon
Redshift
Flat files
Data loading options
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
AWS CloudCorporate Data center
ETL
Source DBs
Amazon
Redshift
Amazon
Redshift
Data loading options
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
AWS Cloud
Amazon
Redshift
Amazon
Kinesis
Data loading options
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
ETL data into your data warehouse
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Instantly query your data lake on Amazon S3
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Redshift Spectrum: Defining External Schema and External Tables
1. Define an external schema in Amazon Redshift using the AWS Glue Data
Catalog or your own Apache Hive Metastore
CREATE	EXTERNAL	SCHEMA	<schema_name>
FROM	{	[	DATA	CATALOG	]	|	HIVE	METASTORE	}	
DATABASE	'database_name'	
IAM_ROLE	'iam-role-arn'
2. Register external tables using Amazon Athena, your Hive Metastore
client, or from Amazon Redshift
CREATE	EXTERNAL	TABLE	<schema_name>.<table_name>
[PARTITIONED	BY	<column_name,	data_type,	…>]
STORED	AS	file_format
LOCATION	s3_location
[TABLE	PROPERTIES	property_name=property_value,	…];
3. Query external tables
SELECT	…	FROM	<schema_name>.<table_name>	…
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Querying
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
JDBC/ODBC
Amazon Redshift
Amazon Redshift works with your existing BI tools
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
ODBC/JDBC
BI Clients
Amazon Redshift
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
ODBC/JDBC
BI Server Amazon Redshift
Clients
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
View explain plans
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Monitor query performance
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Resources
• Detail Pages
– http://aws.amazon.com/redshift
– https://aws.amazon.com/redshift/spectrum
– https://aws.amazon.com/marketplace/redshift/
– https://aws.amazon.com/redshift/developer-resources/
– Amazon Redshift Utilities - GitHub
• Best Practices
– http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
– http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
– http://docs.aws.amazon.com/redshift/latest/dg/c-optimizing-query-performance.html
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Thank you!
©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Pop-up Loft
aws.amazon.com/activate
Everything and Anything Startups
Need to Get Started on AWS

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Amazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni VamvadelisAmazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni Vamvadelis
 
Masterclass - Redshift
Masterclass - RedshiftMasterclass - Redshift
Masterclass - Redshift
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data Warehouse
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
Amazon DynamoDB and DAX
Amazon DynamoDB and DAXAmazon DynamoDB and DAX
Amazon DynamoDB and DAX
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Relational Database Services on AWS
Relational Database Services on AWSRelational Database Services on AWS
Relational Database Services on AWS
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
SQL Server on AWS
SQL Server on AWSSQL Server on AWS
SQL Server on AWS
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features
 
Lessons from Migrating Oracle Databases to Amazon RDS or Amazon Aurora
Lessons from Migrating Oracle Databases to Amazon RDS or Amazon Aurora Lessons from Migrating Oracle Databases to Amazon RDS or Amazon Aurora
Lessons from Migrating Oracle Databases to Amazon RDS or Amazon Aurora
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
SQL Server on AWS
SQL Server on AWSSQL Server on AWS
SQL Server on AWS
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 

Semelhante a Getting Started with Amazon Redshift

Moving your commercial databases to Amazon RDS
Moving your commercial databases to Amazon RDSMoving your commercial databases to Amazon RDS
Moving your commercial databases to Amazon RDS
Amazon Web Services
 

Semelhante a Getting Started with Amazon Redshift (20)

ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWS
 
GPSWKS401_Designing a Cloud Enterprise Data Warehouse
GPSWKS401_Designing a Cloud Enterprise Data WarehouseGPSWKS401_Designing a Cloud Enterprise Data Warehouse
GPSWKS401_Designing a Cloud Enterprise Data Warehouse
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
 
What’s New in Amazon RDS for Open-Source and Commercial Databases:
What’s New in Amazon RDS for Open-Source and Commercial Databases: What’s New in Amazon RDS for Open-Source and Commercial Databases:
What’s New in Amazon RDS for Open-Source and Commercial Databases:
 
DAT309_Best Practices for Migrating from Oracle and SQL Server to Amazon RDS
DAT309_Best Practices for Migrating from Oracle and SQL Server to Amazon RDSDAT309_Best Practices for Migrating from Oracle and SQL Server to Amazon RDS
DAT309_Best Practices for Migrating from Oracle and SQL Server to Amazon RDS
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million Users
 
Running Oracle Databases on Amazon RDS - DAT313 - re:Invent 2017
Running Oracle Databases on Amazon RDS - DAT313 - re:Invent 2017Running Oracle Databases on Amazon RDS - DAT313 - re:Invent 2017
Running Oracle Databases on Amazon RDS - DAT313 - re:Invent 2017
 
Amazon Aurora (MySQL, Postgres)
Amazon Aurora (MySQL, Postgres)Amazon Aurora (MySQL, Postgres)
Amazon Aurora (MySQL, Postgres)
 
Building Data Lakes with AWS
Building Data Lakes with AWSBuilding Data Lakes with AWS
Building Data Lakes with AWS
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with Zopa
 
Migrating to Amazon RDS with Database Migration Service:
Migrating to Amazon RDS with Database Migration Service:Migrating to Amazon RDS with Database Migration Service:
Migrating to Amazon RDS with Database Migration Service:
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Moving your commercial databases to Amazon RDS
Moving your commercial databases to Amazon RDSMoving your commercial databases to Amazon RDS
Moving your commercial databases to Amazon RDS
 
STG316_Optimizing Storage for Big Data Workloads
STG316_Optimizing Storage for Big Data WorkloadsSTG316_Optimizing Storage for Big Data Workloads
STG316_Optimizing Storage for Big Data Workloads
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data Oceans
 
Self-Service Analytics with AWS Big Data and Tableau - ARC217 - re:Invent 2017
Self-Service Analytics with AWS Big Data and Tableau - ARC217 - re:Invent 2017Self-Service Analytics with AWS Big Data and Tableau - ARC217 - re:Invent 2017
Self-Service Analytics with AWS Big Data and Tableau - ARC217 - re:Invent 2017
 
Modernizing Databases with DMS
Modernizing Databases with DMSModernizing Databases with DMS
Modernizing Databases with DMS
 
Oracle on AWS
Oracle on AWSOracle on AWS
Oracle on AWS
 
DAT317_Migrating Databases and Data Warehouses to the Cloud
DAT317_Migrating Databases and Data Warehouses to the CloudDAT317_Migrating Databases and Data Warehouses to the Cloud
DAT317_Migrating Databases and Data Warehouses to the Cloud
 
Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...
Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...
Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...
 

Mais de Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Mais de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Getting Started with Amazon Redshift