Amazon Redshift
Intro, Details
Ianni Vamvadelis
Solutions Architect
AWS Database Services
Scalable High Performance Application Storage in the Cloud
Amazon DynamoDB: Fast, Predictable, Highly-Scalable NoSQL Data Store
Amazon RDS: Managed Relational Database Service for MySQL, Oracle and SQL Server
Amazon ElastiCache: In-Memory Caching Service
Amazon Redshift: Fast, Powerful, Fully Managed, Petabyte-Scale Data Warehouse Service
(Diagram: the AWS platform: AWS Global Infrastructure; Compute, Storage, Database and Networking; Application Services; Deployment & Administration)
Design Objectives
Amazon Redshift: a petabyte-scale data warehouse service that was…
• A Whole Lot Simpler
• A Lot Cheaper
• A Lot Faster
Redshift Dramatically Reduces I/O
• Direct-attached storage
• Large data block sizes
• Columnar storage
• Data compression
• Zone maps
(Figure: the same table shown in row storage and in column storage)
Id   Age  State
123  20   CA
345  25   WA
678  40   FL
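To make the row-versus-column figure concrete, here is a toy Python model that counts how many stored values a single-column query has to read under each layout. It is a sketch under a simplifying assumption (one stored value stands in for the on-disk cost of a field), not a description of how Redshift lays out its blocks; the function names are hypothetical.

```python
# Toy model of row vs column storage, counting stored values rather than bytes.
# Assumption: one "value" stands in for the on-disk cost of a field.

rows = [
    (123, 20, "CA"),
    (345, 25, "WA"),
    (678, 40, "FL"),
]
columns = ["Id", "Age", "State"]


def row_layout(table):
    """Row storage: each record is stored contiguously (Id, Age, State, Id, ...)."""
    return [value for record in table for value in record]


def column_layout(table, names):
    """Column storage: each column is stored contiguously, one list per column."""
    return {name: [record[i] for record in table] for i, name in enumerate(names)}


def values_read_row_store(table):
    # A row store has to read every field of every record it scans.
    return len(row_layout(table))


def values_read_column_store(table, names, wanted):
    # A column store only reads the columns the query references.
    layout = column_layout(table, names)
    return sum(len(layout[name]) for name in wanted)


# SELECT AVG(Age) FROM t: only the Age column is needed.
print(values_read_row_store(rows))                       # 9
print(values_read_column_store(rows, columns, ["Age"]))  # 3
print(column_layout(rows, columns)["Age"])               # [20, 25, 40]
```

Storing each column contiguously also places similar values next to each other, which is part of why data compression works well on columnar data.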
Redshift Runs on Optimized Hardware
• Optimized for I/O-intensive workloads
• HS1.8XL available on Amazon EC2
• Runs in HPC: fast network
• High disk density
HS1.8XL: 128GB RAM, 16 Cores, 24 Spindles, 16TB Storage, 2GB/sec scan rate
HS1.XL: 16GB RAM, 2 Cores, 3 Spindles, 2TB Storage
(Diagram: a cluster of HS1.XL nodes, each with 16GB RAM, 2TB disk and 2 cores. Click to grow … to 1.6PB)
Redshift Parallelizes and Distributes Everything
• Load
• Query
• Resize
• Backup
• Restore
(Diagram: SQL clients/BI tools connect over JDBC/ODBC to the leader node; the leader node and three compute nodes (128GB RAM, 16TB disk, 16 cores each) are linked by 10 GigE (HPC); ingestion, backup and restore flow in parallel between the compute nodes and Amazon S3.)
Point and Click Resize
(Diagram: SQL clients/BI tools connect to a cluster of one leader node and three compute nodes, each with 128GB RAM, 48TB disk and 16 cores.)
Resize your cluster while remaining online
(Diagram: a new target cluster with one leader node and four compute nodes (128GB RAM, 48TB disk, 16 cores each) is built next to the source cluster.)
• New target provisioned in the background
• Only charged for source cluster
Resize your cluster while remaining online
• Fully automated
  – Data automatically redistributed
• Read-only mode during resize
• Parallel node-to-node data copy
• Automatic DNS-based endpoint cut-over
• Only charged for one cluster
(Diagram: SQL clients/BI tools connect to the resized cluster of one leader node and four compute nodes, each with 128GB RAM, 48TB disk and 16 cores.)
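The "Data automatically redistributed" point can be illustrated with a simplified model in which each row is placed on node `crc32(key) % node_count`. That placement rule is an assumption for illustration only: Redshift actually spreads rows across slices, and this deck does not describe its internal scheme. The sketch shows that changing the node count changes where many rows belong, which is why a resize needs a node-to-node copy.

```python
import zlib
from collections import defaultdict


def node_for(key, node_count):
    # Hypothetical placement rule: CRC32 of the key, modulo the node count.
    return zlib.crc32(str(key).encode()) % node_count


def distribute(keys, node_count):
    """Group keys by the node they are placed on."""
    nodes = defaultdict(list)
    for key in keys:
        nodes[node_for(key, node_count)].append(key)
    return dict(nodes)


def rows_moved(keys, old_count, new_count):
    """Count keys whose node changes when the cluster is resized."""
    return sum(1 for key in keys if node_for(key, old_count) != node_for(key, new_count))


keys = range(10_000)
before = distribute(keys, 3)   # 3 compute nodes
after = distribute(keys, 4)    # resized to 4 compute nodes
moved = rows_moved(keys, 3, 4)
print({node: len(rows) for node, rows in sorted(after.items())})
print(f"{moved} of {len(keys)} rows change node")
```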
Amazon Redshift has security built-in
• SSL to secure data in transit
• Encryption to secure data at rest
  – AES-256
  – All blocks on disks and in Amazon S3 encrypted
• No direct access to compute nodes
• Amazon VPC support
(Diagram: SQL clients/BI tools in the customer VPC connect over JDBC/ODBC to the leader node; the leader node and compute nodes (128GB RAM, 16TB disk, 16 cores each) sit in an internal VPC on a 10 GigE (HPC) network and exchange ingestion, backup and restore traffic with Amazon S3.)
Continuous Backup, Automated Recovery
• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times
• Backups to Amazon S3 are continuous, automatic, and incremental
• Continuous monitoring and automated recovery from failures of drives and nodes
• Able to restore snapshots to any Availability Zone within a region
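Incremental backups can be sketched like this: split the data into blocks and upload only the blocks the backup store has not seen before, recording each snapshot as an ordered list of block identifiers. The block size, the SHA-256 digests and the in-memory dict standing in for Amazon S3 are all assumptions for illustration, not a description of Redshift's internal backup format.

```python
import hashlib

BLOCK_SIZE = 4  # bytes per block; deliberately tiny, hypothetical


def split_blocks(data, size=BLOCK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def snapshot(data, store):
    """Back up `data`, uploading only blocks not already in `store`.

    `store` (digest -> block bytes) stands in for Amazon S3.
    Returns the snapshot manifest: the ordered list of block digests.
    """
    manifest = []
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block  # new or changed block: upload it
        manifest.append(digest)
    return manifest


def restore(manifest, store):
    """Rebuild the data from a manifest by fetching its blocks in order."""
    return b"".join(store[digest] for digest in manifest)


store = {}
first = snapshot(b"AAAABBBBCCCC", store)
size_after_first = len(store)
second = snapshot(b"AAAABBBBDDDD", store)  # only the last block changed
print(len(store) - size_after_first)       # 1
print(restore(first, store))               # b'AAAABBBBCCCC'
```

Because every snapshot is a complete manifest, any snapshot can be restored on its own even though each backup uploaded only the changed blocks.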
  
(Chart: data volume over time, comparing data generated with data available for analysis; the gap between the two is labeled "cost + effort". Sources: Gartner, "User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011"; IDC, "Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares".)
Redshift is Priced to Analyze All Your Data
$0.85 per hour for on-demand (2TB)
$999 per TB per year (3-yr reservation)
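The two price points can be compared per terabyte with a little arithmetic. This sketch assumes a single 2TB node running on demand 24 hours a day for a 365-day year and uses only the prices quoted on the slide, which may not reflect current pricing.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes the node runs all year, no leap day

on_demand_per_node_hour = 0.85  # USD per hour for a 2TB node (slide figure)
node_storage_tb = 2
reserved_per_tb_year = 999      # USD per TB per year, 3-year reservation (slide figure)

# 0.85 * 8,760 = 7,446 USD per node-year, spread over 2TB
on_demand_per_tb_year = on_demand_per_node_hour * HOURS_PER_YEAR / node_storage_tb
print(f"On demand: ${on_demand_per_tb_year:,.0f} per TB per year")                # $3,723
print(f"Reserved is {on_demand_per_tb_year / reserved_per_tb_year:.1f}x cheaper")  # 3.7x
```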
Integrates With Existing BI Tools
(Diagram: existing BI tools connect to Amazon Redshift over JDBC/ODBC.)
Scenarios
Reporting Warehouse
• Accelerated operational reporting
• Support for short-time use cases
• Data compression, index redundancy
(Diagram: OLTP and ERP systems (RDBMS) feed Redshift, which serves reporting and BI.)
On-Premises Integration
(Diagram: on-premises OLTP and ERP systems (RDBMS) load into Redshift through data integration partners*; Redshift serves reporting and BI.)
Live Archive for (Structured) Big Data
• Direct integration with COPY command
• High velocity data
• Data ages into Redshift
• Low cost, high scale option for new apps
(Diagram: web apps write OLTP data to DynamoDB; data ages into Redshift, which serves reporting and BI.)
Cloud ETL for Big Data
• Maintain online SQL access to historical logs
• Transformation and enrichment with EMR
• Longer history ensures better insight
(Diagram: data in S3 is transformed by Elastic MapReduce and loaded into Redshift, which serves reporting and BI.)
Ingestion – Best Practices
• Goal: leverage all the compute nodes and minimize overhead
• Best Practices
  – Preferred method: COPY from S3
     – Loads data in sorted order through the compute nodes
     – Single COPY command; split data into multiple files
     – Strongly recommend that you gzip large datasets
  – If you must ingest through SQL
     – Multi-row inserts
     – Avoid large numbers of singleton insert/update/delete operations
  – To copy from another table
     – CREATE TABLE AS or INSERT INTO SELECT
-- Multi-row insert: several rows in one statement instead of one INSERT per row
insert into category_stage values
(default, default, default, default),
(20, default, 'Country', default),
(21, 'Concerts', 'Rock', default);

-- COPY a gzip-compressed, pipe-delimited file from Amazon S3
copy time from 's3://mybucket/data/timerows.gz'
credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>'
gzip delimiter '|';
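A minimal Python sketch of preparing data for a single COPY: split a pipe-delimited dataset into several gzip-compressed files that share a key prefix. The file naming (`timerows.N.gz`), the sample rows and the choice of four parts are hypothetical; AWS's data-loading guidance suggests making the number of files a multiple of the number of slices in the cluster.

```python
import gzip
import os
import tempfile


def split_and_gzip(lines, out_dir, prefix, parts):
    """Write `lines` round-robin into `parts` gzip files named <prefix>.<n>.gz.

    Round-robin keeps the files about the same size, so each slice
    gets a similar share of the load.
    """
    paths = [os.path.join(out_dir, f"{prefix}.{n}.gz") for n in range(parts)]
    handles = [gzip.open(path, "wt", encoding="utf-8") for path in paths]
    try:
        for i, line in enumerate(lines):
            handles[i % parts].write(line + "\n")
    finally:
        for handle in handles:
            handle.close()
    return paths


# Hypothetical pipe-delimited rows, like the timerows file in the COPY example
rows = [f"{i}|2013-01-{i % 28 + 1:02d}|value{i}" for i in range(1000)]

with tempfile.TemporaryDirectory() as tmp:
    files = split_and_gzip(rows, tmp, "timerows", parts=4)
    counts = []
    for path in files:
        with gzip.open(path, "rt", encoding="utf-8") as f:
            counts.append(sum(1 for _ in f))

print([os.path.basename(p) for p in files])  # ['timerows.0.gz', 'timerows.1.gz', 'timerows.2.gz', 'timerows.3.gz']
print(counts)                                # [250, 250, 250, 250]
```

After uploading the files under, for example, `s3://mybucket/data/`, a single COPY whose FROM path is the shared prefix (`'s3://mybucket/data/timerows'`) with the `gzip` option loads every matching file, and the slices read them in parallel.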
Choose a Sort Key
• Goal
  – Skip over data blocks to minimize I/O
• Best Practice
  – Sort based on range or equality predicate (WHERE clause)
  – If you access recent data frequently, sort based on TIMESTAMP
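Zone maps are what let a sort key "skip over data blocks". The sketch below keeps a (min, max) pair for each block of a column and counts how many blocks a range predicate has to read, first on unsorted data and then on the same data sorted. The 100-row block size and the random day-of-year data are assumptions for illustration; Redshift's real blocks are 1 MB.

```python
import random

BLOCK_ROWS = 100  # hypothetical rows per block


def build_zone_map(values, block_rows=BLOCK_ROWS):
    """Record the (min, max) of each consecutive block of a column."""
    blocks = [values[i:i + block_rows] for i in range(0, len(values), block_rows)]
    return [(min(block), max(block)) for block in blocks]


def blocks_to_scan(zone_map, low, high):
    """Blocks whose [min, max] overlaps BETWEEN low AND high; all others are skipped."""
    return sum(1 for block_min, block_max in zone_map if block_max >= low and block_min <= high)


rng = random.Random(42)
days = [rng.randrange(365) for _ in range(10_000)]  # day of year for each row

unsorted_map = build_zone_map(days)
sorted_map = build_zone_map(sorted(days))

# WHERE day BETWEEN 0 AND 30, roughly "January"
scan_unsorted = blocks_to_scan(unsorted_map, 0, 30)
scan_sorted = blocks_to_scan(sorted_map, 0, 30)
print(f"unsorted: {scan_unsorted} of {len(unsorted_map)} blocks")
print(f"sorted:   {scan_sorted} of {len(sorted_map)} blocks")
```

On unsorted data nearly every block's range covers January, so almost nothing can be skipped; once sorted, only the handful of blocks holding January rows overlap the predicate.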
  
Choose a Distribution Key
• Goal
  – Distribute data evenly across nodes
  – Minimize data movement among nodes: co-located joins and co-located aggregates
• Best Practice
  – Consider using the join key as the distribution key (JOIN clause)
  – If there are multiple joins, use the foreign key of the largest dimension as the distribution key
  – Consider using the GROUP BY column as the distribution key (GROUP BY clause)
• Avoid
  – Using a column that serves as an equality filter as your distribution key
  – For de-normalized tables with no aggregates, do not specify a distribution key; Redshift will use round robin
Example
-- Total Produce sold in Washington in January 2013
SELECT SUM(S.Price * S.Quantity)
FROM SALES S
JOIN CATEGORY C ON C.ProductId = S.ProductId
JOIN FRANCHISE F ON F.FranchiseId = S.FranchiseId
WHERE C.CategoryId = 'Produce' AND F.State = 'WA'
AND S.Date BETWEEN '1/1/2013' AND '1/31/2013';
Dist key (S) = ProductID
Dist key (C) = ProductID
Dist key (F) = FranchiseID
Sort key (S) = Date
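The co-located join idea can be sketched by hashing rows to nodes and counting how many SALES rows would have to move to reach their matching CATEGORY row. When both tables use ProductID as the distribution key, as in the example above, equal keys hash to the same node and nothing moves; with round-robin placement many rows do. The node count, the CRC32 placement rule and the sample data are hypothetical.

```python
import zlib
from collections import defaultdict

NODES = 3  # hypothetical number of compute nodes


def node_for(key):
    # Hypothetical hash placement: equal keys always land on the same node.
    return zlib.crc32(str(key).encode()) % NODES


def distribute_by_key(rows, key_index):
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_index])].append(row)
    return nodes


def distribute_round_robin(rows):
    nodes = defaultdict(list)
    for i, row in enumerate(rows):
        nodes[i % NODES].append(row)
    return nodes


def rows_needing_shuffle(fact_nodes, dim_nodes, fact_key, dim_key):
    """Count fact rows whose matching dimension row is stored on a different node."""
    dim_location = {row[dim_key]: node for node, rows in dim_nodes.items() for row in rows}
    return sum(
        1
        for node, rows in fact_nodes.items()
        for row in rows
        if row[fact_key] in dim_location and dim_location[row[fact_key]] != node
    )


# SALES(product_id, franchise_id, price, quantity); CATEGORY(product_id, category_id)
sales = [(i % 50, i % 7, 1.0, 2) for i in range(300)]
category = [(p, "Produce" if p % 2 else "Dairy") for p in range(50)]

category_nodes = distribute_by_key(category, key_index=0)
colocated = rows_needing_shuffle(distribute_by_key(sales, 0), category_nodes, 0, 0)
shuffled = rows_needing_shuffle(distribute_round_robin(sales), category_nodes, 0, 0)
print("dist key = product_id:", colocated, "rows must move")  # 0
print("round robin:", shuffled, "rows must move")              # 200
```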
Workload Manager
• Allows you to manage and adjust query concurrency
• WLM allows you to
  – Increase query concurrency up to 15
  – Define user groups and query groups
  – Segregate short and long running queries
  – Help improve performance of individual queries
• Be aware: the query workload is distributed to every compute node
  – Increasing concurrency may not always help due to resource contention (CPU, memory and I/O)
  – Total throughput may increase by letting one query complete first and allowing other queries to wait
Workload Manager
• Default: 1 queue with a concurrency of 5
• Define up to 8 queues with a total concurrency of 15
• Redshift has a super user queue internally
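A small Python model of the queue rules on these slides: validate a queue configuration against the 8-queue and 15-slot limits quoted here, and route a query to the first queue whose user group or query group matches, falling back to the last (default) queue. The `Queue` class, `validate` and `route` are hypothetical; in Redshift, WLM is configured through the cluster's parameter group, and the limits may have changed since this deck.

```python
from dataclasses import dataclass, field

MAX_QUEUES = 8              # limit quoted on the slide
MAX_TOTAL_CONCURRENCY = 15  # limit quoted on the slide


@dataclass
class Queue:
    name: str
    concurrency: int
    user_groups: list = field(default_factory=list)
    query_groups: list = field(default_factory=list)


def validate(queues):
    """Raise ValueError if the configuration breaks the slide's limits."""
    if not 1 <= len(queues) <= MAX_QUEUES:
        raise ValueError(f"expected 1 to {MAX_QUEUES} queues, got {len(queues)}")
    total = sum(q.concurrency for q in queues)
    if total > MAX_TOTAL_CONCURRENCY:
        raise ValueError(f"total concurrency {total} exceeds {MAX_TOTAL_CONCURRENCY}")


def route(queues, user_group=None, query_group=None):
    """Return the first queue matching the user or query group; the last queue is the default."""
    for queue in queues[:-1]:
        if user_group in queue.user_groups or query_group in queue.query_groups:
            return queue
    return queues[-1]


# Short dashboard queries get most slots; ETL users get their own queue.
queues = [
    Queue("short", concurrency=10, query_groups=["dashboard"]),
    Queue("etl", concurrency=3, user_groups=["etl_users"]),
    Queue("default", concurrency=2),
]
validate(queues)
print(route(queues, query_group="dashboard").name)  # short
print(route(queues, user_group="etl_users").name)   # etl
print(route(queues).name)                           # default
```

In a SQL session, a query is tagged with a query group using `set query_group to 'dashboard';`, which is how short and long running queries can be segregated into different queues.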
  
Query Performance – Best Practices
• Encode date and time using the TIMESTAMP data type instead of CHAR
• Specify constraints
  – Redshift does not enforce constraints (primary key, foreign key, unique values), but the optimizer uses them
  – Loading processes and applications must therefore ensure the data actually satisfies them
• Specify a redundant predicate on the sort column
SELECT * FROM tab1, tab2
WHERE tab1.key = tab2.key
AND tab1.timestamp > '1/1/2013'
AND tab2.timestamp > '1/1/2013';
• WLM settings
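Why TIMESTAMP rather than CHAR: dates stored as text compare character by character, so both ordering and range filters can go wrong. This Python sketch uses M/D/YYYY strings like those in the examples above; it assumes a CHAR comparison behaves like the plain lexical string comparison shown here.

```python
from datetime import datetime

# Dates stored as CHAR strings in M/D/YYYY form, as in the examples above
char_dates = ["1/2/2013", "1/10/2013", "12/1/2012"]

# Text sorts character by character, not by calendar order
lexical = sorted(char_dates)
print(lexical)  # ['1/10/2013', '1/2/2013', '12/1/2012']

# Parsed into real timestamps, the same values sort chronologically
chronological = [d.strftime("%Y-%m-%d")
                 for d in sorted(datetime.strptime(s, "%m/%d/%Y") for s in char_dates)]
print(chronological)  # ['2012-12-01', '2013-01-02', '2013-01-10']

# A range filter on text is also wrong: December 2012 compares as "greater"
print("12/1/2012" > "1/2/2013")  # True
```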
  
Summary
• Avoid large numbers of singleton DML statements if possible
• Use COPY for uploading large datasets
• Choose sort and distribution keys with care
• Encode date and time with the TIMESTAMP data type
• Experiment with WLM settings
More Information
Best Practices for Designing Tables
http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
Best Practices for Data Loading
http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
View the Redshift Developer Guide at:
http://aws.amazon.com/documentation/redshift/
Thanks.
aws.amazon.com/big-data

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 

Último

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Último (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Amazon RedShift - Ianni Vamvadelis

  • 1. Amazon Redshift: Intro, Details. Ianni Vamvadelis, Solutions Architect
  • 2. AWS Database Services: Scalable High-Performance Application Storage in the Cloud
      • Amazon DynamoDB: fast, predictable, highly scalable NoSQL data store
      • Amazon RDS: managed relational database service for MySQL, Oracle and SQL Server
      • Amazon ElastiCache: in-memory caching service
      • Amazon Redshift: fast, powerful, fully managed, petabyte-scale data warehouse service
      (Diagram layers: AWS Global Infrastructure; Networking; Compute, Storage, Database; Application Services; Deployment & Administration.)
  • 4. Design Objectives
      Amazon Redshift: a petabyte-scale data warehouse service that was…
      • A whole lot simpler
      • A lot cheaper
      • A lot faster
  • 5. Redshift Dramatically Reduces I/O
      • Direct-attached storage
      • Large data block sizes
      • Columnar storage
      • Data compression
      • Zone maps
      (Diagram: the same sample table laid out in row storage versus column storage.)
          Id   Age  State
          123  20   CA
          345  25   WA
          678  40   FL
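To make the row-versus-column difference concrete, here is a toy Python model of the slide's three-row table. It counts how many stored values a scan touches to read a single column under each layout, and applies a simple run-length encoding to show why the values of one column, stored together, tend to compress well. This is a sketch of the general idea only, not Redshift's on-disk format; the function names are hypothetical, and run-length encoding is just one of several compression encodings.

```python
from itertools import groupby

# Sample table from the slide: columns Id, Age, State.
COLUMNS = ("Id", "Age", "State")
ROWS = [(123, 20, "CA"), (345, 25, "WA"), (678, 40, "FL")]


def row_layout(rows):
    """Row storage: the fields of each row sit next to each other."""
    return [value for row in rows for value in row]


def column_layout(rows):
    """Column storage: all values of one column sit next to each other."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def values_scanned(rows, column, columnar):
    """Count the stored values a scan touches to read one column."""
    if columnar:
        return len(column_layout(rows)[column])  # only that column's values
    return len(row_layout(rows))  # every field of every row


def run_length_encode(values):
    """Toy run-length encoding: a list of (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    print(values_scanned(ROWS, "Age", columnar=False))  # 9
    print(values_scanned(ROWS, "Age", columnar=True))   # 3
    print(run_length_encode(["CA", "CA", "CA", "WA", "WA"]))  # [('CA', 3), ('WA', 2)]
```

Reading one column of a wide table touches a shrinking fraction of the data as the column count grows, which is the I/O saving the slide lists under "Columnar storage".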
  • 6. Redshift Runs on Optimized Hardware
      • Optimized for I/O-intensive workloads
      • HS1.8XL available on Amazon EC2
      • Runs in HPC: fast network
      • High disk density
      HS1.8XL: 128GB RAM, 16 cores, 24 spindles, 16TB storage, 2GB/sec scan rate
      HS1.XL: 16GB RAM, 2 cores, 3 spindles, 2TB storage
      (Diagram: a cluster of HS1.XL nodes, each with 16GB RAM, 2TB disk and 2 cores, that can grow … to 1.6PB.)
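The 1.6PB ceiling lines up with simple arithmetic on the per-node figures: 100 HS1.8XL nodes × 16TB = 1,600TB. The node count of 100 is derived here, not stated on the slide. The hypothetical helpers below turn the slide's per-node storage into a rough sizing estimate; they use raw disk only and deliberately ignore compression, in-cluster replication and free-space headroom.

```python
import math

# Raw storage per node, from the slide (decimal TB).
NODE_STORAGE_TB = {"HS1.XL": 2, "HS1.8XL": 16}


def cluster_capacity_tb(node_count, node_type):
    """Raw storage of a cluster with node_count nodes of node_type."""
    return node_count * NODE_STORAGE_TB[node_type]


def nodes_needed(data_tb, node_type):
    """Fewest nodes whose raw storage covers data_tb.

    Ignores compression (which shrinks data) as well as replication and
    free-space headroom (which consume disk): a rough first estimate only.
    """
    return math.ceil(data_tb / NODE_STORAGE_TB[node_type])


if __name__ == "__main__":
    print(nodes_needed(1600, "HS1.8XL"))        # 100
    print(cluster_capacity_tb(100, "HS1.8XL"))  # 1600
    print(nodes_needed(10, "HS1.XL"), nodes_needed(10, "HS1.8XL"))  # 5 1
```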
  • 7. Redshift Parallelizes and Distributes Everything
      • Load
      • Query
      • Resize
      • Backup
      • Restore
      (Diagram: SQL clients/BI tools connect over JDBC/ODBC to a leader node, which coordinates three compute nodes on a 10 GigE (HPC) network; every node has 128GB RAM, 16TB disk and 16 cores. Ingestion, backup and restore run between the compute nodes and Amazon S3.)
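The leader/compute split in the diagram follows a scatter-gather pattern: each compute node works on its own slice of the data and the leader node combines the partial results. The sketch below models that for a SUM aggregate with Python threads standing in for compute nodes. The placement rule (Python's `hash` modulo node count) and every name here are illustrative assumptions, not Redshift internals.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(rows, node_count, key):
    """Assign each row to a compute node by hashing its key (toy rule)."""
    slices = [[] for _ in range(node_count)]
    for row in rows:
        slices[hash(row[key]) % node_count].append(row)
    return slices


def compute_node_sum(local_rows, column):
    """A compute node aggregates only the rows it stores locally."""
    return sum(row[column] for row in local_rows)


def leader_sum(rows, column, key, node_count=3):
    """The leader fans work out to every node, then adds the partial sums."""
    slices = scatter(rows, node_count, key)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node_sum, slices, [column] * node_count))
    return sum(partials)


if __name__ == "__main__":
    sales = [{"id": i, "amount": i * 2} for i in range(100)]
    print(leader_sum(sales, "amount", key="id"))  # 9900
```

Only the small partial results travel back to the leader, which is why aggregates scale with the number of compute nodes.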
  • 8. Point and Click Resize
  • 9. Resize your cluster while remaining online
      • New target provisioned in the background
      • Only charged for source cluster
      (Diagram: SQL clients/BI tools stay connected to the source cluster, a leader node and three compute nodes, while a target cluster with a leader node and four compute nodes is provisioned; every node has 128GB RAM, 48TB disk and 16 cores.)
  • 10. Resize your cluster while remaining online
      • Fully automated
          – Data automatically redistributed
      • Read-only mode during resize
      • Parallel node-to-node data copy
      • Automatic DNS-based endpoint cut-over
      • Only charged for one cluster
      (Diagram: SQL clients/BI tools connected to the resized cluster, a leader node and four compute nodes, each with 128GB RAM, 48TB disk and 16 cores.)
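The slides describe the resize as a parallel node-to-node copy from the source cluster into a newly provisioned target, with data redistributed automatically. As a rough illustration only, assuming a toy hash-modulo placement rule (the slides do not describe Redshift's actual scheme), the hypothetical `copy_plan` below groups every row by its (source node, target node) pair, so that each pair could be copied as an independent stream.

```python
from collections import Counter


def node_for(key, node_count):
    """Toy placement rule: hash of the distribution key modulo node count."""
    return hash(key) % node_count


def copy_plan(keys, source_nodes, target_nodes):
    """Count rows per (source node, target node) copy stream.

    Every row is copied, because the target is a separate, new cluster.
    """
    plan = Counter()
    for key in keys:
        plan[(node_for(key, source_nodes), node_for(key, target_nodes))] += 1
    return plan


if __name__ == "__main__":
    plan = copy_plan(range(12), source_nodes=3, target_nodes=4)
    print(sum(plan.values()))  # 12 rows copied in total
    for (source, target), count in sorted(plan.items()):
        print(f"source node {source} -> target node {target}: {count} row(s)")
```

Because the streams are independent, they can run concurrently, which matches the slide's "parallel node-to-node data copy" while the source stays read-only.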
  • 11. Amazon Redshift has security built in
      • SSL to secure data in transit
      • Encryption to secure data at rest
          – AES-256
          – All blocks on disks and in Amazon S3 encrypted
      • No direct access to compute nodes
      • Amazon VPC support
      (Diagram: SQL clients/BI tools in the customer VPC connect over JDBC/ODBC to the leader node; the compute nodes sit in an internal VPC on a 10 GigE (HPC) network, with ingestion, backup and restore to and from Amazon S3.)
  • 12. Continuous Backup, Automated Recovery
      • Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times
      • Backups to Amazon S3 are continuous, automatic, and incremental
      • Continuous monitoring and automated recovery from failures of drives and nodes
      • Able to restore snapshots to any Availability Zone within a region
  • 13. (Chart of data volume: a gap, labeled cost + effort, between the data generated and the data available for analysis.)
      Sources: Gartner, "User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011"; IDC, "Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares"
  • 14. Redshift is Priced to Analyze All Your Data
      • $0.85 per hour for on-demand (2TB)
      • $999 per TB per year (3-year reservation)
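The two price points use different units, so a quick conversion helps compare them. Assuming the 2TB on-demand node runs around the clock for a 365-day year, and ignoring anything else that might be billed as well as any price changes since the talk, on-demand works out to about $3,723 per TB per year, roughly 3.7 times the reserved rate:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years

ON_DEMAND_PER_HOUR = 0.85   # USD per hour for one 2TB node, from the slide
NODE_TB = 2                 # storage of that node, from the slide
RESERVED_PER_TB_YEAR = 999  # USD per TB per year, 3-year reservation, from the slide


def on_demand_per_tb_year(rate=ON_DEMAND_PER_HOUR, node_tb=NODE_TB):
    """Annual on-demand cost per TB for a node that runs all year."""
    return rate * HOURS_PER_YEAR / node_tb


if __name__ == "__main__":
    yearly = on_demand_per_tb_year()
    print(round(yearly, 2))                         # 3723.0
    print(round(yearly / RESERVED_PER_TB_YEAR, 2))  # 3.73
```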
  • 15. Integrates With Existing BI Tools
      (Diagram: BI tools connect to Amazon Redshift over JDBC/ODBC.)
  • 17. Reporting Warehouse
      • Accelerated operational reporting
      • Support for short-time use cases
      • Data compression, index redundancy
      (Diagram components: OLTP, ERP, RDBMS, Redshift, Reporting and BI.)
  • 19. Live Archive for (Structured) Big Data
      • Direct integration with the COPY command
      • High-velocity data
      • Data ages into Redshift
      • Low-cost, high-scale option for new apps
      (Diagram components: OLTP, Web Apps, DynamoDB, Redshift, Reporting and BI.)
  • 20. Cloud ETL for Big Data
      • Maintain online SQL access to historical logs
      • Transformation and enrichment with EMR
      • Longer history ensures better insight
      (Diagram components: S3, Elastic MapReduce, Redshift, Reporting and BI.)
  • 21. Ingestion – Best Practices
      • Goal: leverage all the compute nodes and minimize overhead
      • Best practices
          – Preferred method: COPY from S3
          – Loads data in sorted order through the compute nodes
          – Use a single COPY command, and split the data into multiple files
          – Strongly recommend that you gzip large datasets
      • If you must ingest through SQL
          – Use multi-row inserts
          – Avoid large numbers of singleton insert/update/delete operations
      • To copy from another table
          – CREATE TABLE AS or INSERT INTO SELECT
      Examples:
          insert into category_stage values
          (default, default, default, default),
          (20, default, 'Country', default),
          (21, 'Concerts', 'Rock', default);

          copy time from 's3://mybucket/data/timerows.gz'
          credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>'
          gzip delimiter '|';
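Two of the ingestion tips, splitting a dataset into several gzip files and batching rows into one multi-row INSERT, can be sketched in Python. Everything here is hypothetical: `split_and_gzip`, `multi_row_insert`, the `DEFAULT` marker and the `timerows` file names are illustrative, the upload to S3 is not shown, and real applications should use their database driver's parameter binding rather than hand-built literals. Once the parts are uploaded under one S3 key prefix, a single COPY naming that prefix can load all of them in parallel.

```python
import gzip
import os
import tempfile

DEFAULT = object()  # marker rendered as the SQL keyword DEFAULT


def split_and_gzip(lines, out_dir, prefix, parts):
    """Deal lines round-robin into `parts` gzip files named <prefix>.<n>.gz."""
    paths = [os.path.join(out_dir, f"{prefix}.{n}.gz") for n in range(parts)]
    handles = [gzip.open(path, "wt", encoding="utf-8") for path in paths]
    try:
        for i, line in enumerate(lines):
            handles[i % parts].write(line + "\n")
    finally:
        for handle in handles:
            handle.close()
    return paths


def sql_literal(value):
    """Render a Python value as a SQL literal (illustration only)."""
    if value is DEFAULT:
        return "default"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # escape embedded quotes
    return str(value)


def multi_row_insert(table, rows):
    """Build one INSERT carrying many rows instead of one INSERT per row."""
    values = ",\n".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"insert into {table} values\n{values};"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rows = [f"{i}|2013-01-{i:02d}" for i in range(1, 11)]
        for path in split_and_gzip(rows, tmp, "timerows", parts=4):
            print(os.path.basename(path))
    print(multi_row_insert("category_stage", [
        (DEFAULT, DEFAULT, DEFAULT, DEFAULT),
        (20, DEFAULT, "Country", DEFAULT),
        (21, "Concerts", "Rock", DEFAULT),
    ]))
```

The generated INSERT reproduces the slide's `category_stage` example. Choosing the number of parts as a multiple of the cluster's parallelism is a common heuristic, but treat the right count as something to test.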
  • 22. Choose a Sort Key
      • Goal
          – Skip over data blocks to minimize I/O
      • Best practice
          – Sort based on range or equality predicates (WHERE clause)
          – If you access recent data frequently, sort based on TIMESTAMP
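Why sorting lets the engine skip blocks: a zone map keeps the minimum and maximum of each block, and any block whose range cannot satisfy the predicate is never read. The toy model below, with a hypothetical 4-value block size far smaller than a real block, compares a column loaded in time order against the same values in shuffled order for a "recent data" range query.

```python
import random

BLOCK_SIZE = 4  # values per block; a toy size, far smaller than a real block


def zone_map(values, block_size=BLOCK_SIZE):
    """Record (min, max) for each consecutive block of a column."""
    return [
        (min(values[i:i + block_size]), max(values[i:i + block_size]))
        for i in range(0, len(values), block_size)
    ]


def blocks_to_read(zmap, low, high):
    """Indexes of blocks whose [min, max] overlaps col BETWEEN low AND high."""
    return [i for i, (lo, hi) in enumerate(zmap) if hi >= low and lo <= high]


if __name__ == "__main__":
    days = list(range(1, 33))           # e.g. day numbers, loaded in time order
    shuffled = days[:]
    random.Random(0).shuffle(shuffled)  # same values, no useful sort order

    print(len(blocks_to_read(zone_map(days), 29, 32)))  # 1 of 8 blocks
    print(len(blocks_to_read(zone_map(shuffled), 29, 32)))
```

With the sorted column only the last block can hold days 29 to 32, so a single block is read; with the shuffled column most blocks typically have a maximum of 29 or more and must be scanned.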
  • 23. Choose a Distribution Key
      • Goal
          – Distribute data evenly across nodes
          – Minimize data movement among nodes: co-located joins and co-located aggregates
      • Best practice
          – Consider using the join key as the distribution key (JOIN clause)
          – If there are multiple joins, use the foreign key of the largest dimension as the distribution key
          – Consider using the GROUP BY column as the distribution key (GROUP BY clause)
      • Avoid
          – Using a key that serves as an equality filter as your distribution key
          – If tables are denormalized and there are no aggregates, do not specify a distribution key: Redshift will use round robin
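A co-located join means each fact row and its matching dimension row live on the same node, so no data has to move to evaluate the join. The sketch below checks that property in a toy model, borrowing the `ProductId` column from the deck's SALES/CATEGORY example and assuming one CATEGORY row per `ProductId`. The placement rule (Python's `hash` modulo node count) stands in for Redshift's real hash function, which the slides do not describe.

```python
from collections import defaultdict


def distribute_by_key(rows, key, node_count):
    """Key distribution (toy model): node = hash(row[key]) % node_count."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[hash(row[key]) % node_count].append(row)
    return nodes


def distribute_round_robin(rows, node_count):
    """Round-robin distribution: rows dealt out in turn, ignoring content."""
    nodes = defaultdict(list)
    for i, row in enumerate(rows):
        nodes[i % node_count].append(row)
    return nodes


def join_is_colocated(fact_nodes, dim_nodes, key):
    """True if every fact row finds its dimension row on its own node."""
    dim_home = {row[key]: node for node, rows in dim_nodes.items() for row in rows}
    return all(
        dim_home[row[key]] == node
        for node, rows in fact_nodes.items()
        for row in rows
    )


if __name__ == "__main__":
    category = [{"ProductId": p} for p in range(10)]
    sales = [{"ProductId": i % 10, "Quantity": 1} for i in range(50)]
    dim = distribute_by_key(category, "ProductId", 4)
    print(join_is_colocated(distribute_by_key(sales, "ProductId", 4), dim, "ProductId"))  # True
    print(join_is_colocated(distribute_round_robin(sales, 4), dim, "ProductId"))          # False
```

Distributing both tables on the join key keeps every match local, while round robin spreads rows evenly but scatters matches across nodes; that trade-off is why the slide reserves round robin for denormalized tables without aggregates.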
  • 24. Example
          -- Total Produce sold in Washington in January 2013
          SELECT sum(S.Price * S.Quantity)
          FROM SALES S
          JOIN CATEGORY C ON C.ProductId = S.ProductId
          JOIN FRANCHISE F ON F.FranchiseId = S.FranchiseId
          WHERE C.CategoryId = 'Produce' AND F.State = 'WA'
          AND S.Date BETWEEN '1/1/2013' AND '1/31/2013';
      • SALES (S): distribution key = ProductID, sort key = Date
      • CATEGORY (C): distribution key = ProductID
      • FRANCHISE (F): distribution key = FranchiseID
  • 25. Workload Manager
      • Allows you to manage and adjust query concurrency
      • WLM allows you to
          – Increase query concurrency up to 15
          – Define user groups and query groups
          – Segregate short- and long-running queries
          – Help improve performance of individual queries
      • Be aware: the query workload is distributed to every compute node
          – Increasing concurrency may not always help, due to contention for CPU, memory and I/O
          – Total throughput may increase by letting one query complete first while other queries wait
  • 26. Workload Manager
      • Default: 1 queue with a concurrency of 5
      • Define up to 8 queues with a total concurrency of 15
      • Redshift has a superuser queue internally
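The queue limits above lend themselves to a small validation step before a configuration is applied. This is a hypothetical helper: the key names (`query_concurrency`, `query_group`, `user_group`) are modeled on Redshift's `wlm_json_configuration` parameter and should be checked against the current documentation, the queue layout and the `dashboard` group are invented for illustration, and the limits are the ones stated on the slide, which may have changed since.

```python
import json

MAX_QUEUES = 8              # user-defined queues, from the slide
MAX_TOTAL_CONCURRENCY = 15  # summed across all queues, from the slide


def build_wlm_config(queues):
    """Check queue definitions against the slide's limits and return JSON."""
    if not 1 <= len(queues) <= MAX_QUEUES:
        raise ValueError(f"expected 1 to {MAX_QUEUES} queues, got {len(queues)}")
    total = sum(queue["query_concurrency"] for queue in queues)
    if total > MAX_TOTAL_CONCURRENCY:
        raise ValueError(f"total concurrency {total} exceeds {MAX_TOTAL_CONCURRENCY}")
    return json.dumps(queues, indent=2)


if __name__ == "__main__":
    print(build_wlm_config([
        # Hypothetical queue for short dashboard queries, chosen via a query group.
        {"query_group": ["dashboard"], "user_group": [], "query_concurrency": 10},
        # Catch-all queue for everything else, e.g. long-running reports.
        {"query_group": [], "user_group": [], "query_concurrency": 5},
    ]))
```

Separating short and long queries this way reflects the slide's advice to segregate workloads; as the previous slide warns, raising concurrency is not guaranteed to raise throughput, so the split is something to experiment with.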
  • 27. Query Performance – Best Practices
      • Encode date and time using the "TIMESTAMP" data type instead of "CHAR"
      • Specify constraints
          – Redshift does not enforce constraints (primary key, foreign key, unique values), but the optimizer uses them
          – Loading processes and/or applications need to be aware of this
      • Specify a redundant predicate on the sort column
          SELECT * FROM tab1, tab2
          WHERE tab1.key = tab2.key
          AND tab1.timestamp > '1/1/2013'
          AND tab2.timestamp > '1/1/2013';
      • WLM settings
  • 28. Summary
      • Avoid large numbers of singleton DML statements if possible
      • Use COPY for uploading large datasets
      • Choose sort and distribution keys with care
      • Encode date and time with the TIMESTAMP data type
      • Experiment with WLM settings
  • 29. More Information
      • Best Practices for Designing Tables: http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
      • Best Practices for Data Loading: http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
      • View the Redshift Developer Guide at: http://aws.amazon.com/documentation/redshift/