Deep Dive On Amazon Redshift


  1. Deep Dive on Amazon Redshift. Ian Robinson, Specialist Solutions Architect, Data & Analytics, EMEA. 3 May, 2017. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Selected Amazon Redshift Customers
  3. Amazon Redshift: a managed, massively parallel, petabyte-scale relational data warehouse. A lot faster, a lot simpler, a lot cheaper.
  4. Amazon Redshift Cluster Architecture: massively parallel, shared nothing
     • Leader node: SQL endpoint; stores metadata; coordinates parallel SQL processing
     • Compute nodes: local, columnar storage; execute queries in parallel; load, backup, restore; 2, 16 or 32 slices
     • Diagram labels: SQL clients/BI tools reach the leader node over JDBC/ODBC; the leader node and three compute nodes are each marked 128GB RAM, 16TB disk, 16 cores; 10 GigE (HPC); ingestion, backup and restore via S3 / EMR / DynamoDB / SSH
  5. Design for Queryability • Use just enough cluster resources • Minimum amount of work • Equally on each slice
  6. Do an Equal Amount of Work on Each Slice
  7. Choose Best Table Distribution Style (each diagram shows Node 1 with Slices 1 and 2, and Node 2 with Slices 3 and 4)
     • All: all data on every node
     • Key: same key to same location
     • Even: round robin distribution
  8. Do the Minimum Amount of Work on Each Slice
  9. Columnar storage + Large data block sizes + Data compression + Zone maps + Direct-attached storage. Reduced I/O = Enhanced Performance
     • Data compression example:

         analyze compression listing;

         Table   | Column         | Encoding
         --------+----------------+----------
         listing | listid         | delta
         listing | sellerid       | delta32k
         listing | eventid        | delta32k
         listing | dateid         | bytedict
         listing | numtickets     | bytedict
         listing | priceperticket | delta32k
         listing | totalprice     | mostly32
         listing | listtime       | raw

     • Zone map example (block contents, then the min and max kept for that block):

         10 | 13 | 14 | 26 | … | 100 | 245 | 324     min 10, max 324
         375 | 393 | 417 | … | 512 | 549 | 623       min 375, max 623
         637 | 712 | 809 | … | 834 | 921 | 959       min 637, max 959
  10. Use Cluster Resources Efficiently to Complete as Quickly as Possible
  11. Amazon Redshift Workload Management
      • Clients (BI tools, SQL clients, analytics tools) submit queries; Workload Management holds them as waiting until they are running
      • Queries queue: 80% memory, 4 slots, 80/4 = 20% per slot
      • ETL queue: 20% memory, 2 slots, 20/2 = 10% per slot
  12. Telenor Connexion: An AWS Redshift customer story. Anders Bresell, PhD, Head of Technology Development & Data Science. 2017-05-03
  13. www.telenorconnexion.com
  14. VOLVO ON CALL
  15. 04/0
  16. 04/0
  17. Where we started… m2m Platform: built it, sold it, bought it back as a service
  18. …then we became big…
  19. …in a green field that turned into a commodity business
  20. …you need to become unique again.
  21. Why Amazon Web Services
      • Our situation: a preference for everything-as-a-service; no own servers; no IT department; ready to innovate; streaming data; a need for real-time insights
      • AWS offering: everything as a service; pay by usage; infrastructure as code; self-service for the dev team; low-cost piloting; streaming technology; analytics backend
  22. Why Amazon Redshift Streaming Data Event & Time Series Analytics Real-time Insights Monthly Reports Amazon DynamoDB Amazon Redshift Amazon EMR Amazon Redshift Amazon Kinesis Amazon Redshift Analytics Software support Real-Time Processing Amazon S3 +
  23. Up stream vs. Down stream aggregations Access to hot and cold data Amazon Redshift vs. Spark Group by & Window Function Fully Managed Amazon Redshift
  24. Reflections on the journey
      • A typical organization: BI; Analytics; Big Data; Real-Time; Operations (CX); Management
      • What we did: Operations (CX); Management; BI; Analytics; Real-Time; Big Data
  25. Several iterations later… • More data streams into Redshift • More applications on top of Redshift • More use cases realized • More money saved • More revenue streams • Less EMR • Soon: no more EMR
  26. Thank You. Anders Bresell, PhD, Head of Technology Development & Data Science, Telenor Connexion. Data is magic!
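
Note on slide 7 (distribution styles): the three styles can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Redshift's implementation. The two-node, four-slice layout follows the slide; the CRC32 hash, the helper names, the sample rows and the choice to model ALL as one copy on the first slice of each node are illustrative assumptions.

```python
from collections import Counter
from zlib import crc32

SLICES = 4           # slide 7 draws two nodes with two slices each
SLICES_PER_NODE = 2


def distribute_even(rows, slices=SLICES):
    """EVEN: deal rows out round robin, ignoring their content."""
    return [(i % slices, row) for i, row in enumerate(rows)]


def distribute_key(rows, key, slices=SLICES):
    """KEY: hash the distribution key so equal keys share a slice.

    CRC32 is a stand-in hash chosen for this sketch only.
    """
    return [(crc32(str(row[key]).encode()) % slices, row) for row in rows]


def distribute_all(rows, slices=SLICES, slices_per_node=SLICES_PER_NODE):
    """ALL: keep one full copy of the table on every node.

    Modeled here (an assumption) as a copy on the first slice of each node.
    """
    first_slices = range(0, slices, slices_per_node)
    return [(s, row) for s in first_slices for row in rows]


def rows_per_slice(placement, slices=SLICES):
    """Count how many rows each slice holds, to expose skew."""
    counts = Counter(s for s, _ in placement)
    return [counts.get(s, 0) for s in range(slices)]


# Hypothetical sample table: 12 listings spread over 3 sellers.
rows = [{"listid": i, "sellerid": i % 3} for i in range(12)]

even_counts = rows_per_slice(distribute_even(rows))
key_placement = distribute_key(rows, "sellerid")
key_counts = rows_per_slice(key_placement)
all_counts = rows_per_slice(distribute_all(rows))
```

With only three distinct `sellerid` values, KEY distribution can fill at most three of the four slices, so at least one slice sits idle. That is the kind of skew slide 6 warns against when it asks for an equal amount of work on each slice.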
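
Note on slide 9 (zone maps): the min/max pairs on the slide suggest how a range predicate can skip whole blocks. The sketch below uses only the values visible on the slide (the elided "…" values are left out), and the function names are hypothetical. It illustrates the idea rather than Redshift's actual storage engine.

```python
# Sorted column values grouped into blocks, as drawn on slide 9.
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

# The zone map keeps only the smallest and largest value of each block.
zone_map = [(min(block), max(block)) for block in blocks]


def blocks_to_scan(zone_map, low, high):
    """Return indexes of blocks whose [min, max] range overlaps [low, high]."""
    return [
        i for i, (block_min, block_max) in enumerate(zone_map)
        if block_max >= low and block_min <= high
    ]


def range_query(blocks, zone_map, low, high):
    """Read only the blocks the zone map cannot rule out, then filter rows."""
    hits = []
    for i in blocks_to_scan(zone_map, low, high):
        hits.extend(v for v in blocks[i] if low <= v <= high)
    return hits


scanned = blocks_to_scan(zone_map, 400, 600)
matches = range_query(blocks, zone_map, 400, 600)
```

For the predicate `BETWEEN 400 AND 600`, only the middle block is read and the other two are skipped without any I/O. This is the "Reduced I/O" half of the slide's equation.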
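
Note on slide 11 (workload management): the per-slot arithmetic on the slide can be written out as a short calculation. The dictionary layout, queue names and function name are assumptions made for this sketch and are not the format of a real WLM configuration.

```python
def memory_per_slot(queues):
    """Split each queue's memory share evenly across its slots.

    `queues` maps a queue name to its memory percentage and slot count.
    """
    total = sum(q["memory_pct"] for q in queues.values())
    if total > 100:
        raise ValueError(f"queues claim {total}% of memory, more than 100%")
    return {name: q["memory_pct"] / q["slots"] for name, q in queues.items()}


# Figures from slide 11: two queues sharing the cluster's memory.
wlm = {
    "queries": {"memory_pct": 80, "slots": 4},
    "etl": {"memory_pct": 20, "slots": 2},
}

per_slot = memory_per_slot(wlm)
```

Here `per_slot` gives 20% of memory per slot for the queries queue and 10% per slot for the ETL queue, matching the 80/4 and 20/2 figures on the slide. Adding slots to a queue raises its concurrency but shrinks the memory each running query receives.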
