
Deep Dive on Amazon Redshift - AWS Summit Cape Town 2017

Get a look under the covers: Learn tuning best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve the delivery of your queries and overall database performance. This session explains how to create an optimized schema, use workload management, and tune your queries.

AWS Speaker: Ian Robinson, Specialist Solutions Architect, Big Data and Analytics, EMEA - Amazon Web Services



  1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Deep Dive on Amazon Redshift. Ian Robinson, Specialist Solutions Architect, Data & Analytics, EMEA. 5 July 2017.
  2. Amazon Redshift: a managed, massively parallel, petabyte-scale relational data warehouse. A lot faster, a lot simpler, a lot cheaper.
  3. Amazon Redshift Cluster Architecture: massively parallel, shared nothing. Leader node: SQL endpoint for JDBC/ODBC clients and BI tools; stores metadata; coordinates parallel SQL processing. Compute nodes (in the diagram, 128 GB RAM, 16 TB disk, and 16 cores each): local, columnar storage; execute queries in parallel; handle load, backup, and restore via S3 / EMR / DynamoDB / SSH; 2, 16, or 32 slices per node; interconnected over 10 GigE (HPC).
  4. Your Mission… • Use just enough cluster resources • Do the minimum amount of work • Do an equal amount of work on each slice
  5. Do an Equal Amount of Work on Each Slice
  6. Choose the Best Table Distribution Style. ALL: all data on every node. KEY: rows with the same key go to the same slice. EVEN: round-robin distribution across slices. (DDL sketch below.)
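A minimal DDL sketch of how the three distribution styles are declared; the table and column names here are hypothetical, not from the talk:

```sql
-- ALL: replicate a small dimension table to every node.
CREATE TABLE dim_date (
    dateid  INTEGER NOT NULL,
    caldate DATE    NOT NULL
)
DISTSTYLE ALL;

-- KEY: co-locate rows that share a join key on the same slice.
CREATE TABLE orders (
    o_orderkey  BIGINT NOT NULL,
    o_custkey   BIGINT NOT NULL,
    o_orderdate DATE   NOT NULL
)
DISTSTYLE KEY DISTKEY (o_custkey);

-- EVEN: round-robin rows across slices when no single join key dominates.
CREATE TABLE clickstream (
    event_time TIMESTAMP NOT NULL,
    page_url   VARCHAR(2048)
)
DISTSTYLE EVEN;
```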
  7. Avoid Data Skew (a quick skew check is sketched below)
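One way to spot skew is the system view svv_table_info, whose skew_rows column reports the ratio of rows on the fullest slice to rows on the emptiest slice; values near 1.0 are ideal:

```sql
-- List tables by row-distribution skew; a high skew_rows means one
-- slice holds far more rows than the others and becomes a hot spot.
SELECT "table", diststyle, skew_rows
FROM svv_table_info
ORDER BY skew_rows DESC;
```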
  8. Avoid Selectively Filtering on the Distribution Key, e.g. WHERE o_orderdate = current_date: a highly selective filter on the distribution key concentrates all the matching rows, and therefore all the work, on a single slice (illustrated below).
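To illustrate with the hypothetical orders table from the earlier sketch: if it were distributed on o_orderdate, every row for a given date would sit on one slice, so a single-date query runs on one slice while the others idle:

```sql
-- Anti-pattern: with DISTKEY (o_orderdate), all rows matching one date
-- live on a single slice, serialising the scan.
SELECT COUNT(*) FROM orders WHERE o_orderdate = current_date;

-- Preferable layout: distribute on a high-cardinality join key and sort
-- on the date column so zone maps can still skip blocks by date.
-- CREATE TABLE orders (...) DISTKEY (o_custkey) SORTKEY (o_orderdate);
```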
  9. Do the Minimum Amount of Work on Each Slice
  10. Reduced I/O = enhanced performance: columnar storage + large data block sizes + data compression + zone maps (min/max values per block, used to skip blocks) + direct-attached storage. Sample output of analyze compression listing;

      Table   | Column         | Encoding
      --------+----------------+----------
      listing | listid         | delta
      listing | sellerid       | delta32k
      listing | eventid        | delta32k
      listing | dateid         | bytedict
      listing | numtickets     | bytedict
      listing | priceperticket | delta32k
      listing | totalprice     | mostly32
      listing | listtime       | raw
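These recommendations can be applied in DDL. A sketch, assuming a deep copy into a new table (Redshift could not change an existing column's encoding in place at the time); the column types follow Redshift's TICKIT sample schema:

```sql
-- Rebuild the listing table with the recommended column encodings.
CREATE TABLE listing_encoded (
    listid         INTEGER      ENCODE delta,
    sellerid       INTEGER      ENCODE delta32k,
    eventid        INTEGER      ENCODE delta32k,
    dateid         SMALLINT     ENCODE bytedict,
    numtickets     SMALLINT     ENCODE bytedict,
    priceperticket DECIMAL(8,2) ENCODE delta32k,
    totalprice     DECIMAL(8,2) ENCODE mostly32,
    listtime       TIMESTAMP    ENCODE raw
);

INSERT INTO listing_encoded SELECT * FROM listing;
```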
  11. Use Cluster Resources Efficiently to Complete Queries as Quickly as Possible
  12. Amazon Redshift Workload Management: queries from SQL clients, BI tools, and analytics tools wait in WLM queues before running. Example configuration: a Queries queue with 80% of memory and 4 slots (80/4 = 20% of memory per slot), and an ETL queue with 20% of memory and 2 slots (20/2 = 10% per slot). (Session-level example below.)
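At the session level a query can be routed to a queue and claim more than one of its slots. A sketch assuming a WLM configuration with a queue matched on query group 'etl' and 2 slots, as in the split above:

```sql
-- Route this session to the ETL queue and take both of its slots so a
-- heavy transform gets the queue's full memory allocation.
SET query_group TO 'etl';
SET wlm_query_slot_count TO 2;

-- ... run the large ETL statement here ...

RESET wlm_query_slot_count;
RESET query_group;
```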
  13. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Redshift Deep Dive: Learnings from Mukuru. Justus Roux. 5 July 2017.
  14. Talking Points: background of Mukuru; the process of creating a Business Intelligence department; learnings.
  15. Mukuru, the largest international money transfer organisation in the SADC region: • 1 million+ registered customers • 6,000+ pay-in locations within South Africa • 1,000+ roaming consultants • 130 information centers within South Africa • 28 branches across South Africa • 425,000+ likes on Facebook • 1 transfer every 8 seconds
  16. Creation of the Business Intelligence Department. Pipeline: Amazon RDS (real-time read replica) → S3 bucket → Redshift data warehouse → QuickSight (business intelligence reporting tool). A cron job does a git pull and runs a bash script that copies CSVs to S3, copies the CSVs into Redshift, transforms the data in Redshift, and runs integrity scripts. Outputs: ETL, dashboards, machine learning. (A sketch of the load step follows.)
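The S3-to-Redshift step of such a pipeline is typically a COPY, which loads files in parallel across slices. A sketch; the bucket, table, and IAM role below are hypothetical:

```sql
-- Bulk-load gzipped CSV extracts from S3 into a Redshift staging table.
COPY staging.transfers
FROM 's3://mukuru-etl/transfers/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
CSV
GZIP;
```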
  17. Learnings • Quick to set up the Redshift environment • No DBA needed: recovering a table takes about 5 minutes • COPY function: loads run as multiple parallel tasks • ETL process: let Redshift do the transforming • ANALYZE and VACUUM large tables regularly (sketched below) • Awaiting AWS Glue
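The maintenance routine called out above is two statements; the table name here is hypothetical:

```sql
-- Reclaim space from deleted rows and re-sort the table, then refresh
-- the statistics the query planner relies on.
VACUUM FULL transactions;
ANALYZE transactions;
```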
  18. Thank You
