Modernize your data warehouse with Amazon Redshift - ADB305 - New York AWS Summit
1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT
Modernize your data warehouse with
Amazon Redshift
Harshida Patel
DW Specialist SA
AWS
ADB305
2. Agenda
Startup:
1. Log in to the AWS Management Console (using your account and credits, feel free to work
with a neighbor)
2. Switch to the Oregon region (us-west-2)
3. Create an IAM role for Amazon Redshift Spectrum
4. Create an Amazon Redshift cluster and associate the IAM role
5. Update the security group to allow Amazon Redshift
Refresher on Amazon Redshift
Workshop time
4.
https://tinyurl.com/y33amykm
1. Log in to the console (using your account and credits, or the workshop account)
2. Switch to the Oregon Region (us-west-2)
3. Create an IAM role for Amazon Redshift Spectrum
4. Create an Amazon Redshift cluster and associate the IAM role
5. Update the security group to allow Amazon Redshift
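The startup steps above can also be scripted. The following is a minimal sketch of the request payloads involved, assuming the AWS SDK for Python (boto3) is installed and credentials are configured. The role name, cluster identifier, node type, user name, attached managed policies, and the reading of step 5 as "open inbound TCP port 5439 (the Amazon Redshift default) to a client CIDR range" are all assumptions for illustration, not values taken from the workshop.

```python
import json

REGION = "us-west-2"  # Oregon, as in step 2

# Assumed managed policies for Redshift Spectrum; the workshop may specify others.
SPECTRUM_POLICY_ARNS = (
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess",
)


def spectrum_trust_policy() -> str:
    """Trust policy letting the Amazon Redshift service assume the role (step 3)."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "redshift.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    })


def cluster_params(role_arn: str, password: str) -> dict:
    """Keyword arguments for redshift.create_cluster (step 4); values are placeholders."""
    return {
        "ClusterIdentifier": "workshop-cluster",  # hypothetical name
        "ClusterType": "multi-node",
        "NodeType": "dc2.large",                  # assumed node type
        "NumberOfNodes": 2,
        "MasterUsername": "awsuser",              # hypothetical user
        "MasterUserPassword": password,
        "IamRoles": [role_arn],
    }


def ingress_params(group_id: str, cidr: str) -> dict:
    """Keyword arguments for ec2.authorize_security_group_ingress (step 5, as assumed)."""
    return {"GroupId": group_id, "IpProtocol": "tcp",
            "FromPort": 5439, "ToPort": 5439, "CidrIp": cidr}


def run_startup(password: str, group_id: str, cidr: str) -> None:
    """Apply steps 3 to 5. Creates billable AWS resources when called."""
    import boto3  # imported lazily so the payload builders work without the SDK

    iam = boto3.client("iam")
    role = iam.create_role(RoleName="workshop-spectrum-role",  # hypothetical name
                           AssumeRolePolicyDocument=spectrum_trust_policy())
    for arn in SPECTRUM_POLICY_ARNS:
        iam.attach_role_policy(RoleName="workshop-spectrum-role", PolicyArn=arn)

    redshift = boto3.client("redshift", region_name=REGION)
    redshift.create_cluster(**cluster_params(role["Role"]["Arn"], password))

    ec2 = boto3.client("ec2", region_name=REGION)
    ec2.authorize_security_group_ingress(**ingress_params(group_id, cidr))
```

The payload builders are kept separate from `run_startup` so the parameters can be reviewed before anything is created in the account.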
6. Amazon Redshift architecture
Massively parallel, shared-nothing columnar architecture
Leader node
• SQL endpoint
• Stores metadata
• Coordinates parallel SQL processing
Compute nodes
• Local, columnar storage
• Executes queries in parallel
• Load, unload, backup, restore
Amazon Redshift Spectrum nodes
• Execute queries directly against Amazon Simple Storage Service (Amazon S3)
[Figure: Amazon Redshift architecture. Diagram elements: SQL clients/BI tools connecting over JDBC/ODBC; leader node; compute nodes, each labeled 128 GB RAM, 16 TB disk, 16 cores; Amazon Redshift Spectrum nodes 1, 2, 3, 4, ... N; Amazon S3; arrows labeled Load and Query.]
7. Amazon Redshift Advisor: Your DBA’s best friend
• Amazon Redshift expert system available in the console
• Flags problematic usage patterns and provides high-impact recommendations to improve performance and reduce cost
• More than 96% of clusters have tailored feedback
• Actionable feedback on WLM, COPY, storage, and system maintenance
• Analyses have doubled since launch (July 2018) and will double again by the end of the year
8. Five points of guidance for Amazon Redshift (SET DW)
Sort Key (to improve filter performance): Choose up to three columns (compound sort key), ordered in increasing order of specificity and balanced against likelihood of use.
Encoding of Columns: Compress all columns except the first sort key column.
Table Maintenance: VACUUM and ANALYZE tables weekly (use the Amazon Redshift Advisor and/or STL_ALERT_EVENT_LOG as a guide for frequency).
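The sort key, encoding, and maintenance guidance above can be sketched as a small DDL generator. This is an illustration only: the table and column names are hypothetical, and ZSTD is an assumed default encoding chosen for the example (the deck does not name one); the first sort key column is left uncompressed with ENCODE RAW, as the guidance suggests.

```python
def create_table_sql(table, columns, sort_keys):
    """Build a CREATE TABLE statement with a compound sort key.

    columns:   list of (name, sql_type) pairs
    sort_keys: one to three column names, least specific first
    """
    if not 1 <= len(sort_keys) <= 3:
        raise ValueError("choose one to three sort key columns")
    known = {name for name, _ in columns}
    missing = [key for key in sort_keys if key not in known]
    if missing:
        raise ValueError(f"sort key columns not in table: {missing}")

    first_sort_key = sort_keys[0]
    col_defs = []
    for name, sql_type in columns:
        # Compress every column except the first sort key column.
        encoding = "RAW" if name == first_sort_key else "ZSTD"  # ZSTD is an assumed choice
        col_defs.append(f"  {name} {sql_type} ENCODE {encoding}")

    return (
        f"CREATE TABLE {table} (\n"
        + ",\n".join(col_defs)
        + f"\n)\nCOMPOUND SORTKEY ({', '.join(sort_keys)});"
    )


def maintenance_sql(table):
    """Statements for the weekly maintenance pass."""
    return [f"VACUUM {table};", f"ANALYZE {table};"]


if __name__ == "__main__":
    ddl = create_table_sql(
        "sales",  # hypothetical table
        [("sale_date", "DATE"), ("region", "VARCHAR(16)"), ("amount", "DECIMAL(12,2)")],
        ["sale_date", "region"],
    )
    print(ddl)
    print("\n".join(maintenance_sql("sales")))
```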
Distribution Key (to improve join performance): Choose a strategy that follows the common join pattern for the table and evenly distributes the data across the database slices on the cluster.
• DISTSTYLE AUTO is a great go-to for all tables < ~5 million rows.
• DISTSTYLE EVEN is a good fail-safe, but remember that joins may then require data redistribution at query time.
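To see why the distribution key should spread rows evenly across slices, here is a toy simulation. It is a sketch under stated assumptions: SHA-256 stands in for the hash function (Amazon Redshift's internal hashing is not described in this deck), the slice count of 8 is arbitrary, and the column names and values are made up.

```python
import hashlib
from collections import Counter


def slice_for(key, num_slices):
    """Map a distribution key value to a slice using a stand-in hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def rows_per_slice(keys, num_slices):
    """Count how many rows land on each slice."""
    counts = Counter(slice_for(key, num_slices) for key in keys)
    return [counts.get(s, 0) for s in range(num_slices)]


def skew_ratio(counts):
    """Largest slice divided by the mean slice size; 1.0 is perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


NUM_SLICES = 8  # arbitrary for the example

# High-cardinality candidate key: every row has a distinct value.
customer_ids = [f"customer-{i}" for i in range(10_000)]
# Low-cardinality candidate key: only three distinct values.
order_status = ["NEW", "SHIPPED", "RETURNED"] * 3_334

by_customer = rows_per_slice(customer_ids, NUM_SLICES)
by_status = rows_per_slice(order_status, NUM_SLICES)

if __name__ == "__main__":
    print("customer_id rows per slice:", by_customer)
    print("order_status rows per slice:", by_status)
    print("customer_id skew:", round(skew_ratio(by_customer), 2))
    print("order_status skew:", round(skew_ratio(by_status), 2))
```

With only three distinct values, the low-cardinality key can occupy at most three of the eight slices, so the remaining slices sit idle while the busiest one holds at least a third of the rows.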
Workload Management (WLM) and Query Monitoring Rules (QMR):
• Start by defining up to ~3 queues.
• Split the memory across the queues, and monitor the percentage of each queue’s workload that spills to disk.
• Expect to change WLM settings as the workload changes (day vs. night, weekday vs. weekend).
• Use QMR to log, hop, or abort queries that exceed defined limits.
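The WLM and QMR guidance above can be sketched as a configuration builder. This is a hedged example, not a recommended setup: the key names follow the JSON format of the `wlm_json_configuration` cluster parameter as best understood here and should be checked against the current Amazon Redshift documentation; the user groups, percentages, and thresholds are hypothetical.

```python
import json


def wlm_queues():
    """Three queues: ETL, dashboards, and a default queue (listed last)."""
    return [
        {
            "user_group": ["etl"],  # hypothetical user group
            "query_concurrency": 3,
            "memory_percent_to_use": 40,
            "rules": [{
                # Log queries that spill heavily to disk (threshold is made up).
                "rule_name": "log_disk_spill",
                "predicate": [{"metric_name": "query_temp_blocks_to_disk",
                               "operator": ">", "value": 100000}],
                "action": "log",
            }],
        },
        {
            "user_group": ["dashboard"],  # hypothetical user group
            "query_concurrency": 10,
            "memory_percent_to_use": 35,
            "rules": [{
                # Abort runaway dashboard queries after 600 seconds (made up).
                "rule_name": "abort_runaway",
                "predicate": [{"metric_name": "query_execution_time",
                               "operator": ">", "value": 600}],
                "action": "abort",
            }],
        },
        {"query_concurrency": 5, "memory_percent_to_use": 25},  # default queue
    ]


def validate(queues, max_queues=3):
    """Check the queue count and that memory shares do not exceed 100%."""
    if len(queues) > max_queues:
        raise ValueError(f"start with at most {max_queues} queues")
    total = sum(q.get("memory_percent_to_use", 0) for q in queues)
    if total > 100:
        raise ValueError(f"memory shares add up to {total}%, over 100%")
    return total


def to_parameter_value(queues):
    """Serialize the queues for use as a parameter group value."""
    validate(queues)
    return json.dumps(queues)


if __name__ == "__main__":
    print(to_parameter_value(wlm_queues()))
```

Keeping the configuration in code makes the day/night or weekday/weekend variants easy to generate and compare before applying them.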
9. Availability of intelligent administration and maintenance features
Distribution key: Recommendation for distribution key
Sort key: Recommendation for sort key
Concurrency setting: Automation for the concurrency setting, making it dynamic
Vacuum: Auto vacuum in the background
Analyze: Auto analyze in the background
10. Concurrency Scaling
Amazon Redshift automatically adds transient clusters, in seconds, to serve sudden spikes in concurrent requests with consistently fast performance.
[Diagram labels: Backup, Caching layer]
How it works:
1. All queries go to the leader node. The user experiences less wait for queries.
2. When queries in the designated WLM queue begin queuing, Amazon Redshift automatically routes them to the new clusters, automatically enabling Concurrency Scaling.
3. Amazon Redshift automatically spins up a new cluster, processes waiting queries, and automatically shuts down the concurrency scaling cluster.
For every 24 hours that your main cluster is in use, you accrue a one-hour credit for concurrency scaling. This means that concurrency scaling is free for >97% of customers.
Status: Launched
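The credit rule stated on this slide (one hour of concurrency scaling credit per 24 hours of main cluster use) works out as follows. This is a minimal arithmetic sketch with hypothetical function names; it does not model any credit caps, expiry, or per-second billing details, none of which appear in this deck.

```python
def credit_hours(main_cluster_hours: float) -> float:
    """Concurrency scaling credit accrued: 1 hour per 24 hours of main cluster use."""
    return main_cluster_hours / 24.0


def billable_scaling_hours(main_cluster_hours: float, scaling_hours_used: float) -> float:
    """Scaling hours left to pay for after credits are applied (never negative)."""
    return max(0.0, scaling_hours_used - credit_hours(main_cluster_hours))


if __name__ == "__main__":
    month = 30 * 24  # a main cluster running continuously for 30 days
    print("credit hours:", credit_hours(month))
    print("billable at 12 h of scaling:", billable_scaling_hours(month, 12))
    print("billable at 36 h of scaling:", billable_scaling_hours(month, 36))
```

A cluster that runs around the clock for 30 days accrues 30 credit hours, so scaling usage at or below that amount costs nothing under this rule.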
15.
https://tinyurl.com/y33amykm
16. Thank you!
Harshida Patel
DW Specialist SA
AWS