Amazon Redshift: Good practices for performance optimization
What is Amazon Redshift?
• Petabyte-scale columnar database
• Fast response time: roughly 10 times faster than a typical relational database
• Cost-effective (around $1,000 per TB per year)
When do customers use Amazon Redshift?
Replacing a traditional DWH:
• Reduce cost by extending the DW instead of adding HW
• Migrate from an existing DWH system
• Respond faster to the business: provision in minutes

Analyzing Big Data:
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools

Providing services and SaaS:
• Add analytics functionality to applications
• Scale DW capacity as demand grows
• Reduce SW and HW costs by an order of magnitude
Amazon Redshift architecture
Redshift reduces IO

Column storage:
• With raw (row) storage you do unnecessary IO; with columnar storage you only read the data you need

Data compression:
• COPY compresses automatically
• You can analyze and override
• More performance, less cost

Zone maps:
• Track the minimum and maximum value for each block
• Skip over blocks that do not contain relevant data

Direct-attached storage:
• > 2 GB/sec scan rate
• Optimized for data processing
• High disk density
Redshift Recap
Main possible issues with Redshift performance

• Incorrect column encoding
• Skewed table data
• Queries not benefiting from sort keys (excessive scanning)
• Tables without statistics or which need VACUUM
• Tables with very large VARCHAR columns
• Queries waiting on queue slots
• Queries that are disk-based (incorrect sizing; GROUP BY on distinct values, UNION, hash joins with DISTINCT values)
• Commit queue waits
• Inefficient data loads
• Inefficient use of temporary tables
• Large nested loop JOINs
• Inappropriate join cardinality
Incorrect column encoding
Running an Amazon Redshift cluster without column encoding is not considered a best practice, and customers find large
performance gains when they ensure that column encoding is optimally applied. To determine if you are deviating from
this best practice, you can use the v_extended_table_info view from the Amazon Redshift Utils GitHub repository.
If you find that you have tables without optimal column encoding, then use
the Amazon Redshift Column Encoding Utility from the Utils repository to
apply encoding. This command line utility uses the ANALYZE COMPRESSION
command on each table. If encoding is required, it generates a SQL script
which creates a new table with the correct encoding, copies all the data
into the new table, and then transactionally renames the new table to the
old name while retaining the original data.
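The same check can also be run by hand before reaching for the utility; a minimal sketch, assuming a hypothetical table name:

```sql
-- Ask Redshift to sample the table and recommend an encoding per column
-- (my_schema.orders is a hypothetical name).
ANALYZE COMPRESSION my_schema.orders;
```

The output lists a recommended encoding and an estimated reduction for each column; applying it means rebuilding the table, which is exactly what the utility's generated script automates.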
• Raw Encoding
• Byte-Dictionary Encoding
• Delta Encoding
• LZO Encoding
• Mostly Encoding
• Runlength Encoding
• Text255 and Text32k Encodings
• Zstandard Encoding
Skewed table data
If skew is a problem, you typically see that node performance is uneven on the cluster. Use table_inspector.sql to see how data blocks in a distribution key map to the slices and nodes in the cluster.
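A quick first look at skew is also available directly from the svv_table_info system view, whose skew_rows column reports the ratio of rows on the fullest slice to the emptiest:

```sql
-- Tables with skew_rows much greater than 1 are candidates for a new DISTKEY
SELECT "table", diststyle, skew_rows
FROM svv_table_info
ORDER BY skew_rows DESC
LIMIT 20;
```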
Consider changing the distribution key to a column that exhibits high cardinality and uniform distribution. Evaluate a candidate column as a distribution key by creating a new table using CTAS:

CREATE TABLE my_test_table DISTKEY (<column name>) AS SELECT <column name> FROM <table name>;
You can use the AWS Schema Conversion Tool (SCT) to set the proper distribution key and style during the migration process.
Dealing with tables with very large VARCHAR columns
During processing of complex queries, intermediate query results might need to be stored in temporary blocks.
These temporary tables are not compressed, so unnecessarily wide columns consume excessive memory and
temporary disk space, which can affect query performance.
Use the following query to generate a list of tables that should have their maximum column widths reviewed:

SELECT database, schema || '.' || "table" AS "table", max_varchar
FROM svv_table_info
WHERE max_varchar > 150
ORDER BY 2;
Identify which tables have wide VARCHAR columns, then determine the true maximum width actually in use for each wide column:

SELECT max(len(rtrim(column_name))) FROM table_name;
If you query the top running queries for the database using the top_queries.sql admin script, pay special attention to SELECT * queries that include the JSON fragment column. If end users query these large columns but don't actually execute JSON functions against them, consider moving them into another table that contains only the primary key column of the original table and the JSON column.
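Splitting the JSON column out could look like the following sketch; all table and column names are hypothetical:

```sql
-- Move the rarely-queried JSON column into a narrow side table
-- keyed by the original table's primary key
CREATE TABLE main_table_json
DISTKEY (id)
AS
SELECT id, json_payload
FROM main_table;

-- Then drop the wide column from the main table
ALTER TABLE main_table DROP COLUMN json_payload;
```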
Queries not benefiting from sort keys
Amazon Redshift tables can have a sort key column identified, which acts like an index in other databases, but which
does not incur a storage cost as with other platforms (for more information, see Choosing Sort Keys). A sort key should
be created on those columns which are most commonly used in WHERE clauses. If you have a known query pattern,
then COMPOUND sort keys give the best performance; if end users query different columns equally, then use an
INTERLEAVED sort key. If using compound sort keys, review your queries to ensure that their WHERE clauses specify the
sort columns in the same order they were defined in the compound key.
To determine which tables don’t have sort keys, run the following query against the v_extended_table_info view from
the Amazon Redshift Utils repository:
SELECT * FROM admin.v_extended_table_info WHERE sortkey IS null;
A table without a sort key can potentially lead to an excessive scanning issue.
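For illustration, both sort key styles on a hypothetical events table (names and types are assumptions):

```sql
-- Known query pattern: compound sort key, most-filtered column first
CREATE TABLE events (
    event_date  DATE,
    customer_id BIGINT,
    payload     VARCHAR(256)
)
COMPOUND SORTKEY (event_date, customer_id);

-- If end users filter on either column equally, an interleaved key
-- would be declared instead:
-- INTERLEAVED SORTKEY (event_date, customer_id)
```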
Queries waiting on queue slots (ICT)
Amazon Redshift runs queries using a queuing system known as
workload management (WLM). You can define up to 8 queues
to separate workloads from each other, and set the
concurrency on each queue to meet your overall throughput
requirements.
In some cases, the queue to which a user or query has been
assigned is completely busy and a user’s query must wait for a
slot to be open. During this time, the system is not executing
the query at all, which is a sign that you may need to increase
concurrency.
First, you need to determine if any queries are queuing, using
the queuing_queries.sql admin script. Review the maximum
concurrency that your cluster has needed in the past with
wlm_apex.sql, down to an hour-by-hour historical analysis with
wlm_apex_hourly.sql. Keep in mind that while increasing concurrency allows more queries to run, each query gets a smaller share of the memory allocated to its queue.
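Alongside the admin scripts, a point-in-time view of queuing is available from the STV_WLM_QUERY_STATE system table; a quick check might look like:

```sql
-- Count queries currently waiting per WLM queue (service class)
SELECT service_class, COUNT(*) AS queued_queries
FROM stv_wlm_query_state
WHERE state = 'QueuedWaiting'
GROUP BY service_class
ORDER BY queued_queries DESC;
```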
Queries that are disk-based (ICT)
If a query isn't able to completely execute in memory, it may need to use disk-based temporary storage for parts of the explain plan. The additional disk I/O slows down the query; this can be addressed by increasing the amount of memory allocated to the session (for more information, see WLM Dynamic Memory Allocation).

To determine if any queries have been writing to disk, use the following query:

SELECT q.query, trim(q.cat_text)
FROM (
    SELECT query,
           replace(listagg(text, ' ') WITHIN GROUP (ORDER BY sequence), '\\n', ' ') AS cat_text
    FROM stl_querytext
    WHERE userid > 1
    GROUP BY query) q
JOIN (
    SELECT DISTINCT query
    FROM svl_query_summary
    WHERE is_diskbased = 't'
      AND (label LIKE 'hash%' OR label LIKE 'sort%' OR label LIKE 'aggr%')
      AND userid > 1) qs
  ON qs.query = q.query;

Based on the user or the queue assignment rules, you can increase the amount of memory given to the selected queue to prevent queries from needing to spill to disk to complete.
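For a single heavy query, an alternative to raising queue memory permanently is to temporarily claim more slots for the session via the wlm_query_slot_count parameter:

```sql
-- Give this session the memory of 3 queue slots for subsequent statements
SET wlm_query_slot_count TO 3;
-- ...run the memory-intensive query here...
-- Return to the default of one slot
RESET wlm_query_slot_count;
```

Note that while the session holds extra slots, fewer slots remain for concurrent queries in the same queue.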
Commit queue waits (ICT)
Amazon Redshift is designed for analytics queries, rather than transaction processing. The cost of COMMIT is relatively
high, and excessive use of COMMIT can result in queries waiting for access to a commit queue.
If you are committing too often on your database, you will start to see waits on the commit queue increase, which can
be viewed with the commit_stats.sql admin script. This script shows the largest queue length and queue time for
queries run in the past two days. If you have queries that are waiting on the commit queue, then look for sessions that
are committing multiple times per session, such as ETL jobs that are logging progress or inefficient data loads.
One of the worst practices is to insert data into Amazon Redshift row by row. Use the COPY command or an ETL tool compatible with Amazon Redshift instead.
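A minimal COPY sketch; the table name, bucket, role ARN and file options are hypothetical:

```sql
-- Bulk-load compressed, pipe-delimited files from S3 in parallel
-- in a single commit, instead of row-by-row INSERTs
COPY sales
FROM 's3://my-bucket/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER '|'
GZIP;
```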
Inefficient use of Temporary Tables
Amazon Redshift provides temporary tables, which are like normal tables except that they are only visible within a single
session. When the user disconnects the session, the tables are automatically deleted. Temporary tables can be created
using the CREATE TEMPORARY TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE query. The CREATE TABLE
statement gives you complete control over the definition of the temporary table, while the SELECT … INTO and C(T)TAS
commands use the input data to determine column names, sizes and data types, and use default storage properties.
These default storage properties may cause issues if not carefully considered. Amazon Redshift’s default table structure is
to use EVEN distribution with no column encoding. This is a sub-optimal data structure for many types of queries, and if you
are using the SELECT…INTO syntax you cannot set the column encoding or distribution and sort keys. If you use the CREATE
TABLE AS (CTAS) syntax, you can specify a distribution style and sort keys, and Redshift will automatically apply LZO
encoding for everything other than sort keys, booleans, reals and doubles. If you consider this automatic encoding sub-optimal and require further control, use the CREATE TABLE syntax rather than CTAS.
If you are creating temporary tables, it is highly recommended that you convert all SELECT…INTO syntax to use the CREATE
statement. This ensures that your temporary tables have column encodings and are distributed in a fashion that is
sympathetic to the other entities that are part of the workflow.
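An explicit temporary table definition could look like the following sketch; all names, types and key choices are hypothetical:

```sql
-- Full control over encoding, distribution and sort order,
-- unlike SELECT ... INTO #TEMP_TABLE
CREATE TEMPORARY TABLE stage_orders (
    order_id   BIGINT        ENCODE zstd,
    order_date DATE          ENCODE raw,   -- leave the sort key unencoded
    amount     DECIMAL(12,2) ENCODE zstd
)
DISTKEY (order_id)
SORTKEY (order_date);
```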
Tables without statistics
• Amazon Redshift, like other databases, requires statistics about tables and the
composition of data blocks being stored in order to make good decisions when
planning a query (for more information, see Analyzing Tables). Without good
statistics, the optimiser may make suboptimal choices about the order in which
to access tables, or how to join datasets together.
• The ANALYZE Command History topic in the Amazon Redshift Developer Guide
supplies queries to help you address missing or stale statistics, and you can also
simply run the missing_table_stats.sql admin script to determine which tables
are missing stats, or the statement below to determine tables that have stale
statistics:
SELECT database, schema || '.' || "table" AS "table", stats_off FROM svv_table_info
WHERE stats_off > 5 ORDER BY 2;
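Statistics can then be refreshed directly; the PREDICATE COLUMNS clause restricts the work to columns actually used in filters, joins and group-bys (table name hypothetical):

```sql
ANALYZE my_schema.orders PREDICATE COLUMNS;
```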
Tables which need VACUUM
In Amazon Redshift, data blocks are immutable. When
rows are DELETED or UPDATED, they are simply logically
deleted (flagged for deletion) but not physically
removed from disk. Updates result in a new block being
written with new data appended. Both of these
operations cause the previous version of the row to
continue consuming disk space and continue being
scanned when a query scans the table. As a result, table
storage space is increased and performance degraded
due to otherwise avoidable disk I/O during scans. A
VACUUM command recovers the space from deleted
rows and restores the sort order.
You can use the perf_alert.sql admin script to identify
tables that have had alerts about scanning a large
number of deleted rows raised in the last seven days.
To address issues with tables with missing or stale statistics or where vacuum is required, run another AWS Labs utility,
Analyze & Vacuum Schema. This ensures that you always keep up-to-date statistics, and only vacuum tables that actually
need reorganization.
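The individual VACUUM operations can also be run by hand (table name hypothetical):

```sql
-- Reclaim space from logically deleted rows only
VACUUM DELETE ONLY my_schema.orders;
-- Restore the sort order only
VACUUM SORT ONLY my_schema.orders;
-- Or do both in one pass
VACUUM FULL my_schema.orders;
```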
Analyze the performance of the queries and address issues

Your best friends in the Redshift world are:
• ANALYZE, for identifying out-of-date statistics
• The SVL_QUERY_SUMMARY view, for a summary of the useful data around the behavior of your queries and a high-level overview of the cluster
• Query alerts, for receiving important information about deviations in the behavior of your queries
• The SVL_QUERY_REPORT view, for collecting details about your queries' health and performance
Setup and finetune monitoring and
maintenance procedures
• In addition to your preferred monitoring methods and the many partner solutions, there are some helpful tools:
• The Amazon Redshift Column Encoding Utility
• Use table_inspector.sql to see how data blocks in a distribution key map to the slices and nodes in the cluster
• Run the missing_table_stats.sql admin script to determine which tables are missing stats
• Use the perf_alert.sql admin script to identify tables that have had alerts about scanning a large number of deleted rows raised in the last seven days
• Use top_queries.sql to determine the top running queries
• Review the maximum concurrency that your cluster has needed in the past with wlm_apex.sql, down to an hour-by-hour historical analysis with wlm_apex_hourly.sql
• View the commit stats with the commit_stats.sql admin script
Eliminate Nested Loops
• Usually caused by a SQL query specifying a join condition that requires a "brute force" comparison between two large tables
• Quite easy to spot: look for "Nested Loop" in query plans or in STL_ALERT_EVENT_LOG

Typical offending join conditions:

ON a.date > b.date
ON a.text LIKE b.text
ON a.x = b.x OR a.y = b.y

Remedies:
• Rewrite the inequality JOIN condition as a window function
• Use a small nested loop instead of one between two large tables
• Alternatively, run the nested loop join once and persist the results in a separate relational table
Inappropriate Join Cardinality

• Query "fan-out": look for a high number of rows generated from joins (higher than the sum of rows of all scanned tables) in the query plan or execution metrics
• Carefully review the join logic
• Use SVL_QUERY_SUMMARY to detect it

A typical fan-out pattern:

FROM house
JOIN rooms ON rooms.house_id = house.id
JOIN residents ON residents.house_id = house.id

Remedies:
• If possible, break large fan-out queries into several smaller queries
• Use derived tables to transform one-to-many joins into one-to-one joins
• Try out some more advanced techniques
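Using the house/rooms example above, the one-to-many side can be pre-aggregated in a derived table so that the outer join becomes one-to-one:

```sql
-- One row per house survives the join, instead of one per room
SELECT h.id, r.room_count
FROM house h
JOIN (
    SELECT house_id, COUNT(*) AS room_count
    FROM rooms
    GROUP BY house_id
) r ON r.house_id = h.id;
```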
Suboptimal WHERE clause
If your WHERE clause causes excessive table scans, you might see a SCAN
step in the segment with the highest maxtime value in
SVL_QUERY_SUMMARY. For more information, see Using the
SVL_QUERY_SUMMARY View.
To fix this issue, add a WHERE clause to the query based on the primary sort
column of the largest table. This approach will help minimize scanning time.
For more information, see Amazon Redshift Best Practices for Designing
Tables.
Insufficiently Restrictive Predicate
If your query has an insufficiently restrictive predicate, you might see a
SCAN step in the segment with the highest maxtime value in
SVL_QUERY_SUMMARY that has a very high rows value compared to the
rows value in the final RETURN step in the query. For more information, see
Using the SVL_QUERY_SUMMARY View.
To fix this issue, try adding a predicate to the query or making the existing
predicate more restrictive to narrow the output.
Very Large Result Set
If your query returns a very large result set, consider rewriting the query to
use UNLOAD to write the results to Amazon S3.
This approach will improve the performance of the RETURN step by taking
advantage of parallel processing. For more information on checking for a
very large result set, see Using the SVL_QUERY_SUMMARY View.
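A minimal UNLOAD sketch; the table name, bucket and role ARN are hypothetical:

```sql
-- Each slice writes its part of the result to S3 in parallel,
-- instead of funneling the whole result set through the leader node
UNLOAD ('SELECT * FROM my_schema.big_result')
TO 's3://my-bucket/unload/big_result_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
GZIP
PARALLEL ON;
```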
Large SELECT List
If your query has an unusually large SELECT list, you might see a bytes value
that is high relative to the rows value for any step (in comparison to other
steps) in SVL_QUERY_SUMMARY. This high bytes value can be an indicator
that you are selecting a lot of columns. For more information, see Using the
SVL_QUERY_SUMMARY View.
To fix this issue, review the columns you are selecting and see if any can be
removed.
Working with SVL_QUERY_SUMMARY
Select your query ID:

SELECT query, elapsed, substring FROM svl_qlog ORDER BY query DESC LIMIT 5;

Collect the query data:

SELECT * FROM svl_query_summary WHERE query = MyQueryID ORDER BY stm, seg, step;

For analysis, please refer to: https://docs.aws.amazon.com/redshift/latest/dg/using-SVL-Query-Summary.html
Some additional materials can be found here
• Best practices for designing queries
• Best practices for designing tables
• Top 10 performance tuning techniques
• Troubleshooting queries
Amazon Redshift Developer Guide
https://docs.aws.amazon.com/redshift/latest/dg/redshift-dg.pdf
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructureAWS Germany
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopAWS Germany
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWSAWS Germany
 
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS AWS Germany
 
AWS Programme für Nonprofits
AWS Programme für NonprofitsAWS Programme für Nonprofits
AWS Programme für NonprofitsAWS Germany
 
Microservices and Data Design
Microservices and Data DesignMicroservices and Data Design
Microservices and Data DesignAWS Germany
 
Serverless vs. Developers – the real crash
Serverless vs. Developers – the real crashServerless vs. Developers – the real crash
Serverless vs. Developers – the real crashAWS Germany
 
Query your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceQuery your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceAWS Germany
 
Secret Management with Hashicorp’s Vault
Secret Management with Hashicorp’s VaultSecret Management with Hashicorp’s Vault
Secret Management with Hashicorp’s VaultAWS Germany
 
Scale to Infinity with ECS
Scale to Infinity with ECSScale to Infinity with ECS
Scale to Infinity with ECSAWS Germany
 
Containers on AWS - State of the Union
Containers on AWS - State of the UnionContainers on AWS - State of the Union
Containers on AWS - State of the UnionAWS Germany
 
Deploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon LightsailDeploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon LightsailAWS Germany
 

Optimize Amazon Redshift Performance with Best Practices

  • 1. Amazon Redshift: Good practices for performance optimization
  • 2. What is Amazon Redshift?
  • Petabyte-scale columnar database
  • Fast response time: roughly 10 times faster than a typical relational database
  • Cost-effective: around $1,000 per TB per year
  • 3. When do customers use Amazon Redshift?
  Replacing a traditional DWH:
  • Reduce cost by extending the DW instead of adding hardware
  • Migrate from an existing DWH system
  • Respond faster to the business: provision in minutes
  Analyzing Big Data:
  • Improve performance by an order of magnitude
  • Make more data available for analysis
  • Access business data via standard reporting tools
  Providing services and SaaS:
  • Add analytics functionality to applications
  • Scale DW capacity as demand grows
  • Reduce SW and HW costs by an order of magnitude
  • 5. Redshift reduces I/O
  Column storage:
  • With raw (row) storage you do unnecessary I/O
  • With columnar storage you read only the data you need
  Data compression:
  • COPY compresses automatically
  • You can analyze and override
  • More performance, less cost
  Zone maps:
  • Track the minimum and maximum value for each block
  • Skip over blocks that do not contain relevant data
  Direct-attached storage:
  • > 2 GB/sec scan rate
  • Optimized for data processing
  • High disk density
  • 7. Main possible issues with Redshift performance
  • Incorrect column encoding
  • Skewed table data
  • Queries not benefiting from sort keys: excessive scanning
  • Tables without statistics, or which need VACUUM
  • Tables with very large VARCHAR columns
  • Queries waiting on queue slots
  • Queries that are disk-based: incorrect sizing, GROUP BY on many distinct values, UNION, hash joins with DISTINCT values
  • Commit queue waits
  • Inefficient data loads
  • Inefficient use of temporary tables
  • Large nested loop JOINs
  • Inappropriate join cardinality
  • 8. Incorrect column encoding
Running an Amazon Redshift cluster without column encoding is not considered a best practice, and customers see large performance gains when they ensure that column encoding is optimally applied. To determine whether you are deviating from this best practice, use the v_extended_table_info view from the Amazon Redshift Utils GitHub repository. If you find tables without optimal column encoding, use the Amazon Redshift Column Encoding Utility from the same repository to apply encoding. This command line utility runs the ANALYZE COMPRESSION command on each table. If encoding changes are required, it generates a SQL script that creates a new table with the correct encoding, copies all the data into the new table, and then transactionally renames the new table to the old name while retaining the original data. Available encodings:
  • Raw
  • Byte-dictionary
  • Delta
  • LZO
  • Mostly
  • Runlength
  • Text255 and Text32k
  • Zstandard
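A minimal sketch of the manual route, assuming a hypothetical `lineitem` table: ANALYZE COMPRESSION reports a recommended encoding per column, which you can then apply explicitly when rebuilding the table.

```sql
-- Ask Redshift to recommend an encoding per column,
-- based on a sample of the table's rows:
ANALYZE COMPRESSION lineitem;

-- Apply the recommendations when rebuilding the table
-- (encodings below are illustrative, not the tool's actual output):
CREATE TABLE lineitem_encoded (
    l_orderkey BIGINT      ENCODE lzo,
    l_comment  VARCHAR(44) ENCODE zstd,
    l_shipdate DATE        ENCODE raw  -- leave the sort key unencoded
)
DISTKEY (l_orderkey)
SORTKEY (l_shipdate);
```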
  • 9. Skewed table data
If skew is a problem, you typically see uneven node performance on the cluster. Use table_inspector.sql to see how data blocks in a distribution key map to the slices and nodes in the cluster. Consider changing the distribution key to a column that exhibits high cardinality and uniform distribution. Evaluate a candidate column as a distribution key by creating a new table using CTAS:

CREATE TABLE my_test_table DISTKEY (<column name>) AS SELECT <column name> FROM <table name>;

You can use the AWS Schema Conversion Tool (SCT) to set the proper distribution key and style during the migration process.
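As a quick first check before running the full table_inspector.sql script, the system view SVV_TABLE_INFO exposes a skew_rows column: the ratio between the slice holding the most rows and the slice holding the fewest. Values near 1.0 are ideal; the threshold of 2 below is an arbitrary illustrative cutoff.

```sql
-- Tables whose rows are unevenly spread across slices:
SELECT "table", diststyle, skew_rows
FROM svv_table_info
WHERE skew_rows > 2
ORDER BY skew_rows DESC;
```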
  • 10. Deal with tables with very large VARCHAR columns
During processing of complex queries, intermediate query results might need to be stored in temporary blocks. These temporary tables are not compressed, so unnecessarily wide columns consume excessive memory and temporary disk space, which can affect query performance. Use the following query to generate a list of tables whose maximum column widths should be reviewed:

SELECT database, schema || '.' || "table" AS "table", max_varchar
FROM svv_table_info
WHERE max_varchar > 150
ORDER BY 2;

Identify which tables have wide VARCHAR columns, and then determine the true maximum width for each wide column:

SELECT max(len(rtrim(column_name))) FROM table_name;

If you query the top running queries for the database using the top_queries.sql admin script, pay special attention to SELECT * queries which include the JSON fragment column. If end users query these large columns but don't actually execute JSON functions against them, consider moving them into another table that contains only the primary key column of the original table and the JSON column.
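One way to carry out that last suggestion, sketched with a hypothetical `orders` table whose `raw_json` column is rarely parsed:

```sql
-- Side table holds only the original primary key and the JSON blob:
CREATE TABLE orders_json (
    order_id BIGINT,
    raw_json VARCHAR(65535)
)
DISTKEY (order_id);

INSERT INTO orders_json
SELECT order_id, raw_json FROM orders;

-- Drop the wide column from the hot table:
ALTER TABLE orders DROP COLUMN raw_json;
```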
  • 11. Queries not benefiting from sort keys
Amazon Redshift tables can have a sort key column identified, which acts like an index in other databases but does not incur a storage cost as on other platforms (for more information, see Choosing Sort Keys). A sort key should be created on the columns most commonly used in WHERE clauses. If you have a known query pattern, COMPOUND sort keys give the best performance; if end users query different columns equally, use an INTERLEAVED sort key. If using compound sort keys, review your queries to ensure that their WHERE clauses specify the sort columns in the same order they were defined in the compound key. To determine which tables don't have sort keys, run the following query against the v_extended_table_info view from the Amazon Redshift Utils repository:

SELECT * FROM admin.v_extended_table_info WHERE sortkey IS null;

Missing sort keys can potentially lead to an excessive scanning issue.
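A compound sort key declaration, assuming a hypothetical events table whose queries usually filter on event_date first and then customer_id (the key columns must be listed in that same order):

```sql
-- Known query pattern => COMPOUND sort key, most selective
-- WHERE column first:
CREATE TABLE events (
    event_date  DATE,
    customer_id BIGINT,
    payload     VARCHAR(256)
)
COMPOUND SORTKEY (event_date, customer_id);
```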
  • 12. Queries waiting on queue slots (ICT)
Amazon Redshift runs queries using a queuing system known as workload management (WLM). You can define up to 8 queues to separate workloads from each other, and set the concurrency on each queue to meet your overall throughput requirements. In some cases, the queue to which a user or query has been assigned is completely busy, and a user's query must wait for a slot to open. During this time, the system is not executing the query at all, which is a sign that you may need to increase concurrency. First, determine whether any queries are queuing, using the queuing_queries.sql admin script. Review the maximum concurrency that your cluster has needed in the past with wlm_apex.sql, down to an hour-by-hour historical analysis with wlm_apex_hourly.sql. Keep in mind that while increasing concurrency allows more queries to run, each query will get a smaller share of the memory allocated to its queue.
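A quick ad-hoc check (an alternative to the queuing_queries.sql script) against the STV_WLM_QUERY_STATE system view, which reports queue_time in microseconds:

```sql
-- Queries currently waiting for a WLM slot:
SELECT query, service_class, state,
       queue_time / 1000000 AS queue_seconds
FROM stv_wlm_query_state
WHERE state LIKE 'Queued%'
ORDER BY queue_time DESC;
```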
  • 13. Queries that are disk-based (ICT)
If a query isn't able to execute completely in memory, it may need to use disk-based temporary storage for parts of its explain plan. The additional disk I/O slows down the query; this can be addressed by increasing the amount of memory allocated to a session (for more information, see WLM Dynamic Memory Allocation). To determine whether any queries have been writing to disk, use the following query:

SELECT q.query, trim(q.cat_text)
FROM (
  SELECT query,
         replace(listagg(text, ' ') WITHIN GROUP (ORDER BY sequence), '\n', ' ') AS cat_text
  FROM stl_querytext
  WHERE userid > 1
  GROUP BY query) q
JOIN (
  SELECT DISTINCT query
  FROM svl_query_summary
  WHERE is_diskbased = 't'
    AND (label LIKE 'hash%' OR label LIKE 'sort%' OR label LIKE 'aggr%')
    AND userid > 1) qs
ON qs.query = q.query;

Based on the user or the queue assignment rules, you can increase the amount of memory given to the selected queue to prevent queries from needing to spill to disk to complete.
  • 14. Commit queue waits (ICT)
Amazon Redshift is designed for analytics queries rather than transaction processing. The cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to a commit queue. If you commit too often on your database, you will start to see waits on the commit queue increase, which can be viewed with the commit_stats.sql admin script. This script shows the largest queue length and queue time for queries run in the past two days. If you have queries that are waiting on the commit queue, look for sessions that commit multiple times per session, such as ETL jobs that log progress, or inefficient data loads. One of the worst practices is to insert data into Amazon Redshift row by row. Use the COPY command, or an ETL tool compatible with Amazon Redshift, instead.
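A bulk load via COPY commits once per command instead of once per row. The bucket, prefix, and IAM role ARN below are placeholders:

```sql
-- Load a batch of pipe-delimited, gzipped files in one commit:
COPY sales
FROM 's3://my-bucket/sales/2024-01-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
GZIP
DELIMITER '|';
```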
  • 15. Inefficient use of temporary tables
Amazon Redshift provides temporary tables, which are like normal tables except that they are visible only within a single session. When the user disconnects, the tables are automatically deleted. Temporary tables can be created using the CREATE TEMPORARY TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE query. The CREATE TABLE statement gives you complete control over the definition of the temporary table, while the SELECT … INTO and C(T)TAS commands use the input data to determine column names, sizes and data types, and use default storage properties. These default storage properties may cause issues if not carefully considered. Amazon Redshift's default table structure is EVEN distribution with no column encoding. This is a sub-optimal data structure for many types of queries, and if you use the SELECT … INTO syntax you cannot set the column encoding or the distribution and sort keys. If you use the CREATE TABLE AS (CTAS) syntax, you can specify a distribution style and sort keys, and Redshift will automatically apply LZO encoding for everything other than sort keys, booleans, reals and doubles. If you consider this automatic encoding sub-optimal and require further control, use the CREATE TABLE syntax rather than CTAS. If you are creating temporary tables, it is highly recommended that you convert all SELECT … INTO syntax to the CREATE statement. This ensures that your temporary tables have column encodings and are distributed in a fashion that is sympathetic to the other entities that are part of the workflow.
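A sketch of the recommended pattern, with hypothetical names: declare the temporary table explicitly so its encoding, distribution, and sort keys match the tables it will be joined with, then populate it.

```sql
-- Instead of: SELECT customer_id, SUM(total) INTO #staging ...
CREATE TEMPORARY TABLE staging (
    customer_id BIGINT        ENCODE lzo,
    order_total DECIMAL(12,2) ENCODE lzo
)
DISTKEY (customer_id)
SORTKEY (customer_id);

INSERT INTO staging
SELECT customer_id, SUM(total)
FROM orders
GROUP BY customer_id;
```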
  • 16. Tables without statistics
Amazon Redshift, like other databases, requires statistics about tables and the composition of the data blocks being stored in order to make good decisions when planning a query (for more information, see Analyzing Tables). Without good statistics, the optimizer may make suboptimal choices about the order in which to access tables, or how to join datasets together.
The ANALYZE Command History topic in the Amazon Redshift Developer Guide supplies queries to help you address missing or stale statistics. You can also simply run the missing_table_stats.sql admin script to determine which tables are missing stats, or the statement below to determine tables that have stale statistics:

SELECT database, schema || '.' || "table" AS "table", stats_off
FROM svv_table_info
WHERE stats_off > 5
ORDER BY 2;
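Refreshing statistics is a single command per table; `orders` below is a hypothetical name. The PREDICATE COLUMNS clause restricts the work to columns actually used in predicates, joins, or keys:

```sql
-- Refresh all column statistics for one table:
ANALYZE orders;

-- Or, cheaper: only the columns the planner filters and joins on:
ANALYZE orders PREDICATE COLUMNS;
```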
  • 17. Tables which need VACUUM
In Amazon Redshift, data blocks are immutable. When rows are DELETEd or UPDATEd, they are only logically deleted (flagged for deletion), not physically removed from disk. Updates result in a new block being written with the new data appended. Both operations cause the previous version of the row to keep consuming disk space and to keep being scanned when a query scans the table. As a result, table storage space increases and performance degrades due to otherwise avoidable disk I/O during scans. A VACUUM command recovers the space from deleted rows and restores the sort order.
You can use the perf_alert.sql admin script to identify tables that have had alerts about scanning a large number of deleted rows raised in the last seven days.
To address tables with missing or stale statistics, or where a vacuum is required, run another AWS Labs utility, Analyze & Vacuum Schema. This ensures that you always keep up-to-date statistics and only vacuum tables that actually need reorganization.
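Run manually, the maintenance pair looks like this (table name hypothetical; the sort threshold is optional and defaults to 95 percent):

```sql
-- Reclaim deleted-row space and re-sort until the table
-- is at least 99 percent sorted, then refresh statistics:
VACUUM FULL sales TO 99 PERCENT;
ANALYZE sales;
```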
  • 18. Analyze the performance of the queries and address issues
Your best friends in the Redshift world are:
  • ANALYZE: for identifying out-of-date statistics
  • SVL_QUERY_SUMMARY view: for a summary of all the useful data around the behavior of your queries, and a high-level overview of the cluster
  • Query alerts: for receiving important information about deviations in the behavior of your queries
  • SVL_QUERY_REPORT view: for collecting details about your queries' health and performance
  • 19. Set up and fine-tune monitoring and maintenance procedures
In addition to your preferred monitoring methods and multiple partner solutions, there are some helpful tools:
  • Amazon Redshift Column Encoding Utility
  • table_inspector.sql: see how data blocks in a distribution key map to the slices and nodes in the cluster
  • missing_table_stats.sql: determine which tables are missing stats
  • perf_alert.sql: identify tables that have had alerts about scanning a large number of deleted rows raised in the last seven days
  • top_queries.sql: determine the top running queries
  • wlm_apex.sql: review the maximum concurrency that your cluster has needed in the past, down to an hour-by-hour historical analysis with wlm_apex_hourly.sql
  • commit_stats.sql: view the commit stats
  • 20. Eliminate nested loops
  • Caused by a SQL query specifying a join condition that requires a "brute force" approach between two large tables
  • Quite easy to spot: look for "Nested Loop" in query plans or in STL_ALERT_EVENT_LOG

Typical offending conditions:
  ON a.date > b.date
  ON a.text LIKE b.text
  ON a.x = b.x OR a.y = b.y

Possible fixes:
  • Rewrite an inequality JOIN condition as a window function
  • Keep nested loops small instead of joining two large tables
  • Alternatively, run the nested loop join once and persist the results in a separate relational table
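One concrete rewrite, using hypothetical tables a and b: an OR in the join condition forces a nested loop, but it can be split into two equi-joins combined with UNION (which also deduplicates rows matched by both branches).

```sql
-- Before: ... FROM a JOIN b ON a.x = b.x OR a.y = b.y
SELECT a.id AS a_id, b.id AS b_id
FROM a JOIN b ON a.x = b.x
UNION
SELECT a.id AS a_id, b.id AS b_id
FROM a JOIN b ON a.y = b.y;
```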
  • 21. Inappropriate join cardinality
  • Query "fan-out": look for a high number of rows generated from joins (higher than the sum of rows of all scanned tables) in the query plan or execution metrics
  • Carefully review the join logic; use SVL_QUERY_SUMMARY to detect it

FROM house
JOIN rooms ON rooms.house_id = house.id
JOIN residents ON residents.house_id = house.id

  • If possible, break large fan-out queries into several smaller queries
  • Use derived tables to transform one-to-many joins into one-to-one joins
  • Try out some advanced techniques like 1 and 2
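The derived-table suggestion, sketched on the slide's own house/rooms schema: pre-aggregate the many side so the outer join becomes one-to-one and the row count cannot fan out.

```sql
-- One row per house, regardless of how many rooms it has:
SELECT h.id, r.room_count
FROM house h
JOIN (SELECT house_id, COUNT(*) AS room_count
      FROM rooms
      GROUP BY house_id) r
  ON r.house_id = h.id;
```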
  • 22. Suboptimal WHERE clause If your WHERE clause causes excessive table scans, you might see a SCAN step in the segment with the highest maxtime value in SVL_QUERY_SUMMARY. For more information, see Using the SVL_QUERY_SUMMARY View. To fix this issue, add a WHERE clause to the query based on the primary sort column of the largest table. This approach will help minimize scanning time. For more information, see Amazon Redshift Best Practices for Designing Tables.
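A minimal sketch of such a predicate, assuming a large fact table with a hypothetical leading sort key column event_ts: restricting on the sort column lets Redshift use zone maps (the block-level min/max values mentioned earlier) to skip irrelevant blocks entirely.

```sql
-- Filtering on the leading sort key column prunes blocks via zone maps.
SELECT COUNT(*)
FROM clickstream                 -- hypothetical table, SORTKEY (event_ts)
WHERE event_ts >= '2018-01-01'
  AND event_ts <  '2018-02-01';
```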
  • 23. Insufficiently Restrictive Predicate If your query has an insufficiently restrictive predicate, you might see a SCAN step in the segment with the highest maxtime value in SVL_QUERY_SUMMARY that has a very high rows value compared to the rows value in the final RETURN step in the query. For more information, see Using the SVL_QUERY_SUMMARY View. To fix this issue, try adding a predicate to the query or making the existing predicate more restrictive to narrow the output.
  • 24. Very Large Result Set If your query returns a very large result set, consider rewriting the query to use UNLOAD to write the results to Amazon S3. This approach will improve the performance of the RETURN step by taking advantage of parallel processing. For more information on checking for a very large result set, see Using the SVL_QUERY_SUMMARY View.
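A hedged sketch of the UNLOAD approach (the bucket, prefix, and IAM role ARN are placeholders; with PARALLEL ON, each slice writes its own file to S3 instead of funneling the result set through the leader node):

```sql
UNLOAD ('SELECT * FROM very_large_result_query')
TO 's3://my-bucket/exports/result_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
GZIP
PARALLEL ON;
```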
  • 25. Large SELECT List If your query has an unusually large SELECT list, you might see a bytes value that is high relative to the rows value for any step (in comparison to other steps) in SVL_QUERY_SUMMARY. This high bytes value can be an indicator that you are selecting a lot of columns. For more information, see Using the SVL_QUERY_SUMMARY View. To fix this issue, review the columns you are selecting and see if any can be removed.
  • 26. Working with SVL_QUERY_SUMMARY • Select your query ID: SELECT query, elapsed, substring FROM svl_qlog ORDER BY query DESC LIMIT 5; • Collect the query data: SELECT * FROM svl_query_summary WHERE query = MyQueryID ORDER BY stm, seg, step; • For analysis please refer to: https://docs.aws.amazon.com/redshift/latest/dg/using-SVL-Query-Summary.html
  • 27. Some additional materials can be found here • Best practices for designing queries • Best practices for designing tables • Top 10 performance tuning techniques • Troubleshooting queries Amazon Redshift Developer Guide https://docs.aws.amazon.com/redshift/latest/dg/redshift-dg.pdf