SlideShare uma empresa Scribd logo
1 de 63
Baixar para ler offline
SQL Joins
and Query Optimization

How it works


by Brian Gallagher
www.BrianGallagher.com
         G
Why
Wh JOIN?
 Combine data from multiple tables
 Reduces the number of queries needed
 Moves processing burden to database
  server
 Improves data integrity (over combining
  data in
  d t i program code)
                    d )
Types of JOINs
T      f JOIN
   INNER
        the most common, return all rows with matching records
                        ,                             g
        SELECT * FROM T1 INNER JOIN T2 ON T1.fld1 = T2.fld2
   LEFT or LEFT OUTER
        return all rows on the left (first) table and right (second)
        If no matching record on the right side, NULL-values for each field are returned
        SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1 fld1 = T2 fld2  T1.fld1   T2.fld2
   FULL or FULL OUTER
        return all rows on the left (first) table and right (second)
        If no matching record on the left or right side, NULL-values for each field are returned
        S
         SELECT * FROM T1 FULL OU
               C         O         U       OUTER JO JOIN T2 ON T1.fld1 = T2.fld2
                                                                O     . d        . d
   CROSS
        the least common, return all possible row combinations
        SELECT * FROM T1, T2
   RIGHT or RIGHT OUTER
      Don’t use them without a very good reason.
      They do not add any functionality over a LEFT JOIN and make code more confusing
      Works the same as the LEFT and LEFT OUTER JOINs, but the second table is the one
       which all rows are returned from. These two queries are functionally identical:
      SELECT * FROM T1 LEFT JOIN T2 ON T1.fld1 = T2.fld2
      SELECT * FROM T2 RIGHT JOIN T1 ON T1.fld1 = T2.fld2
Sample T t Data
S   l Test D t
   Sample Customer and Job records
INNER JOIN
LEFT (OUTER) JOIN
     (     )
One-To-Many LEFT JOIN
          y
RIGHT (OUTER) JOIN
      (     )
FULL (OUTER) JOIN
     (     )
FULL (OUTER) JOIN
 FULL JOIN is not supported in MS-Access
 Can be emulated using UNION queries


   They
    Th are rarely used
               l     d
CROSS JOINs

The Join you never use…
except every time
Making Big
M ki it Bi
   A CROSS JOIN combines every row on the left
    table with every row in the right table
   Resulting recordset will b th t t l of b th row
    R     lti       d t ill be the total f both
    counts multiplied together
     Leftrow has 10 000 records
                   10,000
     Right row has 30,000 records
     Resulting recordset has 300,000,000 records
   If no JOIN condition is specified, a CROSS JOIN
    will be executed
Joins
J i each record t th other t bl
       h      d to the th table
CROSS JOIN’s resulting set of all
        JOIN s
record combinations
How C
H   Cross J i M k I
          Joins Make Inner J i
                           Joins
   Tables are joined using a CROSS JOIN
   JOIN criteria is evaluated, removing all records
    which d ’t match th “ON” condition
      hi h don’t    t h the         diti
    SELECT *
    FROM Customers C
    INNER JOIN Job J
    ON C.CustID = J.CustID
   The remaining rows are the recordset returned
    for the INNER JOIN
Using “ON” criteria to select records
One-to-Many CROSS JOIN
          y
One to Many: Cross Join to Inner Join
Unpredictable R
U    di t bl Record O d
                  d Orders
   Databases are not required to return records in
    any particular order
   Records may be returned by the order they are
    stored on disk or may be unordered, depending
    on how the query engine handles data
   If you need data in a particular order, use an
              dd t i         ti l      d
    ORDER BY clause
   Some databases may return data presorted by
    Primary key or Clustered index
     DO NOT DEPEND on this behavior
     It is not reliably portable across different databases
     Do not make a program rely on a behavior not
      specified by the program – bad programming style
Indexes

Unique, clustered, etc.
What
Wh t are Indexes
         I d
 Indexes are a pre-sorted list of database
  field values
 This list speeds up queries A LOT
     An index is much smaller than the full
      recordset
     An index tracks only the fields specified when
      created
Indexes and P f
I d       d Performance
   Indexes GREATLY speed up queries on the
    fields being indexed
   Indexes SLIGHTLY slow d
    I d                  l down INSERTS
                                INSERTS,
    UPDATES and DELETES
     The indexes need to be updated anytime a record
      changes
     This adds a small bit of overhead to the system
   The increase in query speed usually greatly
    offsets the slight extra load on INSERTS,
    UPDATES and DELETES
     Most  database use is typically reading (instead of
      writing)
Types of Indexes
T      fI d
   Simple Index
     Indexesthe field specified
     Can have multiple simple indexes
     Speeds up queries using the indexed field
   Composite Index
     Indexescombinations of fields
     Can have multiple composite indexes
     Speeds up queries using the specified
      combination of fi ld
         bi ti     f fields
Simple Indexes
Si l I d
   An Index contains a list of all field values
    and pointers to the record with that value
        p
How Indexes help SELECTs
               p
Composite I d
C     it Indexes
   Useful only when you will be querying multiple
    fields simultaneously
   Most selective (resulting in the fewest records
    matching) fields should be listed first
   Slight performance gain over multiple Simple
    Indexes when used correctly
   No performance gain (possibly even a
    performance loss) when used incorrectly.
Composite Index
   p
Searching I d
S    hi Indexes
 There are high performance algorithms for
              high-performance
  searching pre-sorted data
I d
  Index data stored i t
         d t t d internally as bi
                           ll  binary t
                                      trees
  (typically)
 Searching through
  binary tree much
  faster than reading
  through table
Design Strategy

Make it work
Make it fast
Make it pretty
        p y
Focus on what’s needed
F         h t’     d d
   Make your query as specific as possible
     Makes  joins simpler (and therefore faster)
     Returns no unwanted data
     Increases response time
     Reduces network and server load

 Put as much as possible in the WHERE
  clause
 J i your fi ld properly
  Join     fields      l
Maximum S l ti it / S
M i     Selectivity Specificity
                        ifi it

 If differents parts of the WHERE clause
  will result in smaller data sets than other
  ones, list them the ones that will most
  g
  greatly reduce the data first
          y
 List the others in the same order
Optimizing Queries

Faster is Better
Horizontal P titi i
H i    t l Partitioning
 Avoid extremely “wide” records (records
  with lots of fields)
                     )
 Split data into two tables with a common
  ID field
 All the record data is rarely used in all
  queries,
  queries so it reduces traffic and speeds
  up processing and disk reads
Add I d
    Indexes
   Make sure all fields searched on regularly
    are indexed
Define Clustered Index
D fi Cl t d I d
 The most-likely criteria to be sorted on
  should be defined as your clustered index
                         y
 Clustered index defines the order the
  records will physically be stored on disk
Normalize D t
N    li Data
 No redundant data
 Link related data
 Don’t go crazy
Denormalize D t
D      li Data
   If joining more 4 tables or more in a single
    q y,
    query, consider de-normalizing data
                                   g
    (combining data from external tables back
    into the main table))
     Can  return faster query results
     Risks include having to manage duplicate
      data in database and larger disk usage
Size records to fit database page
size
   SQL Server 7.0, 2000, and 2005 data pages are
    8K (8192 bytes) in size.
     Of  the
          th 8192 available b t i each d t page, only
                       il bl bytes in    h data         l
      8060 bytes are used to store a row. The rest is used
      for overhead.
     If you have a row that was 4031 bytes long, then only
      one row could fit into a data page, and the remaining
      4029 bytes left in the page would be empty
              y              p g               py
          Making each record 1 byte shorter would halve the disk
           access required to read the table
Use Primary K
U aPi       Key
   Make sure you have a primary key defined
   Ideally, it should be a single unique field
     You can define composite keys, but they are
      generally not needed if the database is designed
      p p y
      properly
   Most tables should have a unique TableNameID
    field
     This allows you to identify any row by a single unique
      value
Use
U an IDENTITY C l
              Column
   If there is:
     no Primary Key on the table
     no unique index on the table
   Then
     Add   an IDENTITY (unique value) column to
      the table
     Optionally, index the IDENTITY column if you
      will query it regularly
Move TEXT, NTEXT, and IMAGE
data into table
 These types normally stored outside the
  table (uses a pointer to the data in the
         (      p
  field)
 If these types will be searched frequently
                                  frequently,
  consider moving them into the database
  table s
  table’s storage with:
    sp_tableoption 'tablename', 'text in row', 'on'
      or
    sp_tableoption 'tablename', 'text in row', 'size'
Consider R li ti i advance
C   id Replication in d
   If Replication is to be used, factor the
    decision into the original design of the
                          g         g
    database
Use Built-in Referential I t it
U B ilt i R f       ti l Integrity
 Use foreign keys and validation
  constraints built into the database
 Don’t manage it in the application


   Benefits:
     Faster execution
     Can’t mess it up with application errors
Re-test h
R t t when changing servers
            h   i
   Retest and rebenchmark applications when
    moving to a new server, such as:
     Development
     Staging
     Production
   Problems often caused b
    P bl      ft        d by:
     More rows in test data on servers “closer” to
      production
     Server configurations different
     Tables not indexed the same way
Constraints are F t
C   t i t       Fast
   Constraints on fields are faster than
     Triggers
     Rules
     Defaults
Don’t duplicate ff t
D ’t d li t effort
   Don’t check for the same thing twice
     (duh, but it happens)
     Don’t use a trigger and a constraint to do the
      same thing g
     Same for constraints and defaults
     Same for constraints and rules
Limit
Li it records t b JOIN d
           d to be JOINed
   Use WHERE clause to minimize rows to
    be JOINed.
     Particularly   in the OUTER table of an OUTER
     JOIN
Index F i K
I d Foreign Keys
 Fields in a table that are a foreign key are
  not indexed automatically j
                            y just because
  they are a foreign key.
 Add these indexes manually
Minimize Duplicate
Mi i i D li t JOIN fi ld d t
                   field data

   JOINs are slower when there are few
    different keys in the j
                y         joining table
                                g
JOIN on unique indexed fi ld
          i    i d   d fields
   JOINs will perform fastest when joining
    indexed fields with no duplicate data
                             p
JOIN Numeric Fi ld
     N    i Fields
   JOINs on numeric fields perform much
    faster than JOINs on other datatype fields.
                                    yp
JOIN th exact same d t t
     the    t      datatypes

   JOINs should be to the exact same
    datatype for best p
         yp           performance
     Same type of field
     Same field length
     Same encoding (ASCII vs. Unicode, etc)
Use
U ANSI JOIN syntax
               t
   Improves readability
   Less likely to cause programmer errors
   No Aliases:
    SELECT fname, lname, department
      FROM names INNER JOIN departments ON
      names.employeeid
      names employeeid = departments employeeid
                         departments.employeeid
   Microsoft syntax example:
    SELECT fname, lname, department
      FROM names departments
           names,
      WHERE names.employeeid = departments.employeeid
   Code is more portable between databases
Use Table Aliases
U T bl Ali
   Shortens code
   Makes it easier to follow, especially with long queries
   Identifies which table each field is coming from
   No table aliases:
    SELECT fname, lname, department
      FROM names INNER JOIN departments ON
      names.employeeid = departments.employeeid
   Microsoft syntax example:
    SELECT N.fname, N.lname, D.department
      FROM names N INNER JOIN departments D ON
      N.employeeid = D.employeeid
Don’t
D ’t use SELECT *
   Requires additional parsing on the server to
    extract field names
   Returns unneccesary data (unless you are
    actually using every field)
   Returns duplicate data on JOINs (do you really
    need two copies of the RecordID field?)
   Can cause errors (in some databases) if JOINed
    tables have fields with the same name
Store i separate fil in fil
St    in      t files i filegroup

 For very large joins placing tables to be
  j
  joined in separate p y
              p      physical files within the
  same filegroup can improve performance
 SQL Server can spawn separate threads
  for processing each file
Don’t
D ’t use CROSS JOINs
               JOIN
 Unless actually needed (rarely) do not use
  a CROSS JOIN (returns all combinations
                     (
  of records on both sides of JOIN)
 People sometimes will do a CROSS JOIN
  and then use DISTINCT and GROUP BY
  to eliminate all the duplication
     Don’t   do that.
JOINs
JOIN vs. Subquery
         S b
   Depending on the specifics, either could be faster. Write
    both and test performance to be sure which is best.
   JOIN:
    SELECT a.*
      FROM Table1 a INNER JOIN Table2 b
      ON a.Table1ID = b.Table1ID
      WHERE b.Table2ID = 'SomeValue'
   Subquery:
    SELECT a.* FROM Table1 a WHERE Table1ID IN
           a.
      (SELECT Table1ID FROM Table2 WHERE Table2ID =
      'SomeValue')
Avoid Subqueries unless needed
A id S b     i     l       d d

 Avoid subqueries unless actually required
 Most subqueries can be expressed as
  JOINs
 Subqueries are generally harder to read
  and understand (for humans) and,
  therefore,
  therefore maintain
Use an Indexed View
(Enterprise 2000 and later only)
 An Indexed View maintains an updated
  record of how the tables are joined via a
                               j
  clustered index
 This slows INSERTs, UPDATEs and
              INSERTs
  DELETEs a bit, so consider the tradeoff
  for the faster queries
Use Database’s Performance
    Database s
Optimization Tools
 Most major databases have methods for
  monitoring and optimizing database
             g     p      g
  performance
 Google “databaseName optimizing” for
            databaseName optimizing
  tons of links
Avoid
A id DISTINCT when possible
               h       ibl
   DISTINCT clauses are frequently (and usually
    unintentionally) used to hide an incorrect JOIN
   Properly normalized database will not frequently
    need DISTINCT clauses
   Look for incorrect JOINs or create more explicit
    WHERE clauses to avoid needing DISTINCT
   Of course, use it when appropriate
The End
               (for now)

Google: Query Optimization
for lots more information

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Introduction to-sql
Introduction to-sqlIntroduction to-sql
Introduction to-sql
 
Oracle Course
Oracle CourseOracle Course
Oracle Course
 
Advanced Sql Training
Advanced Sql TrainingAdvanced Sql Training
Advanced Sql Training
 
SQL Queries
SQL QueriesSQL Queries
SQL Queries
 
SQL Join Basic
SQL Join BasicSQL Join Basic
SQL Join Basic
 
Basic sql Commands
Basic sql CommandsBasic sql Commands
Basic sql Commands
 
SQL Views
SQL ViewsSQL Views
SQL Views
 
Sql commands
Sql commandsSql commands
Sql commands
 
Sql joins
Sql joinsSql joins
Sql joins
 
SQL Server Learning Drive
SQL Server Learning Drive SQL Server Learning Drive
SQL Server Learning Drive
 
SQL JOIN
SQL JOINSQL JOIN
SQL JOIN
 
Introduction of sql server indexing
Introduction of sql server indexingIntroduction of sql server indexing
Introduction of sql server indexing
 
Presentation slides of Sequence Query Language (SQL)
Presentation slides of Sequence Query Language (SQL)Presentation slides of Sequence Query Language (SQL)
Presentation slides of Sequence Query Language (SQL)
 
Sql notes, sql server,sql queries,introduction of SQL, Beginner in SQL
Sql notes, sql server,sql queries,introduction of SQL, Beginner in SQLSql notes, sql server,sql queries,introduction of SQL, Beginner in SQL
Sql notes, sql server,sql queries,introduction of SQL, Beginner in SQL
 
SQL Queries Information
SQL Queries InformationSQL Queries Information
SQL Queries Information
 
SQL Views
SQL ViewsSQL Views
SQL Views
 
Sql join
Sql  joinSql  join
Sql join
 
Query optimization in SQL
Query optimization in SQLQuery optimization in SQL
Query optimization in SQL
 
Advanced sql
Advanced sqlAdvanced sql
Advanced sql
 
Sql clauses by Manan Pasricha
Sql clauses by Manan PasrichaSql clauses by Manan Pasricha
Sql clauses by Manan Pasricha
 

Semelhante a SQL Joins and Query Optimization

Database Performance
Database PerformanceDatabase Performance
Database PerformanceBoris Hristov
 
MySQL Indexing
MySQL IndexingMySQL Indexing
MySQL IndexingBADR
 
CS8391-DATA-STRUCTURES.pdf
CS8391-DATA-STRUCTURES.pdfCS8391-DATA-STRUCTURES.pdf
CS8391-DATA-STRUCTURES.pdfraji175286
 
Physical elements of data
Physical elements of dataPhysical elements of data
Physical elements of dataDimara Hakim
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Lviv Startup Club
 
0104 abap dictionary
0104 abap dictionary0104 abap dictionary
0104 abap dictionaryvkyecc1
 
初探AWS 平台上的 NoSQL 雲端資料庫服務
初探AWS 平台上的 NoSQL 雲端資料庫服務初探AWS 平台上的 NoSQL 雲端資料庫服務
初探AWS 平台上的 NoSQL 雲端資料庫服務Amazon Web Services
 
CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index StructuresJ Singh
 
15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performanceguest9912e5
 
Structured Query Language (SQL) _ Edu4Sure Training.pptx
Structured Query Language (SQL) _ Edu4Sure Training.pptxStructured Query Language (SQL) _ Edu4Sure Training.pptx
Structured Query Language (SQL) _ Edu4Sure Training.pptxEdu4Sure
 
Indy pass writing efficient queries – part 1 - indexing
Indy pass   writing efficient queries – part 1 - indexingIndy pass   writing efficient queries – part 1 - indexing
Indy pass writing efficient queries – part 1 - indexingeddiew
 

Semelhante a SQL Joins and Query Optimization (20)

Database Performance
Database PerformanceDatabase Performance
Database Performance
 
Unit08 dbms
Unit08 dbmsUnit08 dbms
Unit08 dbms
 
Sap abap
Sap abapSap abap
Sap abap
 
T-SQL Overview
T-SQL OverviewT-SQL Overview
T-SQL Overview
 
MySQL Indexing
MySQL IndexingMySQL Indexing
MySQL Indexing
 
Ms sql server tips 1 0
Ms sql server tips 1 0Ms sql server tips 1 0
Ms sql server tips 1 0
 
Indexing
IndexingIndexing
Indexing
 
San diegophp
San diegophpSan diegophp
San diegophp
 
CS8391-DATA-STRUCTURES.pdf
CS8391-DATA-STRUCTURES.pdfCS8391-DATA-STRUCTURES.pdf
CS8391-DATA-STRUCTURES.pdf
 
Physical elements of data
Physical elements of dataPhysical elements of data
Physical elements of data
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"
 
Data structure
 Data structure Data structure
Data structure
 
SAP ABAP data dictionary
SAP ABAP data dictionarySAP ABAP data dictionary
SAP ABAP data dictionary
 
0104 abap dictionary
0104 abap dictionary0104 abap dictionary
0104 abap dictionary
 
初探AWS 平台上的 NoSQL 雲端資料庫服務
初探AWS 平台上的 NoSQL 雲端資料庫服務初探AWS 平台上的 NoSQL 雲端資料庫服務
初探AWS 平台上的 NoSQL 雲端資料庫服務
 
Etl2
Etl2Etl2
Etl2
 
CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index Structures
 
15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance
 
Structured Query Language (SQL) _ Edu4Sure Training.pptx
Structured Query Language (SQL) _ Edu4Sure Training.pptxStructured Query Language (SQL) _ Edu4Sure Training.pptx
Structured Query Language (SQL) _ Edu4Sure Training.pptx
 
Indy pass writing efficient queries – part 1 - indexing
Indy pass   writing efficient queries – part 1 - indexingIndy pass   writing efficient queries – part 1 - indexing
Indy pass writing efficient queries – part 1 - indexing
 

Último

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Último (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

SQL Joins and Query Optimization

  • 1. SQL Joins and Query Optimization How it works by Brian Gallagher www.BrianGallagher.com G
  • 2. Why Wh JOIN?  Combine data from multiple tables  Reduces the number of queries needed  Moves processing burden to database server  Improves data integrity (over combining data in d t i program code) d )
  • 3. Types of JOINs T f JOIN  INNER  the most common, return all rows with matching records , g  SELECT * FROM T1 INNER JOIN T2 ON T1.fld1 = T2.fld2  LEFT or LEFT OUTER  return all rows on the left (first) table and right (second)  If no matching record on the right side, NULL-values for each field are returned  SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1 fld1 = T2 fld2 T1.fld1 T2.fld2  FULL or FULL OUTER  return all rows on the left (first) table and right (second)  If no matching record on the left or right side, NULL-values for each field are returned  S SELECT * FROM T1 FULL OU C O U OUTER JO JOIN T2 ON T1.fld1 = T2.fld2 O . d . d  CROSS  the least common, return all possible row combinations  SELECT * FROM T1, T2  RIGHT or RIGHT OUTER  Don’t use them without a very good reason.  They do not add any functionality over a LEFT JOIN and make code more confusing  Works the same as the LEFT and LEFT OUTER JOINs, but the second table is the one which all rows are returned from. These two queries are functionally identical:  SELECT * FROM T1 LEFT JOIN T2 ON T1.fld1 = T2.fld2  SELECT * FROM T2 RIGHT JOIN T1 ON T1.fld1 = T2.fld2
  • 4. Sample T t Data S l Test D t  Sample Customer and Job records
  • 10. FULL (OUTER) JOIN  FULL JOIN is not supported in MS-Access  Can be emulated using UNION queries  They Th are rarely used l d
  • 11. CROSS JOINs The Join you never use… except every time
  • 12. Making Big M ki it Bi  A CROSS JOIN combines every row on the left table with every row in the right table  Resulting recordset will b th t t l of b th row R lti d t ill be the total f both counts multiplied together  Leftrow has 10 000 records 10,000  Right row has 30,000 records  Resulting recordset has 300,000,000 records  If no JOIN condition is specified, a CROSS JOIN will be executed
  • 13. Joins J i each record t th other t bl h d to the th table
  • 14. CROSS JOIN’s resulting set of all JOIN s record combinations
  • 15. How C H Cross J i M k I Joins Make Inner J i Joins  Tables are joined using a CROSS JOIN  JOIN criteria is evaluated, removing all records which d ’t match th “ON” condition hi h don’t t h the diti SELECT * FROM Customers C INNER JOIN Job J ON C.CustID = J.CustID  The remaining rows are the recordset returned for the INNER JOIN
  • 16. Using “ON” criteria to select records
  • 18. One to Many: Cross Join to Inner Join
  • 19. Unpredictable R U di t bl Record O d d Orders  Databases are not required to return records in any particular order  Records may be returned by the order they are stored on disk or may be unordered, depending on how the query engine handles data  If you need data in a particular order, use an dd t i ti l d ORDER BY clause  Some databases may return data presorted by Primary key or Clustered index  DO NOT DEPEND on this behavior  It is not reliably portable across different databases  Do not make a program rely on a behavior not specified by the program – bad programming style
  • 21. What Wh t are Indexes I d  Indexes are a pre-sorted list of database field values  This list speeds up queries A LOT  An index is much smaller than the full recordset  An index tracks only the fields specified when created
  • 22. Indexes and P f I d d Performance  Indexes GREATLY speed up queries on the fields being indexed  Indexes SLIGHTLY slow d I d l down INSERTS INSERTS, UPDATES and DELETES  The indexes need to be updated anytime a record changes  This adds a small bit of overhead to the system  The increase in query speed usually greatly offsets the slight extra load on INSERTS, UPDATES and DELETES  Most database use is typically reading (instead of writing)
  • 23. Types of Indexes T fI d  Simple Index  Indexesthe field specified  Can have multiple simple indexes  Speeds up queries using the indexed field  Composite Index  Indexescombinations of fields  Can have multiple composite indexes  Speeds up queries using the specified combination of fi ld bi ti f fields
  • 24. Simple Indexes Si l I d  An Index contains a list of all field values and pointers to the record with that value p
  • 25. How Indexes help SELECTs p
  • 26. Composite I d C it Indexes  Useful only when you will be querying multiple fields simultaneously  Most selective (resulting in the fewest records matching) fields should be listed first  Slight performance gain over multiple Simple Indexes when used correctly  No performance gain (possibly even a performance loss) when used incorrectly.
  • 28. Searching I d S hi Indexes  There are high performance algorithms for high-performance searching pre-sorted data I d Index data stored i t d t t d internally as bi ll binary t trees (typically)  Searching through binary tree much faster than reading through table
  • 29. Design Strategy Make it work Make it fast Make it pretty p y
  • 30. Focus on what’s needed F h t’ d d  Make your query as specific as possible  Makes joins simpler (and therefore faster)  Returns no unwanted data  Increases response time  Reduces network and server load  Put as much as possible in the WHERE clause  J i your fi ld properly Join fields l
  • 31. Maximum S l ti it / S M i Selectivity Specificity ifi it  If differents parts of the WHERE clause will result in smaller data sets than other ones, list them the ones that will most g greatly reduce the data first y  List the others in the same order
  • 33. Horizontal P titi i H i t l Partitioning  Avoid extremely “wide” records (records with lots of fields) )  Split data into two tables with a common ID field  All the record data is rarely used in all queries, queries so it reduces traffic and speeds up processing and disk reads
  • 34. Add I d Indexes  Make sure all fields searched on regularly are indexed
  • 35. Define Clustered Index D fi Cl t d I d  The most-likely criteria to be sorted on should be defined as your clustered index y  Clustered index defines the order the records will physically be stored on disk
  • 36. Normalize D t N li Data  No redundant data  Link related data  Don’t go crazy
  • 37. Denormalize D t D li Data  If joining more 4 tables or more in a single q y, query, consider de-normalizing data g (combining data from external tables back into the main table))  Can return faster query results  Risks include having to manage duplicate data in database and larger disk usage
  • 38. Size records to fit database page size  SQL Server 7.0, 2000, and 2005 data pages are 8K (8192 bytes) in size.  Of the th 8192 available b t i each d t page, only il bl bytes in h data l 8060 bytes are used to store a row. The rest is used for overhead.  If you have a row that was 4031 bytes long, then only one row could fit into a data page, and the remaining 4029 bytes left in the page would be empty y p g py  Making each record 1 byte shorter would halve the disk access required to read the table
  • 39. Use Primary K U aPi Key  Make sure you have a primary key defined  Ideally, it should be a single unique field  You can define composite keys, but they are generally not needed if the database is designed p p y properly  Most tables should have a unique TableNameID field  This allows you to identify any row by a single unique value
  • 40. Use U an IDENTITY C l Column  If there is:  no Primary Key on the table  no unique index on the table  Then  Add an IDENTITY (unique value) column to the table  Optionally, index the IDENTITY column if you will query it regularly
  • 41. Move TEXT, NTEXT, and IMAGE data into table  These types normally stored outside the table (uses a pointer to the data in the ( p field)  If these types will be searched frequently frequently, consider moving them into the database table s table’s storage with: sp_tableoption 'tablename', 'text in row', 'on' or sp_tableoption 'tablename', 'text in row', 'size'
  • 42. Consider R li ti i advance C id Replication in d  If Replication is to be used, factor the decision into the original design of the g g database
  • 43. Use Built-in Referential I t it U B ilt i R f ti l Integrity  Use foreign keys and validation constraints built into the database  Don’t manage it in the application  Benefits:  Faster execution  Can’t mess it up with application errors
  • 44. Re-test h R t t when changing servers h i  Retest and rebenchmark applications when moving to a new server, such as:  Development  Staging  Production  Problems often caused b P bl ft d by:  More rows in test data on servers “closer” to production  Server configurations different  Tables not indexed the same way
  • 45. Constraints are F t C t i t Fast  Constraints on fields are faster than  Triggers  Rules  Defaults
  • 46. Don’t duplicate ff t D ’t d li t effort  Don’t check for the same thing twice  (duh, but it happens)  Don’t use a trigger and a constraint to do the same thing g  Same for constraints and defaults  Same for constraints and rules
  • 47. Limit Li it records t b JOIN d d to be JOINed  Use WHERE clause to minimize rows to be JOINed.  Particularly in the OUTER table of an OUTER JOIN
  • 48. Index F i K I d Foreign Keys  Fields in a table that are a foreign key are not indexed automatically j y just because they are a foreign key.  Add these indexes manually
  • 49. Minimize Duplicate Mi i i D li t JOIN fi ld d t field data  JOINs are slower when there are few different keys in the j y joining table g
  • 50. JOIN on unique indexed fi ld i i d d fields  JOINs will perform fastest when joining indexed fields with no duplicate data p
  • 51. JOIN Numeric Fi ld N i Fields  JOINs on numeric fields perform much faster than JOINs on other datatype fields. yp
  • 52. JOIN th exact same d t t the t datatypes  JOINs should be to the exact same datatype for best p yp performance  Same type of field  Same field length  Same encoding (ASCII vs. Unicode, etc)
  • 53. Use U ANSI JOIN syntax t  Improves readability  Less likely to cause programmer errors  No Aliases: SELECT fname, lname, department FROM names INNER JOIN departments ON names.employeeid names employeeid = departments employeeid departments.employeeid  Microsoft syntax example: SELECT fname, lname, department FROM names departments names, WHERE names.employeeid = departments.employeeid  Code is more portable between databases
  • 54. Use Table Aliases U T bl Ali  Shortens code  Makes it easier to follow, especially with long queries  Identifies which table each field is coming from  No table aliases: SELECT fname, lname, department FROM names INNER JOIN departments ON names.employeeid = departments.employeeid  Microsoft syntax example: SELECT N.fname, N.lname, D.department FROM names N INNER JOIN departments D ON N.employeeid = D.employeeid
  • 55. Don’t D ’t use SELECT *  Requires additional parsing on the server to extract field names  Returns unneccesary data (unless you are actually using every field)  Returns duplicate data on JOINs (do you really need two copies of the RecordID field?)  Can cause errors (in some databases) if JOINed tables have fields with the same name
  • 56. Store i separate fil in fil St in t files i filegroup  For very large joins placing tables to be j joined in separate p y p physical files within the same filegroup can improve performance  SQL Server can spawn separate threads for processing each file
  • 57. Don’t D ’t use CROSS JOINs JOIN  Unless actually needed (rarely) do not use a CROSS JOIN (returns all combinations ( of records on both sides of JOIN)  People sometimes will do a CROSS JOIN and then use DISTINCT and GROUP BY to eliminate all the duplication  Don’t do that.
  • 58. JOINs JOIN vs. Subquery S b  Depending on the specifics, either could be faster. Write both and test performance to be sure which is best.  JOIN: SELECT a.* FROM Table1 a INNER JOIN Table2 b ON a.Table1ID = b.Table1ID WHERE b.Table2ID = 'SomeValue'  Subquery: SELECT a.* FROM Table1 a WHERE Table1ID IN a. (SELECT Table1ID FROM Table2 WHERE Table2ID = 'SomeValue')
  • 59. Avoid Subqueries unless needed A id S b i l d d  Avoid subqueries unless actually required  Most subqueries can be expressed as JOINs  Subqueries are generally harder to read and understand (for humans) and, therefore, therefore maintain
  • 60. Use an Indexed View (Enterprise 2000 and later only)  An Indexed View maintains an updated record of how the tables are joined via a j clustered index  This slows INSERTs, UPDATEs and INSERTs DELETEs a bit, so consider the tradeoff for the faster queries
  • 61. Use Database’s Performance Database s Optimization Tools  Most major databases have methods for monitoring and optimizing database g p g performance  Google “databaseName optimizing” for databaseName optimizing tons of links
  • 62. Avoid A id DISTINCT when possible h ibl  DISTINCT clauses are frequently (and usually unintentionally) used to hide an incorrect JOIN  Properly normalized database will not frequently need DISTINCT clauses  Look for incorrect JOINs or create more explicit WHERE clauses to avoid needing DISTINCT  Of course, use it when appropriate
  • 63. The End (for now) Google: Query Optimization for lots more information