Ten Tips for Writing Efficient SQL
By Robert Bright
Abstract
As a Web Developer at the Ontario Universities’ Application Centre (OUAC), I worked a lot with SQL and database programming. I learned several techniques for writing more efficient SQL statements. The intention of this presentation is to share the techniques I learned so that future co-op students can benefit from this knowledge.

Information about the Employer
The Ontario Universities’ Application Centre (OUAC), located in Guelph, Ontario, Canada, is a central bureau whose key function is the processing of applications for admission to the province’s universities.

Tip #3
Use OR instead of UNION on the same table
When selecting data from a single table that requires a logical or, it is easier to view the process of the query by using a UNION. This method is inefficient because it requires an unnecessary intermediate table. Joining the inner query with the outer query through an OR eliminates the extra sub query and the intermediate table.
Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to find a specific file that belonged to a specific University. I was tempted to use a UNION to find the exact data, but an OR proved to be more efficient.
Before: SELECT hemenbr, hename FROM buma.helpfiles WHERE hemenbr = 5 UNION
        SELECT hemenbr, hename FROM buma.helpfiles WHERE hename = 'help_address.html'
After: SELECT DISTINCT hemenbr, hename FROM buma.helpfiles WHERE hemenbr = 5 OR hename = 'help_address.html'

Tip #7
Use IN instead of EXISTS
A simple trick to increase the speed of an EXISTS sub query is to replace it with IN. In my tests the IN method was faster than EXISTS because it doesn’t check unnecessary rows in the comparison.
Example: One of the options for the degree listing program I wrote at OUAC was to list all the available degrees at a University. So if I were checking for U of Guelph, I would look for all the degrees that were associated with the university number 149. By replacing the EXISTS in the sub query with an IN, I made the query more efficient.
Before: select cgrfnbr from category where EXISTS (select cpcgnbr from cgprrel where cpprnbr = 149 and cpcgnbr = cgrfnbr)
After: select cgrfnbr from category where cgrfnbr IN (select cpcgnbr from cgprrel where cpprnbr = 149)

[Figures for Tip #3 and Tip #7: bar charts of query time in ms, Before vs. After. Time reductions shown: 36% and 17%.]
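The IN-for-EXISTS rewrite in Tip #7 only preserves the result when the EXISTS sub query is tied to the outer row. The following is a minimal sketch of that check, not the original OUAC code: it uses Python's built-in sqlite3 module as a stand-in database, and the table contents, the column types, and the correlation condition `cpcgnbr = cgrfnbr` are assumptions made for illustration. It compares results only; timings on SQLite say nothing about the database used at OUAC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the category and cgprrel tables.
# Column types and all rows are assumptions for this sketch.
conn.executescript("""
    CREATE TABLE category (cgrfnbr INTEGER PRIMARY KEY);
    CREATE TABLE cgprrel (cpcgnbr INTEGER, cpprnbr INTEGER);
    INSERT INTO category VALUES (1), (2), (3), (4);
    INSERT INTO cgprrel VALUES (1, 149), (3, 149), (3, 149), (2, 150);
""")

# EXISTS form, correlated with the outer row through cpcgnbr = cgrfnbr.
exists_sql = """
    SELECT cgrfnbr FROM category
    WHERE EXISTS (SELECT 1 FROM cgprrel
                  WHERE cpprnbr = 149 AND cpcgnbr = cgrfnbr)
"""
# IN form from the tip.
in_sql = """
    SELECT cgrfnbr FROM category
    WHERE cgrfnbr IN (SELECT cpcgnbr FROM cgprrel WHERE cpprnbr = 149)
"""

exists_rows = sorted(row[0] for row in conn.execute(exists_sql))
in_rows = sorted(row[0] for row in conn.execute(in_sql))
print(exists_rows == in_rows)
```

Without the correlation condition, the EXISTS test is true for every outer row as soon as any row with `cpprnbr = 149` exists, so the query would return the whole category table.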
Job Description
I worked at OUAC as a Web Developer. I developed web pages to improve the usability of the Ontario University application process. I spent the majority of my time creating two internal systems. The first was created to allow employees of OUAC to modify the contents of the help files without having to know any programming or HTML. The second system created lists of degree programs available at universities. Users were now able to see where a program is taught all at once instead of having to search every university. I developed all the web sites and systems with HTML, JavaScript, SQL, and the IBM scripting language Net.Data.

Purpose of this Report
On my co-op at OUAC I worked intensively with databases and SQL queries. I learned several techniques to improve the speed and efficiency of the queries. The intention of this report is to share this knowledge so current and future co-op students will know how to write better SQL statements.
Each technique was tested by running both the original query and the improved query ten times each. I recorded the average time of each query to show the speed increase of using the more efficient query.

Tip #4
Use EXISTS instead of LEFT JOIN
The LEFT JOIN merges the outer query with the inner query and keeps the extra rows from the outer table. When only the rows that have a match are wanted, the result can be obtained by using an EXISTS sub query. This will eliminate the need to compare two tables, as the inner query acts as a filter when the outer query executes.
Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to find which Universities had help files associated with them. By using an EXISTS sub query instead of LEFT JOIN, I increased the efficiency of this query by avoiding a table comparison.
Before: SELECT merfnbr, mestname FROM buma.merchant LEFT JOIN buma.helpfiles ON merfnbr=hemenbr
After: SELECT merfnbr, mestname FROM buma.merchant WHERE EXISTS (SELECT * FROM buma.helpfiles where merfnbr = hemenbr)

Tip #8
Avoid including a HAVING clause in SELECT statements
The HAVING clause is of little use in a SELECT statement when its condition does not depend on an aggregate. It works by going through the final result table of the query and parsing out the rows that don’t meet the HAVING condition. Instead, you can put the condition inside the query with a WHERE clause. The condition is then applied while the result table is created, which eliminates having to go back through the results a second time.
Example: In the help file tool I created at OUAC, I had to select all the University numbers except for the one that belonged to the test case. I could cut out that row with a HAVING clause at the end of the statement, but a WHERE proved to be more efficient.
Before: select merfnbr from merchant group by merfnbr having merfnbr!=2
After: select merfnbr from merchant where merfnbr!=2 group by merfnbr

[Figures for Tip #4 and Tip #8: bar charts of query time in ms, Before vs. After. Time reductions shown: 23% and 26%.]
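The HAVING-to-WHERE move in Tip #8 is safe when the condition refers only to a grouped column and not to an aggregate. A minimal sketch of that equivalence follows, using Python's sqlite3 module as a stand-in database. The merchant rows are made up for illustration, and the schema prefix is dropped because the in-memory database has no `buma` schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the merchant table; rows are assumptions.
conn.execute("CREATE TABLE merchant (merfnbr INTEGER)")
conn.executemany(
    "INSERT INTO merchant VALUES (?)",
    [(1,), (2,), (2,), (3,), (3,)],
)

# Filter applied after grouping.
having_sql = "SELECT merfnbr FROM merchant GROUP BY merfnbr HAVING merfnbr != 2"
# Same filter applied before grouping.
where_sql = "SELECT merfnbr FROM merchant WHERE merfnbr != 2 GROUP BY merfnbr"

having_rows = sorted(row[0] for row in conn.execute(having_sql))
where_rows = sorted(row[0] for row in conn.execute(where_sql))
print(having_rows == where_rows)
```

A condition on an aggregate, such as `HAVING COUNT(*) > 1`, cannot be moved into WHERE this way, because the aggregate does not exist until the groups have been formed.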
Tip #1
Use Column Names Instead of * in a SELECT Statement
If you are selecting only a few columns from a table there is no need to use SELECT *. Though this is easier to write, it will cost more time for the database to complete the query. By selecting only the columns you need, you are reducing the size of the result table and in turn increasing the speed of the query.
Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to get each file’s information from the database. By replacing the * in my query with the column names, I increased the speed of the query.
Before: SELECT * FROM buma.helpfiles
After: SELECT heshnbr, hemenbr, hename, hetitle, hecontent, hefield1, hefield2
       FROM buma.helpfiles

Tip #5
Use BETWEEN instead of IN
The BETWEEN keyword is very useful for selecting values in a specific range. It is much faster than listing each value in the range inside an IN.
Example: While at OUAC I built a small webpage that displayed all possible degrees and their information. Each degree belonged to a grouped category. In the database the category numbers were in a specific range. So I was able to benefit from using a BETWEEN instead of having each value inside an IN.
Before: SELECT crpcgnbr FROM cgryrel WHERE crpcgnbr IN (508858, 508859, 508860, 508861, 508862, 508863, 508864)
After: SELECT crpcgnbr FROM cgryrel WHERE crpcgnbr BETWEEN 508858 and 508864

Tip #9
Select all your data at once
Each time a query is performed there is the overhead cost of having to open a connection to the database. Having many separate queries that select data from the same table is very inefficient since each query adds its overhead cost to the execution time. Putting all these queries into one reduces the overhead cost significantly.
Example: When creating the help file tool at OUAC, I needed to retrieve lots of data on each file. I required the file name, the content, the associated University, etc. Having these selections as different queries proved to be very inefficient, so I put them together into one statement.
Before: select hetitle, hename from helpfiles where heshnbr=24;
        select hecontent, hemenbr from helpfiles where heshnbr=24;
After: select hetitle, hename, hecontent, hemenbr from helpfiles where heshnbr=24;

[Figures for Tip #1, Tip #5 and Tip #9: bar charts of query time in ms, Before vs. After. Time reductions shown: 59%, 32% and 34%.]
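The BETWEEN rewrite in Tip #5 relies on the IN list covering every value of one contiguous range. A minimal sketch of that check follows, using Python's sqlite3 module as a stand-in database. The rows in cgryrel and the integer column type are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the cgryrel table; the rows deliberately
# extend past both ends of the range used in the tip.
conn.execute("CREATE TABLE cgryrel (crpcgnbr INTEGER)")
conn.executemany(
    "INSERT INTO cgryrel VALUES (?)",
    [(n,) for n in range(508850, 508871)],
)

in_sql = """
    SELECT crpcgnbr FROM cgryrel
    WHERE crpcgnbr IN (508858, 508859, 508860, 508861, 508862, 508863, 508864)
"""
between_sql = """
    SELECT crpcgnbr FROM cgryrel
    WHERE crpcgnbr BETWEEN 508858 AND 508864
"""

in_rows = sorted(row[0] for row in conn.execute(in_sql))
between_rows = sorted(row[0] for row in conn.execute(between_sql))

# BETWEEN is inclusive at both ends, so the two lists should be identical.
print(in_rows == between_rows)
```

If the IN list skipped a value inside the range, or the column held non-integer values, BETWEEN would match rows that the IN list does not.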
Tip #2
Use EXISTS instead of DISTINCT
The DISTINCT keyword works by selecting all the columns in the table and then parsing out any duplicates. Instead, if you use a sub query with the EXISTS keyword, you can avoid having to return an entire table.
Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to find which Universities had help files associated with them. By using an EXISTS sub query instead of DISTINCT, I increased the efficiency of this query.
Before: SELECT DISTINCT hetitle, hename
        FROM buma.helpfiles h, buma.merchant m WHERE m.merfnbr = h.hemenbr
After: SELECT hetitle, hename FROM buma.helpfiles h WHERE EXISTS
       (SELECT m.merfnbr FROM buma.merchant m WHERE m.merfnbr = h.hemenbr)

Tip #6
Minimize the number of sub queries
Each time a sub query is performed, a new result table must be created and then merged with the outer table. This takes a long time to perform on a database. So it is important to minimize the number of sub queries to speed up the results.
Example: The degree listing program I made at OUAC was based on a very redundant database. All the relationships were put into one of two tables. So sorting out the information was very difficult. The only method to get the data was to use several sub queries. Simply removing one unnecessary sub query from this statement increased the speed significantly.
Before: select cgsdesc, cgrfnbr from category where cgoid='degree' and cgrfnbr IN
        (select cpprnbr from cgprrel where cpprnbr IN (select cpcgnbr from cgprrel where cpprnbr IN
        (select prrfnbr from product where prrfnbr IN (select cpprnbr from cgprrel where cpcgnbr IN
        (select cgrfnbr from category where cgoid IS NULL)) and prrfnbr IN
        (select cpprnbr from cgprrel where cpcgnbr = 190200))))
After: select cgsdesc, cgrfnbr from category where cgoid='degree' and cgrfnbr IN
       (select cpprnbr from cgprrel where cpprnbr IN (select cpcgnbr from cgprrel where cpprnbr IN
       (select prrfnbr from product where prrfnbr IN (select cpprnbr from cgprrel where cpcgnbr = 572191)
       and prrfnbr IN (select cpprnbr from cgprrel where cpcgnbr = 190200))))

Tip #10
Remove any redundant mathematics
There will be times when you will be performing mathematics within an SQL statement. It can be a drag on performance if written improperly. Each time the query finds a row it will recalculate the math. So eliminating any unnecessary math in the statement will make it perform faster.
Example: The degree listing program I created at OUAC has the option to show a specific range of Universities based on their reference numbers. It was easier to show the users a single digit list and then add 3000 to get the reference number. But having the addition inside the query was inefficient, so I performed the math outside it.
Before: SELECT merfnbr FROM buma.merchant WHERE merfnbr + 3000 < 5000;
After: SELECT merfnbr FROM buma.merchant WHERE merfnbr < 2000;

[Figures for Tip #2, Tip #6 and Tip #10: bar charts of query time in ms, Before vs. After. Time reductions shown: 11%, 48% and 41%.]
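The rewrite in Tip #10 moves the constant to the other side of the comparison: `merfnbr + 3000 < 5000` becomes `merfnbr < 2000`. A minimal sketch that checks the two conditions select the same rows follows, using Python's sqlite3 module as a stand-in database. The merchant rows are made up and chosen to sit on both sides of the boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the merchant table; rows are assumptions.
conn.execute("CREATE TABLE merchant (merfnbr INTEGER)")
conn.executemany(
    "INSERT INTO merchant VALUES (?)",
    [(n,) for n in (1, 500, 1999, 2000, 2001, 4000)],
)

# Addition evaluated for every row examined.
with_math_sql = "SELECT merfnbr FROM merchant WHERE merfnbr + 3000 < 5000"
# Constant folded by hand: 5000 - 3000 = 2000.
without_math_sql = "SELECT merfnbr FROM merchant WHERE merfnbr < 2000"

with_math = sorted(row[0] for row in conn.execute(with_math_sql))
without_math = sorted(row[0] for row in conn.execute(without_math_sql))
print(with_math == without_math)
```

The boundary rows matter here: 1999 is kept and 2000 is dropped by both forms, which confirms that the strict `<` was carried over correctly.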
Summary
The purpose of this report was to share the knowledge I gained about writing efficient SQL from my co-op as a web developer at OUAC. Increasing the speed of queries is very important in web development, as web pages are viewed thousands of times per day and therefore a simple increase in the speed of a SQL query can make web pages load noticeably faster.
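The testing procedure this report describes (each query run ten times, with the average time recorded) could be sketched roughly as follows. This is not the original test harness. The helper names `average_ms` and `time_reduction_percent` are hypothetical, the formula for the time reduction is an assumption about how the charted percentages were derived, and SQLite with made-up rows stands in for the original database.

```python
import sqlite3
import time


def average_ms(conn, sql, runs=10):
    """Run sql `runs` times and return the mean wall-clock time in ms."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch every row so the whole query is timed
        total += time.perf_counter() - start
    return total / runs * 1000.0


def time_reduction_percent(before_ms, after_ms):
    """Share of the original time saved by the improved query (assumed formula)."""
    return (before_ms - after_ms) / before_ms * 100.0


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the merchant table with made-up rows.
conn.execute("CREATE TABLE merchant (merfnbr INTEGER)")
conn.executemany("INSERT INTO merchant VALUES (?)", [(n,) for n in range(10000)])

before_ms = average_ms(conn, "SELECT merfnbr FROM merchant WHERE merfnbr + 3000 < 5000")
after_ms = average_ms(conn, "SELECT merfnbr FROM merchant WHERE merfnbr < 2000")
print(f"before: {before_ms:.3f} ms, after: {after_ms:.3f} ms")
```

Timings from a small in-memory table vary from run to run and will not match the figures in this report. The sketch only shows the shape of the measurement.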