College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
Semi join
1. PRESENTATION ON :
SEMIJOIN – IT’S EFFECTIVENESS
IN DISTRIBUTED ENVIRONMENT
PRESENTED BY –
Alokeparna Choudhury,
UNIVERSITY INSTITUTE OF TECHNOLOGY
(THE UNIVERSITY OF BURDWAN)
ROLL NO.: ME201310005
REGISTRATION NO.: 2783 of 2009-10
2. Contents
1. Distributed system
2. Relational algebra
3. Join
4. Semi-join
5. Simple semi-join example
6. How oracle evaluates EXISTS and IN
7. Examples
8. Effectiveness
9. References
3. What is distributed database system?
A distributed database system is characterized by
the distribution of the system components of
hardware ,control and data.
A distributed system is a collection of independent
computers interconnected via point-to-point
communication lines.
what is distributed query processing?
The retrieval of data from different sites in a network is
known as distributed query processing.
4. Relational Algebra
The Relational Algebra is used to define the
ways in which relations (tables) can be operated
to manipulate their data.
This Algebra is composed of Unary operations
(involving a single table) and Binary operations
(involving multiple tables).
Join, Semi-join these are Binary operations in
Relational Algebra.
5. Join
Join is a binary operation in Relational Algebra.
It combines records from two or more tables in a
database.
A join is a means for combining fields from two
tables by using values common to each.
6. Semi-Join
•A Join where the result only contains the columns
from one of the joined tables.
•Useful in distributed databases, so we don't have to
send as much data over the network.
•Can dramatically speed up certain classes of
queries.
7. What is “Semi-Join” ?
Semi-join strategies are technique for query processing in
distributed database systems.
Used for reducing communication cost.
A semi-join between two tables returns rows from the first
table where one or more matches are found in the second
table.
The difference between a semi-join and a conventional
join is that rows in the first table will be returned at most
once. Even if the second table contains two matches for a
row in the first table, only one copy of the row will be
returned.
Semi-joins are written using EXISTS or IN.
8. A Simple Semi-Join Example
“Give a list of departments with at least one employee.”
Query written with a conventional join:
SELECT D.deptno, D.dname
FROM dept D, emp E
WHERE E.deptno = D.deptno
ORDER BY D.deptno;
◦ A department with N employees will appear in the list
N times.
◦ We could use a DISTINCT keyword to get each
department to appear only once.
9. A Simple Semi-Join Example
“Give a list of departments with at least one employee.”
Query written with a semi-join:
SELECT D.deptno, D.dname
FROM dept D
WHERE EXISTS
(SELECT 1
FROM emp E
WHERE E.deptno = D.deptno)
ORDER BY D.deptno;
◦ No department appears more than once.
◦ Oracle stops processing each department as soon
as the first employee in that department is found.
10. How Oracle Evaluates EXISTS
and IN
Oracle transforms the sub query into a join if at all
possible.
Oracle does not consider cost when deciding whether or
not to do this transformation.
Oracle can perform a semi-join in a few different ways:
◦ Semi-join access path.
◦ Conventional join access path followed by a sort to remove
duplicate rows.
◦ Scan of first table with a filter operation against the second
table.
11. How Oracle Evaluates EXISTS
and IN
Rule of thumb from the Oracle 8i/9i/10g
Performance Tuning Guide (highly simplified):
◦ Use EXISTS when outer query is selective.
◦ Use IN when sub-query is selective and outer
query is not.
◦ Rule of thumb is valid for Oracle 8i.
◦ Oracle 9i often does the right thing regardless of
whether EXISTS or IN is used.
12. Semi-Join Example #1
Query: A class list of each course is to be prepared.
At Site1: STUDENT(std_id, std_name)
At Site2: REGISTRATION(std_id, course_id)
Steps of Semi-join:
1. Project REGISTRATION on std_id.
X=∏std_id(REGISTRATION)
2. Transmit X to Site1.
3. At Site1, select those topples of STUDENT that have the
same value for std_id as a topple in
∏std_id(REGISTRATION) by a Join.
Y= STUDENT REGISTRATION = STUDENT X
13. Semi-Join Example #1
(After the previous 3 steps we get only registered students)
4. Send Y to Site2 and then Join with REGISTRATION.
Now we get the complete final result i.e. the class list
of all students on a particular course.
14. Semi-Join Example #2
Query: “List the gold-status customers who have
placed an order within the last three days.”
SELECT DISTINCT C.short_name, C.customer_id
FROM customers C, orders O
WHERE C.customer_type = 'Gold'
AND O.customer_id = C.customer_id
AND O.order_date > SYSDATE - 3
ORDER BY C.short_name;
15. Semi-Join Example #2
What we see on the previous slide:
◦ The query was written with a conventional join
and DISTINCT instead of an EXISTS or IN
clause.
◦ Oracle performed a conventional join followed by
a sort for uniqueness in order to remove the
duplicates. (18 of the 20 rows resulting from the
join were apparently duplicates.)
◦ The query took 6.70 seconds to complete and
performed 33,608 logical reads.
16. Semi-Join Example #2
Rewritten with a semi-join:
SELECT C.short_name, C.customer_id
FROM customers C
WHERE C.customer_type = 'Gold'
AND EXISTS
(
SELECT 1
FROM orders O
WHERE O.customer_id = C.customer_id
AND O.order_date > SYSDATE - 3
)
ORDER BY C.short_name;
17. Semi-Join Example #2
What we see on the previous slide:
◦ An EXISTS clause was used to specify a semi-
join.
◦ Oracle performed a semi-join instead of a
conventional join. This offers two benefits:
Oracle can move on to the next customer
record as soon as the first matching order
record is found.
There is no need to sort out duplicate records.
◦ The query took 6.42 seconds to complete and
performed 33,608 logical reads.
18. Effectiveness of Semi-join
The semi-join is the special way of joining data from multiple tables
in a query.
We use EXISTS, IN clauses to denote this special type of Join.
Oracle has special access paths that can make semi-joins extremely
efficient.
Understanding how semi-join works—and how Oracle implements
the relevant data access paths—will enable us to write very efficient
SQL and dramatically speed up an entire class of queries.
Semi-join does not give the final result still it is efficient in reducing
communication cost than conventional Join.
19. References
1. Using 2-way semijoin in distributed query processing.
By Hyunchul Kang and Nick Roussopoulos.
2. Improving distributed query processing by hash-
semijoins. By Judy Tseng and Arbee Chen.
3. Domain Specific Semijoin:A new operation for
distributed query processing. By Jason Chen and
Victor Li.
4. Composite Semijoin in distributed query processing.
By William Perrizio and Chun Chen