SlideShare uma empresa Scribd logo
1 de 23
Baixar para ler offline
An Agent-based Adaptive
Join Algorithm for Distributed
Data Warehousing
1ST ANNUAL POSTGRADUATE RESEARCH STUDENT CONFERENCE
Qicheng Yu FoC of London Metropolitan University
Dr. Fang Fang Cai FoC of London Metropolitan University
Dr. Julie A McCann DoC of Imperial College
Supervisors:
Introduction
Data warehousing continues to play an
important role in global information systems for
businesses.
Applications of data warehousing have evolvedApplications of data warehousing have evolved
from reporting and decision support systems to
mission critical decision making systems.
This requires data warehouses to combine both
historical and current data from operational
systems.
Introduction (cont’d)
The primary objective of this research is to seeking
effective and efficient join techniques for distributed
data warehousing. This is because:
Joins are a frequently occurring operation in DW
queries.queries.
Joins are one of the most expensive operations that a
DW performs.
Joining two large tables could consume a significant
amount of the system’s CPU cycles, disk or network
bandwidth, and buffer memory.
Traditional join algorithms caused significant
performance issues in a distributed and dynamic
network environment .
Theories and Techniques
Software Agents software entities that have an internal
goal and acts on behalf of a user.
Autonomy
AdaptivenessAdaptiveness
Collaboration
Mobility
They are well suited to applications which involve
distributed computation or communication between
components, sensing or monitoring of their environment, or
autonomous operation.
Theories and Techniques
Adaptive Join Algorithms focus on using runtime
feedback to modify query processing in a way that provides
better response time or more efficient CPU utilization. Good
examples are:
Nested Loop Ripple JoinNested Loop Ripple Join
Hash Ripple Join
XJoin.
Distributed Query Technique Semi-join was
introduced for reducing data transmission in processing
distributed queries. It has been traditionally used to great
advantage in distributed systems.
The Main Idea of AJoin
The main idea of AJoin is based on software agents
and extend the ripple join and semi-join techniques into a
multi-agent system where a join task is decomposed into
smaller independent units which will be assigned to
software agents.software agents.
It takes data warehousing features into account
It utilises intelligent software agents for the dynamic
optimisation and coordination of join processing.
AJoin Framework
S
R1AIndex R2AIndex
R1S R2S
Join Results
Agent coordinator
R1 R2
Filter C1 Filter C2
S1 S2
R1 Input
Buffer
R1S R2S
R1S Results
Buffer
2 2
3
3
4 4
5 5
1
1
R2S Results
Buffer
R2 Input
Buffer
AJoin Algorithm
Join coordinate at local site S:
1.Initialization.
Perform cost-benefit analysis;
Allocate initial memory and buffer.
2. Activate RIA at sites S1 and S2 (see table 4)
Start Remote Adapting
3. Activate MTA and OA at site S (see table 2)
4. Coordinate and monitor join process
5. Iterate from step 4 until join completed
6. Deactivate MTA, RIA, and OA
7. End AJoin
Table1 - Join Coordinator Agent (JCA)
AJoin Algorithm (cont’d)
MTA for R1 at local site S:
1. Receive a tuple from IB (see table 3)
2. If a tuple is received from R1
3. If full-join method is selected
4. Save R1S at local site S and add its LAP and R1AS A
to index table for R1A
5. Else // semi-join is selected
6. Add negative RAP and R1A to index table for R1A
7. Using R1A to probe R2A in index table for R2A
8. If a match is found
9. For each matched R1A and R2A
TABLE 2-1. TUPLE MATCHING AGENT (MTA)
AJoin Algorithm (cont’d)
MTA for R1 at local site S:
10. If the access pointer of RS is RAP
11. Call RIA at S1 and/or S2 (see table 5)
to retrieve R1S and/or R2S respectivelyS S
12. Else // RS accessible from the local storage
13. Retrieve R1S and/or R2S locally
14. Output the matched R1S and R2S
15. Iterate from step 1 until no more tuples from R1
16.Notify JCA to end the process
TABLE 2-2. TUPLE MATCHING AGENT (MTA)
AJoin Algorithm (cont’d)
Receive tuple procedure at local site S:
1. If all tuples from both relation R1 and R2 are received
2. Join completed
3. If both R1 and R2 buffer are not empty
4. Retrieve a tuple from R1 and R2 IB in turn
5. Else5. Else
6. If both R1 and R2 IB are empty
7. Waiting tuples to arrive
8. Else // one of R1 orR2 IB is not empty
9. If R1 IB is not empty
10. Retrieve a tuple from R1 IB
11. Else // R2 IB is not empty
12. Retrieve a tuple from R2 IB
13. Return
TABLE 3. RECEIVE A TUPLE PROCEDURE
AJoin Algorithm (cont’d)
Tuple filter and join adapting at remote site S1:
1. Receive join parameter (C and join selection threshold)
2. Retrieve a tuple from R1
3. If the tuple is not meet the condition C1
4. Repeat step 2
5. Evaluate join cost-benefit
6. If semi-join is selected
7. Send R1A and its remote access pointer to R1 IB at site S
8. Else // full-join is selected
9. Send R1A and R1s to R1 IB at site S
10. Iterate from step 2 until no more tuples from R1
TABLE 4. REMOTE INFORMATION AGENT (RIA)
AJoin Algorithm (cont’d)
Retrieve RS at remote site S1 (or S2):
1. Get the remote access pointer
2. Retrieve the tuple according to its access pointer
3. Return R to R buffer at site S3. Return RS to RS buffer at site S
TABLE 5. RIA RETRIEVE RS SERVICE
Cost and Benefit Analysis
cost of full ripple join:
Cost F (R1⋈AR2) = (1)
where T0 is the cost to start-up a new network connection and V is the
network speed. |R1| and |R2| to indicate the cardinality of R1 and R2
respectively,
⋈
0
| 1| ( 1) | 2| ( 2)R Size R R Size R
T V
× + ×
+
cost of semi ripple join:
CostS (R1⋈AR2) =
(2)
where Size(P) is the size in byte of the RAP.
0
| 1| ( 1 ) ( 1) ( ( 1 ) ( ))
| 1| ( 2 ) ( 2) ( ( 2 ) ( ))
A S
A S
R Size R JC R Size R Size P
T V
R Size R JC R Size R Size P
V
× + × +
× + × +
+
+
Cost and Benefit Analysis
Cost and Benefit :
Cost F (R1⋈AR2) - CostS (R1⋈AR2) =
(3)
| 1| ( 1) | 1| ( 1 ) ( 1) ( ( 1 ) ( ))
| 2| ( 2) | 1| ( 2 ) ( 2) ( ( 2 ) ( ))
A S
A S
R Size R R Size R JC R Size R Size P
V
R Size R R Size R JC R Size R Size P
V
× − × − × +
× − × − × +
+
In AJoin, full-join or semi-join can be applied independently to
each relation. The semi-join method will be chosen only
when:
(4)
or CR(R) × SR(R) < 1 (5)
where CR(R) = denotes as join cardinality ratio as %;
SR(R) = denotes as attribute selection ratio as %
V+
| | ( ) | | ( ) ( ) ( ( ) ( ))
0
A SR Size R R Size R JC R Size R Size P
V
× − × − × +
>
( )
| |
JC R
R
( ) ( )
( ) ( )
S
A
Size R Size P
Size R Size R
+
−
AJoin for Data Warehousing
Multiple one-to-many joins between the fact table and
the dimensional tables.
referential integrity constraints are applied e.g. all join
attributes in fact table must have a matching joinattributes in fact table must have a matching join
attributes in the dimensional tables. Therefore,
CR(R1) =100% full-join method will be used for
R1 unless SR(R1) is low.
run-time data of the fact table only represents a small
portion of the fact table. Therefore, CR(R2) << 100%
the semi-join method should be chosen for R2.
Cost and Benefit in High
Bandwidth Network
The join cost of local processing cannot be ignored any more. But, since
the cost of joins at a local site using full or semi-join method is at an
equivalent level.
The semi-join method will be chosen only when:
| | ( ) | | ( ) ( ) ( ( ) ( ))R Size R R Size R JC R Size R Size P× − × − × +
(6)
V< (7)
where V’ is the desk data access speed and V is the join switch threshold.
| | ( ) | | ( ) ( ) ( ( ) ( ))
( ) ( )
'
A S
S
R Size R R Size R JC R Size R Size P
V
JC R Size R
V
× − × − × +
×
>
| | ( ) | | ( ) ( ) ( ( ) ( ))
' ( ) ( )
A S
S
R Size R R Size R JC R Size R Size P
V JC R Size R
× − × − × +
× ×
Performance Evaluation
Join under 250K-2Mbps network
200
250
300
Time(sec)
0
50
100
150
10
30
50
70
90
110
130
150
170
190
210
230
250
Matched Tuples
Time(sec)
AJoin Hash Ripple Join
Performance Evaluation (cont’d)
Join under 250K-2Mbps network
with 1 sec delay per 1000 tuples
300
350
400
0
50
100
150
200
250
300
10
30
50
70
90
110
130
150
170
190
210
230
250
Matched Tuples
Time(sec)
AJoin Hash Ripple Join
Performance Evaluation (cont’d)
Join under 100M-1Gbps network
25
30
35
0
5
10
15
20
25
10
30
50
70
90
110
130
150
170
190
210
230
250
Matched Tuples
Time(sec)
AJoin Hash Ripple Join
Performance Evaluation (cont’d)
Join under 100M-1Gbps network
with 1 sec delay per 1000 tuples
100
120
0
20
40
60
80
10
30
50
70
90
110
130
150
170
190
210
230
250
Matched tuples
time(sec)
AJoin Hash Ripple Join
Summary
The main work undertaken is summarized below:
investigation the feasibility and effectiveness of utilising
software agent technology to address some specific
issues in data warehouses.
conduction an experimental study on the performance
of modern join algorithm for distributed environment.
implement of a framework for an adaptive join algorithm
using intelligent agents.
evaluation the agent-based join approach against
current approaches in distributed and dynamic data
warehouse environments.
Questions & Comments

Mais conteúdo relacionado

Mais procurados

Sensor Fusion Study - Ch8. The Continuous-Time Kalman Filter [이해구]
Sensor Fusion Study - Ch8. The Continuous-Time Kalman Filter [이해구]Sensor Fusion Study - Ch8. The Continuous-Time Kalman Filter [이해구]
Sensor Fusion Study - Ch8. The Continuous-Time Kalman Filter [이해구]AI Robotics KR
 
The extended kalman filter
The extended kalman filterThe extended kalman filter
The extended kalman filterMudit Parnami
 
Seminar On Kalman Filter And Its Applications
Seminar On  Kalman  Filter And Its ApplicationsSeminar On  Kalman  Filter And Its Applications
Seminar On Kalman Filter And Its ApplicationsBarnali Dey
 
Understanding kalman filter for soc estimation.
Understanding kalman filter for soc estimation.Understanding kalman filter for soc estimation.
Understanding kalman filter for soc estimation.Ratul
 
VLDB 2018 presentation paper title: Streaming Graph Partitioning
VLDB 2018 presentation paper title: Streaming Graph PartitioningVLDB 2018 presentation paper title: Streaming Graph Partitioning
VLDB 2018 presentation paper title: Streaming Graph PartitioningZainab Abbas
 
a technical review of efficient and high speed adders for vedic multipliers
a technical review of efficient and high speed adders for vedic multipliersa technical review of efficient and high speed adders for vedic multipliers
a technical review of efficient and high speed adders for vedic multipliersINFOGAIN PUBLICATION
 
Kalman filter - Applications in Image processing
Kalman filter - Applications in Image processingKalman filter - Applications in Image processing
Kalman filter - Applications in Image processingRavi Teja
 
Implementation of Low Power and Area Efficient Carry Select Adder
Implementation of Low Power and Area Efficient Carry Select AdderImplementation of Low Power and Area Efficient Carry Select Adder
Implementation of Low Power and Area Efficient Carry Select Adderinventionjournals
 
Kalman Filter and its Application
Kalman Filter and its ApplicationKalman Filter and its Application
Kalman Filter and its ApplicationSaptarshi Mazumdar
 
Report kalman filtering
Report kalman filteringReport kalman filtering
Report kalman filteringIrfan Anjum
 
32-bit unsigned multiplier by using CSLA & CLAA
32-bit unsigned multiplier by using CSLA &  CLAA32-bit unsigned multiplier by using CSLA &  CLAA
32-bit unsigned multiplier by using CSLA & CLAAGanesh Sambasivarao
 
Low power & area efficient carry select adder
Low power & area efficient carry select adderLow power & area efficient carry select adder
Low power & area efficient carry select adderSai Vara Prasad P
 
Design and Verification of Area Efficient Carry Select Adder
Design and Verification of Area Efficient Carry Select AdderDesign and Verification of Area Efficient Carry Select Adder
Design and Verification of Area Efficient Carry Select Adderijsrd.com
 

Mais procurados (20)

Sensor Fusion Study - Ch8. The Continuous-Time Kalman Filter [이해구]
Sensor Fusion Study - Ch8. The Continuous-Time Kalman Filter [이해구]Sensor Fusion Study - Ch8. The Continuous-Time Kalman Filter [이해구]
Sensor Fusion Study - Ch8. The Continuous-Time Kalman Filter [이해구]
 
The extended kalman filter
The extended kalman filterThe extended kalman filter
The extended kalman filter
 
Seminar On Kalman Filter And Its Applications
Seminar On  Kalman  Filter And Its ApplicationsSeminar On  Kalman  Filter And Its Applications
Seminar On Kalman Filter And Its Applications
 
Understanding kalman filter for soc estimation.
Understanding kalman filter for soc estimation.Understanding kalman filter for soc estimation.
Understanding kalman filter for soc estimation.
 
VLDB 2018 presentation paper title: Streaming Graph Partitioning
VLDB 2018 presentation paper title: Streaming Graph PartitioningVLDB 2018 presentation paper title: Streaming Graph Partitioning
VLDB 2018 presentation paper title: Streaming Graph Partitioning
 
Real Time Geodemographics
Real Time GeodemographicsReal Time Geodemographics
Real Time Geodemographics
 
Kalmanfilter
KalmanfilterKalmanfilter
Kalmanfilter
 
Basics Of Kalman Filter And Position Estimation Of Front Wheel Automatic Stee...
Basics Of Kalman Filter And Position Estimation Of Front Wheel Automatic Stee...Basics Of Kalman Filter And Position Estimation Of Front Wheel Automatic Stee...
Basics Of Kalman Filter And Position Estimation Of Front Wheel Automatic Stee...
 
a technical review of efficient and high speed adders for vedic multipliers
a technical review of efficient and high speed adders for vedic multipliersa technical review of efficient and high speed adders for vedic multipliers
a technical review of efficient and high speed adders for vedic multipliers
 
Kalman Filter
Kalman FilterKalman Filter
Kalman Filter
 
Kalman Filter Basic
Kalman Filter BasicKalman Filter Basic
Kalman Filter Basic
 
Kalman filter - Applications in Image processing
Kalman filter - Applications in Image processingKalman filter - Applications in Image processing
Kalman filter - Applications in Image processing
 
Implementation of Low Power and Area Efficient Carry Select Adder
Implementation of Low Power and Area Efficient Carry Select AdderImplementation of Low Power and Area Efficient Carry Select Adder
Implementation of Low Power and Area Efficient Carry Select Adder
 
Kalman Filter
 Kalman Filter    Kalman Filter
Kalman Filter
 
Kalman Filter and its Application
Kalman Filter and its ApplicationKalman Filter and its Application
Kalman Filter and its Application
 
Report kalman filtering
Report kalman filteringReport kalman filtering
Report kalman filtering
 
Kalman Equations
Kalman EquationsKalman Equations
Kalman Equations
 
32-bit unsigned multiplier by using CSLA & CLAA
32-bit unsigned multiplier by using CSLA &  CLAA32-bit unsigned multiplier by using CSLA &  CLAA
32-bit unsigned multiplier by using CSLA & CLAA
 
Low power & area efficient carry select adder
Low power & area efficient carry select adderLow power & area efficient carry select adder
Low power & area efficient carry select adder
 
Design and Verification of Area Efficient Carry Select Adder
Design and Verification of Area Efficient Carry Select AdderDesign and Verification of Area Efficient Carry Select Adder
Design and Verification of Area Efficient Carry Select Adder
 

Destaque (19)

Syahnaz mohdmokhter
Syahnaz mohdmokhterSyahnaz mohdmokhter
Syahnaz mohdmokhter
 
Melissa chanegriha
Melissa chanegrihaMelissa chanegriha
Melissa chanegriha
 
Md kanu
Md kanuMd kanu
Md kanu
 
Maria elfani
Maria elfaniMaria elfani
Maria elfani
 
Mohammad khaleq newaz
Mohammad khaleq newazMohammad khaleq newaz
Mohammad khaleq newaz
 
Eva amoako atta
Eva amoako attaEva amoako atta
Eva amoako atta
 
George perendia
George perendiaGeorge perendia
George perendia
 
Kennedy javuru
Kennedy javuruKennedy javuru
Kennedy javuru
 
Glyn robbins
Glyn robbinsGlyn robbins
Glyn robbins
 
George perendia
George perendiaGeorge perendia
George perendia
 
Daniel amund
Daniel amundDaniel amund
Daniel amund
 
Mediagruppen som kooperativ
Mediagruppen som kooperativMediagruppen som kooperativ
Mediagruppen som kooperativ
 
Janet bowstead
Janet bowsteadJanet bowstead
Janet bowstead
 
Brukarkonferens i Munkfors 2010
Brukarkonferens i Munkfors 2010Brukarkonferens i Munkfors 2010
Brukarkonferens i Munkfors 2010
 
Joycelyn adinkrah
Joycelyn adinkrahJoycelyn adinkrah
Joycelyn adinkrah
 
Jabeen shah
Jabeen shahJabeen shah
Jabeen shah
 
Deepthi ratnayake
Deepthi ratnayakeDeepthi ratnayake
Deepthi ratnayake
 
Cristina maxim
Cristina maximCristina maxim
Cristina maxim
 
Sara cannizzaro
Sara cannizzaroSara cannizzaro
Sara cannizzaro
 

Semelhante a Qicheng yu

SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...vtunotesbysree
 
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...IRJET Journal
 
Discrete-wavelet-transform recursive inverse algorithm using second-order est...
Discrete-wavelet-transform recursive inverse algorithm using second-order est...Discrete-wavelet-transform recursive inverse algorithm using second-order est...
Discrete-wavelet-transform recursive inverse algorithm using second-order est...TELKOMNIKA JOURNAL
 
A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...RAHUL WAGAJ
 
A High Speed Transposed Form FIR Filter Using Floating Point Dadda Multiplier
A High Speed Transposed Form FIR Filter Using Floating Point Dadda MultiplierA High Speed Transposed Form FIR Filter Using Floating Point Dadda Multiplier
A High Speed Transposed Form FIR Filter Using Floating Point Dadda MultiplierIJRES Journal
 
IRJET- A Study on Algorithms for FFT Computations
IRJET- A Study on Algorithms for FFT ComputationsIRJET- A Study on Algorithms for FFT Computations
IRJET- A Study on Algorithms for FFT ComputationsIRJET Journal
 
IRJET- Asic Implementation of Efficient Error Detection for Floating Poin...
IRJET-  	  Asic Implementation of Efficient Error Detection for Floating Poin...IRJET-  	  Asic Implementation of Efficient Error Detection for Floating Poin...
IRJET- Asic Implementation of Efficient Error Detection for Floating Poin...IRJET Journal
 
Frequency response method to derive transfer function of servo motor using Qu...
Frequency response method to derive transfer function of servo motor using Qu...Frequency response method to derive transfer function of servo motor using Qu...
Frequency response method to derive transfer function of servo motor using Qu...IRJET Journal
 
Robust PID Controller Design for Non-Minimum Phase Systems using Magnitude Op...
Robust PID Controller Design for Non-Minimum Phase Systems using Magnitude Op...Robust PID Controller Design for Non-Minimum Phase Systems using Magnitude Op...
Robust PID Controller Design for Non-Minimum Phase Systems using Magnitude Op...IRJET Journal
 
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...IRJET Journal
 
digitaldesign-2020-lecture10a-lc3andmips-beforelecture.pdf
digitaldesign-2020-lecture10a-lc3andmips-beforelecture.pdfdigitaldesign-2020-lecture10a-lc3andmips-beforelecture.pdf
digitaldesign-2020-lecture10a-lc3andmips-beforelecture.pdfDuy-Hieu Bui
 
Area efficient parallel LFSR for cyclic redundancy check
Area efficient parallel LFSR for cyclic redundancy check  Area efficient parallel LFSR for cyclic redundancy check
Area efficient parallel LFSR for cyclic redundancy check IJECEIAES
 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_fariaPaulo Faria
 
Travelling Salesman Problem, Robotics & Inverse Kinematics
Travelling Salesman Problem, Robotics & Inverse KinematicsTravelling Salesman Problem, Robotics & Inverse Kinematics
Travelling Salesman Problem, Robotics & Inverse Kinematicsmcoond
 
Low Power Adaptive FIR Filter Based on Distributed Arithmetic
Low Power Adaptive FIR Filter Based on Distributed ArithmeticLow Power Adaptive FIR Filter Based on Distributed Arithmetic
Low Power Adaptive FIR Filter Based on Distributed ArithmeticIJERA Editor
 
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...IRJET Journal
 

Semelhante a Qicheng yu (20)

wcnc05
wcnc05wcnc05
wcnc05
 
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...
 
DBMS CS3
DBMS CS3DBMS CS3
DBMS CS3
 
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
 
Discrete-wavelet-transform recursive inverse algorithm using second-order est...
Discrete-wavelet-transform recursive inverse algorithm using second-order est...Discrete-wavelet-transform recursive inverse algorithm using second-order est...
Discrete-wavelet-transform recursive inverse algorithm using second-order est...
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...A Comparative study of locality Preserving Projection & Principle Component A...
A Comparative study of locality Preserving Projection & Principle Component A...
 
A High Speed Transposed Form FIR Filter Using Floating Point Dadda Multiplier
A High Speed Transposed Form FIR Filter Using Floating Point Dadda MultiplierA High Speed Transposed Form FIR Filter Using Floating Point Dadda Multiplier
A High Speed Transposed Form FIR Filter Using Floating Point Dadda Multiplier
 
IRJET- A Study on Algorithms for FFT Computations
IRJET- A Study on Algorithms for FFT ComputationsIRJET- A Study on Algorithms for FFT Computations
IRJET- A Study on Algorithms for FFT Computations
 
IRJET- Asic Implementation of Efficient Error Detection for Floating Poin...
IRJET-  	  Asic Implementation of Efficient Error Detection for Floating Poin...IRJET-  	  Asic Implementation of Efficient Error Detection for Floating Poin...
IRJET- Asic Implementation of Efficient Error Detection for Floating Poin...
 
Frequency response method to derive transfer function of servo motor using Qu...
Frequency response method to derive transfer function of servo motor using Qu...Frequency response method to derive transfer function of servo motor using Qu...
Frequency response method to derive transfer function of servo motor using Qu...
 
Robust PID Controller Design for Non-Minimum Phase Systems using Magnitude Op...
Robust PID Controller Design for Non-Minimum Phase Systems using Magnitude Op...Robust PID Controller Design for Non-Minimum Phase Systems using Magnitude Op...
Robust PID Controller Design for Non-Minimum Phase Systems using Magnitude Op...
 
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...
 
digitaldesign-2020-lecture10a-lc3andmips-beforelecture.pdf
digitaldesign-2020-lecture10a-lc3andmips-beforelecture.pdfdigitaldesign-2020-lecture10a-lc3andmips-beforelecture.pdf
digitaldesign-2020-lecture10a-lc3andmips-beforelecture.pdf
 
J0166875
J0166875J0166875
J0166875
 
Area efficient parallel LFSR for cyclic redundancy check
Area efficient parallel LFSR for cyclic redundancy check  Area efficient parallel LFSR for cyclic redundancy check
Area efficient parallel LFSR for cyclic redundancy check
 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria
 
Travelling Salesman Problem, Robotics & Inverse Kinematics
Travelling Salesman Problem, Robotics & Inverse KinematicsTravelling Salesman Problem, Robotics & Inverse Kinematics
Travelling Salesman Problem, Robotics & Inverse Kinematics
 
Low Power Adaptive FIR Filter Based on Distributed Arithmetic
Low Power Adaptive FIR Filter Based on Distributed ArithmeticLow Power Adaptive FIR Filter Based on Distributed Arithmetic
Low Power Adaptive FIR Filter Based on Distributed Arithmetic
 
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
 

Mais de LondonMet PGR Students (18)

Inflation targeting misfiring on development of housing market bubble
Inflation targeting misfiring on development of housing market bubbleInflation targeting misfiring on development of housing market bubble
Inflation targeting misfiring on development of housing market bubble
 
Robert gonouya
Robert gonouyaRobert gonouya
Robert gonouya
 
Janet bowstead poster
Janet bowstead posterJanet bowstead poster
Janet bowstead poster
 
Veronica azolukwam
Veronica azolukwamVeronica azolukwam
Veronica azolukwam
 
Deepthi ratnayake
Deepthi ratnayakeDeepthi ratnayake
Deepthi ratnayake
 
Veronica azolukwam
Veronica azolukwamVeronica azolukwam
Veronica azolukwam
 
Marta kaleta
Marta kaletaMarta kaleta
Marta kaleta
 
Tracy part
Tracy partTracy part
Tracy part
 
Shazaib butt
Shazaib buttShazaib butt
Shazaib butt
 
Ozlem edizel
Ozlem edizelOzlem edizel
Ozlem edizel
 
Thao nguyen
Thao nguyenThao nguyen
Thao nguyen
 
Mokhter syahnaz
Mokhter syahnazMokhter syahnaz
Mokhter syahnaz
 
Stevens claire
Stevens claireStevens claire
Stevens claire
 
Pan xiao
Pan xiaoPan xiao
Pan xiao
 
Mokhter syahnaz
Mokhter syahnazMokhter syahnaz
Mokhter syahnaz
 
Pan xiao
Pan xiaoPan xiao
Pan xiao
 
Golovko karina
Golovko karinaGolovko karina
Golovko karina
 
George perendia
George perendiaGeorge perendia
George perendia
 

Qicheng yu

  • 1. An Agent-based Adaptive Join Algorithm for Distributed Data Warehousing 1ST ANNUAL POSTGRADUATE RESEARCH STUDENT CONFERENCE Qicheng Yu FoC of London Metropolitan University Dr. Fang Fang Cai FoC of London Metropolitan University Dr. Julie A McCann DoC of Imperial College Supervisors:
  • 2. Introduction Data warehousing continues to play an important role in global information systems for businesses. Applications of data warehousing have evolvedApplications of data warehousing have evolved from reporting and decision support systems to mission critical decision making systems. This requires data warehouses to combine both historical and current data from operational systems.
  • 3. Introduction (cont’d) The primary objective of this research is to seeking effective and efficient join techniques for distributed data warehousing. This is because: Joins are a frequently occurring operation in DW queries.queries. Joins are one of the most expensive operations that a DW performs. Joining two large tables could consume a significant amount of the system’s CPU cycles, disk or network bandwidth, and buffer memory. Traditional join algorithms caused significant performance issues in a distributed and dynamic network environment .
  • 4. Theories and Techniques Software Agents software entities that have an internal goal and acts on behalf of a user. Autonomy AdaptivenessAdaptiveness Collaboration Mobility They are well suited to applications which involve distributed computation or communication between components, sensing or monitoring of their environment, or autonomous operation.
  • 5. Theories and Techniques Adaptive Join Algorithms focus on using runtime feedback to modify query processing in a way that provides better response time or more efficient CPU utilization. Good examples are: Nested Loop Ripple JoinNested Loop Ripple Join Hash Ripple Join XJoin. Distributed Query Technique Semi-join was introduced for reducing data transmission in processing distributed queries. It has been traditionally used to great advantage in distributed systems.
  • 6. The Main Idea of AJoin The main idea of AJoin is based on software agents and extend the ripple join and semi-join techniques into a multi-agent system where a join task is decomposed into smaller independent units which will be assigned to software agents.software agents. It takes data warehousing features into account It utilises intelligent software agents for the dynamic optimisation and coordination of join processing.
  • 7. AJoin Framework S R1AIndex R2AIndex R1S R2S Join Results Agent coordinator R1 R2 Filter C1 Filter C2 S1 S2 R1 Input Buffer R1S R2S R1S Results Buffer 2 2 3 3 4 4 5 5 1 1 R2S Results Buffer R2 Input Buffer
  • 8. AJoin Algorithm Join coordinate at local site S: 1.Initialization. Perform cost-benefit analysis; Allocate initial memory and buffer. 2. Activate RIA at sites S1 and S2 (see table 4) Start Remote Adapting 3. Activate MTA and OA at site S (see table 2) 4. Coordinate and monitor join process 5. Iterate from step 4 until join completed 6. Deactivate MTA, RIA, and OA 7. End AJoin Table1 - Join Coordinator Agent (JCA)
  • 9. AJoin Algorithm (cont’d) MTA for R1 at local site S: 1. Receive a tuple from IB (see table 3) 2. If a tuple is received from R1 3. If full-join method is selected 4. Save R1S at local site S and add its LAP and R1AS A to index table for R1A 5. Else // semi-join is selected 6. Add negative RAP and R1A to index table for R1A 7. Using R1A to probe R2A in index table for R2A 8. If a match is found 9. For each matched R1A and R2A TABLE 2-1. TUPLE MATCHING AGENT (MTA)
  • 10. AJoin Algorithm (cont’d) MTA for R1 at local site S: 10. If the access pointer of RS is RAP 11. Call RIA at S1 and/or S2 (see table 5) to retrieve R1S and/or R2S respectivelyS S 12. Else // RS accessible from the local storage 13. Retrieve R1S and/or R2S locally 14. Output the matched R1S and R2S 15. Iterate from step 1 until no more tuples from R1 16.Notify JCA to end the process TABLE 2-2. TUPLE MATCHING AGENT (MTA)
  • 11. AJoin Algorithm (cont’d) Receive tuple procedure at local site S: 1. If all tuples from both relation R1 and R2 are received 2. Join completed 3. If both R1 and R2 buffer are not empty 4. Retrieve a tuple from R1 and R2 IB in turn 5. Else5. Else 6. If both R1 and R2 IB are empty 7. Waiting tuples to arrive 8. Else // one of R1 orR2 IB is not empty 9. If R1 IB is not empty 10. Retrieve a tuple from R1 IB 11. Else // R2 IB is not empty 12. Retrieve a tuple from R2 IB 13. Return TABLE 3. RECEIVE A TUPLE PROCEDURE
  • 12. AJoin Algorithm (cont’d) Tuple filter and join adapting at remote site S1: 1. Receive join parameter (C and join selection threshold) 2. Retrieve a tuple from R1 3. If the tuple is not meet the condition C1 4. Repeat step 2 5. Evaluate join cost-benefit 6. If semi-join is selected 7. Send R1A and its remote access pointer to R1 IB at site S 8. Else // full-join is selected 9. Send R1A and R1s to R1 IB at site S 10. Iterate from step 2 until no more tuples from R1 TABLE 4. REMOTE INFORMATION AGENT (RIA)
  • 13. AJoin Algorithm (cont’d) Retrieve RS at remote site S1 (or S2): 1. Get the remote access pointer 2. Retrieve the tuple according to its access pointer 3. Return R to R buffer at site S3. Return RS to RS buffer at site S TABLE 5. RIA RETRIEVE RS SERVICE
  • 14. Cost and Benefit Analysis cost of full ripple join: Cost F (R1⋈AR2) = (1) where T0 is the cost to start-up a new network connection and V is the network speed. |R1| and |R2| to indicate the cardinality of R1 and R2 respectively, ⋈ 0 | 1| ( 1) | 2| ( 2)R Size R R Size R T V × + × + cost of semi ripple join: CostS (R1⋈AR2) = (2) where Size(P) is the size in byte of the RAP. 0 | 1| ( 1 ) ( 1) ( ( 1 ) ( )) | 1| ( 2 ) ( 2) ( ( 2 ) ( )) A S A S R Size R JC R Size R Size P T V R Size R JC R Size R Size P V × + × + × + × + + +
  • 15. Cost and Benefit Analysis Cost and Benefit : Cost F (R1⋈AR2) - CostS (R1⋈AR2) = (3) | 1| ( 1) | 1| ( 1 ) ( 1) ( ( 1 ) ( )) | 2| ( 2) | 1| ( 2 ) ( 2) ( ( 2 ) ( )) A S A S R Size R R Size R JC R Size R Size P V R Size R R Size R JC R Size R Size P V × − × − × + × − × − × + + In AJoin, full-join or semi-join can be applied independently to each relation. The semi-join method will be chosen only when: (4) or CR(R) × SR(R) < 1 (5) where CR(R) = denotes as join cardinality ratio as %; SR(R) = denotes as attribute selection ratio as % V+ | | ( ) | | ( ) ( ) ( ( ) ( )) 0 A SR Size R R Size R JC R Size R Size P V × − × − × + > ( ) | | JC R R ( ) ( ) ( ) ( ) S A Size R Size P Size R Size R + −
  • 16. AJoin for Data Warehousing Multiple one-to-many joins between the fact table and the dimensional tables. referential integrity constraints are applied e.g. all join attributes in fact table must have a matching joinattributes in fact table must have a matching join attributes in the dimensional tables. Therefore, CR(R1) =100% full-join method will be used for R1 unless SR(R1) is low. run-time data of the fact table only represents a small portion of the fact table. Therefore, CR(R2) << 100% the semi-join method should be chosen for R2.
  • 17. Cost and Benefit in High Bandwidth Network The join cost of local processing cannot be ignored any more. But, since the cost of joins at a local site using full or semi-join method is at an equivalent level. The semi-join method will be chosen only when: | | ( ) | | ( ) ( ) ( ( ) ( ))R Size R R Size R JC R Size R Size P× − × − × + (6) V< (7) where V’ is the desk data access speed and V is the join switch threshold. | | ( ) | | ( ) ( ) ( ( ) ( )) ( ) ( ) ' A S S R Size R R Size R JC R Size R Size P V JC R Size R V × − × − × + × > | | ( ) | | ( ) ( ) ( ( ) ( )) ' ( ) ( ) A S S R Size R R Size R JC R Size R Size P V JC R Size R × − × − × + × ×
  • 18. Performance Evaluation Join under 250K-2Mbps network 200 250 300 Time(sec) 0 50 100 150 10 30 50 70 90 110 130 150 170 190 210 230 250 Matched Tuples Time(sec) AJoin Hash Ripple Join
  • 19. Performance Evaluation (cont’d) Join under 250K-2Mbps network with 1 sec delay per 1000 tuples 300 350 400 0 50 100 150 200 250 300 10 30 50 70 90 110 130 150 170 190 210 230 250 Matched Tuples Time(sec) AJoin Hash Ripple Join
  • 20. Performance Evaluation (cont’d) Join under 100M-1Gbps network 25 30 35 0 5 10 15 20 25 10 30 50 70 90 110 130 150 170 190 210 230 250 Matched Tuples Time(sec) AJoin Hash Ripple Join
  • 21. Performance Evaluation (cont’d) Join under 100M-1Gbps network with 1 sec delay per 1000 tuples 100 120 0 20 40 60 80 10 30 50 70 90 110 130 150 170 190 210 230 250 Matched tuples time(sec) AJoin Hash Ripple Join
  • 22. Summary The main work undertaken is summarized below: investigation the feasibility and effectiveness of utilising software agent technology to address some specific issues in data warehouses. conduction an experimental study on the performance of modern join algorithm for distributed environment. implement of a framework for an adaptive join algorithm using intelligent agents. evaluation the agent-based join approach against current approaches in distributed and dynamic data warehouse environments.