1
Distributed DBMSs
2
Objectives
• Concepts.
• Functions and architecture for a DDBMS.
• Advantages and disadvantages of distributed
databases.
• Distributed database design.
• Levels of transparency.
• Comparison criteria for DDBMSs.
3
Concepts
Distributed Database
A logically interrelated collection of shared data
(and a description of this data), physically
distributed over a computer network.
Distributed DBMS
Software system that permits the management
of the distributed database and makes the
distribution transparent to users.
© Pearson Education Limited 1995, 2005
4
Introduction
A major motivation behind the development of database systems is the desire to
integrate the operational data of an organization and to provide controlled access to
the data.
Although integration and controlled access may imply centralization, this is not the
intention.
In fact, the development of computer networks promotes a decentralized mode of
work.
This decentralized approach mirrors the organizational structure of many companies,
which are logically distributed into divisions, departments, projects, and so on, and
physically distributed into offices, plants, factories, where each unit maintains its own
operational data.
Data sharing and the efficiency of data access should be improved by developing a
distributed database system that reflects this organizational structure, makes the
data in all units accessible, and stores data close to the location where it is most
frequently used.
5
Concepts
• Collection of logically-related shared data.
• Data split into fragments.
• Fragments may be replicated.
• Fragments/replicas allocated to sites.
• Sites linked by a communications network.
• Data at each site is under control of a DBMS.
• DBMSs handle local applications autonomously.
• Each DBMS participates in at least one global
application.
6
Banking Example
Using distributed database technology, a bank may
implement their database system on a number of separate
computer systems rather than a single, centralized
mainframe.
The computer systems may be located at each local branch
office: for example, Amritsar, Patiala, and Qadian.
A network linking the computers will enable the branches to
communicate with each other, and the DDBMS will enable them
to access data stored at another branch office.
Thus, a client living in Amritsar can also check his or her account
while staying in Patiala or Qadian.
7
Distributed DBMS
The software system that permits the management of the distributed
database and makes the distribution transparent to users.
In a distributed database management system (DDBMS), a single logical
database is split into a number of fragments.
Each fragment is stored on one or more computers under the control of a separate
DBMS, with the computers connected by a communications network.
Each site is capable of independently processing user requests that require access to
local data and is also capable of processing data stored on other computers in the
network.
Users access the distributed database via applications. Applications are classified as
those that do not require data from other sites (local applications) and those that do
require data from other sites (global applications).
We require a DDBMS to have at least one global application.
8
Distributed Relational Database Design
In this section we examine the factors that have to be considered for
the design of a distributed relational database. More specifically, we
examine:
• Fragmentation
A relation may be divided into a number of subrelations, called
fragments, which are then distributed.
There are two main types of fragmentation:
1) Horizontal fragmentation
2) Vertical fragmentation
9
Distributed DBMS
10
Distributed Processing
A centralized database that can be accessed
over a computer network.
11
Parallel DBMS
A DBMS running across multiple processors and
disks designed to execute operations in parallel,
whenever possible, to improve performance.
• Based on premise that single processor systems
can no longer meet requirements for cost-
effective scalability, reliability, and performance.
• Parallel DBMSs link multiple, smaller machines
to achieve same throughput as single, larger
machine, with greater scalability and reliability.
12
Parallel DBMS
• Main architectures for parallel DBMSs are:
– Shared memory,
– Shared disk,
– Shared nothing.
13
Parallel DBMS
[Figure: parallel DBMS architectures: (a) shared memory, (b) shared disk, (c) shared nothing]
14
Advantages of DDBMSs
• Reflects organizational structure
• Improved shareability and local autonomy
• Improved availability
• Improved reliability
• Improved performance
• Economics
• Modular growth
15
Disadvantages of DDBMSs
• Complexity
• Cost
• Security
• Integrity control more difficult
• Lack of standards
• Lack of experience
• Database design more complex
16
Types of DDBMS
• Homogeneous DDBMS
• Heterogeneous DDBMS
17
Homogeneous DDBMS
• All sites use same DBMS product.
• Much easier to design and manage.
• Approach provides incremental growth and
allows increased performance.
18
Heterogeneous DDBMS
• Sites may run different DBMS products, with
possibly different underlying data models.
• Occurs when sites have implemented their
own databases and integration is considered
later.
• Translations required to allow for:
– Different hardware.
– Different DBMS products.
– Different hardware and different DBMS products.
• Typical solution is to use gateways.
19
Distributed Relational Database Design
In this section we examine the factors that have to be considered for
the design of a distributed relational database. More specifically, we
examine:
Fragmentation
A relation may be divided into a number of subrelations, called
fragments, which are then distributed.
There are two main types of fragmentation:
1) Horizontal fragmentation
2) Vertical fragmentation
20
Allocation: Each fragment is stored at the site with ‘optimal’
distribution.
Replication: The DDBMS may maintain a copy of a fragment
at several different sites.
The definition and allocation of fragments must be based on
how the database is to be used. This involves analyzing
transactions.
The design should be based on both quantitative and
qualitative information.
Quantitative information is used in allocation.
Qualitative information is used in fragmentation.
The quantitative information may include:
• The frequency with which a transaction is run.
• The site from which a transaction is run.
• The performance criteria for transactions.
22
Qualitative information
The qualitative information may include information about the
transactions that relates to the following objectives:
•Locality of reference
•Improved reliability and availability
•Acceptable performance
•Balanced storage capacities and costs
• Minimal communication costs
23
Data Allocation
There are four alternative strategies regarding the placement of data:
• Centralized
• Fragmented
• Complete replication
• Selective replication
We now compare these strategies using the strategic objectives identified above.
24
Centralized
•This strategy consists of a single database and DBMS stored at one site
with users distributed across the network (we referred to this previously
as distributed processing).
•Locality of reference is at its lowest as all sites, except the central site,
have to use the network for all data accesses.
•This also means that communication costs are high.
•Reliability and availability are low, as a failure of the central site results
in the loss of the entire database system.
Fragmented (or partitioned)
•This strategy partitions the database into disjoint fragments,
with each fragment assigned to one site.
•If data items are located at the site where they are used most
frequently, locality of reference is high.
•As there is no replication, storage costs are low. Similarly,
reliability and availability are low, although they are higher
than in the centralized case, as the failure of a site results in
the loss of only that site’s data.
•Performance should be good and communications costs low
if the distribution is designed properly.
26
Advantages of fragmentation
•Usage
•Efficiency
•Parallelism
•Security
Disadvantages of fragmentation
•Performance
•Integrity
27
Data Fragmentation
If relation r is fragmented, r is divided into a number of fragments r1, r2, …, rn. These fragments
contain sufficient information to allow reconstruction of the original relation r. As we shall see,
this reconstruction can take place through the application of either the union operation or a
special type of join operation on the various fragments.
Schemes of Fragmentation
There are three different schemes for fragmenting a relation:
• Horizontal fragmentation
• Vertical fragmentation
• Mixed fragmentation
We shall illustrate these approaches by fragmenting the EMP relation,
with schema:
EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
29
Horizontal Fragmentation
In horizontal fragmentation, the relations (tables) are divided
horizontally. That is, some of the tuples of the relation are placed on one
computer and the rest are placed on other computers.
A horizontal fragment is a subset of the tuples of the relation.
To reconstruct the relation R from its horizontal fragments, a UNION
operation can be performed on the fragments.
A fragmentation whose fragments together contain all the tuples of relation R
is called a complete horizontal fragmentation.
30
Example
suppose that the relation r is the EMP relation of above.
This relation can be divided into n different fragments, each of which
consists of tuples of employee belonging to a particular department.
EMP relation has three departments 10,20 and 30 results three different
fragments:
EMP1=DEPTNO=10 (EMP)
EMP2=DEPTNO=20 (EMP)
EMP3=DEPTNO=30 (EMP)
Fragment r1 is stored in the department number 10 site, fragment r2 is
stored in the department number 20 site and so on r3 is stored at
department number 30 site.
31
We obtain the reconstruction of the relation r by taking the union of all fragments; that is,
r = r1 ∪ r2 ∪ … ∪ rn
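The selection-and-union scheme above can be sketched in Python, treating each relation as a list of dictionaries. Only a few EMP attributes are shown and the tuples are illustrative; the `select` helper is an assumed name for the relational selection (σ):

```python
# Horizontal fragmentation: each fragment is a selection (sigma) on EMP,
# and the original relation is recovered by the union of the fragments.

EMP = [
    {"EMPNO": 7369, "ENAME": "SMITH", "DEPTNO": 20},
    {"EMPNO": 7499, "ENAME": "ALLEN", "DEPTNO": 30},
    {"EMPNO": 7782, "ENAME": "CLARK", "DEPTNO": 10},
]

def select(relation, predicate):
    """Relational selection: keep only the tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

# One fragment per department site, mirroring EMP1/EMP2/EMP3 above.
EMP1 = select(EMP, lambda t: t["DEPTNO"] == 10)
EMP2 = select(EMP, lambda t: t["DEPTNO"] == 20)
EMP3 = select(EMP, lambda t: t["DEPTNO"] == 30)

# Reconstruction: r = r1 ∪ r2 ∪ r3. The fragments are disjoint, so
# concatenation is enough to model the union here.
reconstructed = EMP1 + EMP2 + EMP3
assert sorted(t["EMPNO"] for t in reconstructed) == \
       sorted(t["EMPNO"] for t in EMP)
```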
32
Vertical Fragmentation
In vertical fragmentation, some of the columns (attributes) are stored on
one computer and the rest are stored on other computers.
This is because each site may not need all the attributes of a relation.
A vertical fragment keeps only certain attributes of the relation.
The fragmentation should be done such that we can reconstruct relation
r from the fragments by taking the natural join:
r = r1 * r2 * … * rn
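The natural-join reconstruction can be sketched in the same style. The key point, shown in the code, is that every vertical fragment must retain the primary key (EMPNO here); the attribute split and helper names are illustrative:

```python
# Vertical fragmentation: each fragment projects a subset of attributes,
# and every fragment keeps the primary key EMPNO so the relation can be
# rebuilt by a natural join on that key.

EMP = [
    {"EMPNO": 7369, "ENAME": "SMITH", "SAL": 800,  "DEPTNO": 20},
    {"EMPNO": 7782, "ENAME": "CLARK", "SAL": 2450, "DEPTNO": 10},
]

def project(relation, attrs):
    """Relational projection: keep only the listed attributes."""
    return [{a: t[a] for a in attrs} for t in relation]

EMP_pay  = project(EMP, ["EMPNO", "SAL"])             # stored at one site
EMP_info = project(EMP, ["EMPNO", "ENAME", "DEPTNO"]) # stored at another

def natural_join(r, s):
    """Join tuples that agree on all attributes common to r and s."""
    common = set(r[0]) & set(s[0])
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

# Reconstruction: r = r1 * r2.
reconstructed = natural_join(EMP_pay, EMP_info)
assert sorted(t["EMPNO"] for t in reconstructed) == \
       sorted(t["EMPNO"] for t in EMP)
```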
33
34
Mixed Fragmentation
Mixed fragmentation, also known as Hybrid fragmentation, intermixes
the horizontal and vertical fragmentation.
The relation r is divided into a number of fragment relations r1,
r2……..rn. Each fragment is obtained as the result of application of either
the horizontal fragmentation or vertical fragmentation scheme on
relation r, or on a fragment of r that was obtained previously.
For example, if we combine the horizontal and vertical
fragmentation of the EMP relation, the result is a mixed
fragmentation. The relation is divided initially into the fragments EMP1
and EMP2 as vertical fragments. We can now further divide fragment
EMP1, using the horizontal-fragmentation scheme, into the following
two fragments:
EMP1a = σ DEPTNO=10 (EMP1)
EMP1b = σ DEPTNO=20 (EMP1)
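The two-step scheme can be sketched as a projection followed by selections (the attribute split and tuple values are illustrative assumptions, matching the EMP1a/EMP1b example above):

```python
# Mixed (hybrid) fragmentation: first fragment EMP vertically into EMP1
# and EMP2, then fragment EMP1 horizontally by department.

EMP = [
    {"EMPNO": 7369, "ENAME": "SMITH", "SAL": 800,  "DEPTNO": 20},
    {"EMPNO": 7782, "ENAME": "CLARK", "SAL": 2450, "DEPTNO": 10},
]

def project(relation, attrs):
    return [{a: t[a] for a in attrs} for t in relation]

def select(relation, predicate):
    return [t for t in relation if predicate(t)]

# Vertical step: both fragments keep the key EMPNO.
EMP1 = project(EMP, ["EMPNO", "ENAME", "DEPTNO"])
EMP2 = project(EMP, ["EMPNO", "SAL"])

# Horizontal step on EMP1, as in EMP1a / EMP1b above.
EMP1a = select(EMP1, lambda t: t["DEPTNO"] == 10)
EMP1b = select(EMP1, lambda t: t["DEPTNO"] == 20)

assert [t["EMPNO"] for t in EMP1a] == [7782]
assert [t["EMPNO"] for t in EMP1b] == [7369]
```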
35
Distributed Database Design
• Three key issues:
– Fragmentation,
– Allocation,
– Replication.
36
Distributed Database Design
Fragmentation
Relation may be divided into a number of sub-
relations, which are then distributed.
Allocation
Each fragment is stored at site with “optimal”
distribution.
Replication
Copy of fragment may be maintained at several
sites.
37
Data Allocation
• Four alternative strategies regarding
placement of data:
– Centralized,
– Partitioned (or Fragmented),
– Complete Replication,
– Selective Replication.
38
Data Allocation
Centralized: Consists of single database and
DBMS stored at one site with users distributed
across the network.
Partitioned: Database partitioned into disjoint
fragments, each fragment assigned to one site.
Complete Replication: Consists of maintaining
complete copy of database at each site.
Selective Replication: Combination of
partitioning, replication, and centralization.
39
Transparencies in a DDBMS
• Distribution Transparency
– Fragmentation Transparency
– Location Transparency
– Replication Transparency
– Local Mapping Transparency
– Naming Transparency
40
Transparencies in a DDBMS
• Transaction Transparency
– Concurrency Transparency
– Failure Transparency
• Performance Transparency
– DBMS Transparency
• DBMS Transparency
41
Distribution Transparency
• Distribution transparency allows user to
perceive database as single, logical entity.
• If DDBMS exhibits distribution transparency,
user does not need to know:
– data is fragmented (fragmentation transparency),
– location of data items (location transparency);
– if the user does need to know these, we have only local mapping transparency.
• With replication transparency, user is
unaware of replication of fragments.
42
Naming Transparency
• Each item in a DDB must have a unique
name.
• DDBMS must ensure that no two sites create
a database object with same name.
• One solution is to create central name
server. However, this results in:
– loss of some local autonomy;
– central site may become a bottleneck;
– low availability; if the central site fails,
remaining sites cannot create any new objects.
43
Naming Transparency
• Alternative solution - prefix object with identifier of
site that created it.
• For example, Branch created at site S1 might be named
S1.BRANCH.
• Also need to identify each fragment and its copies.
• Thus, copy 2 of fragment 3 of Branch created at site S1
might be referred to as S1.BRANCH.F3.C2.
• However, this results in loss of distribution
transparency.
44
Naming Transparency
• An approach that resolves these problems
uses aliases for each database object.
• Thus, S1.BRANCH.F3.C2 might be known as
LocalBranch by user at site S1.
• DDBMS has task of mapping an alias to
appropriate database object.
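The alias scheme can be sketched as a per-site mapping maintained by the DDBMS. The names LocalBranch and S1.BRANCH.F3.C2 come from the slides; the catalogue structure and `resolve` function are illustrative assumptions:

```python
# Per-site alias catalogue: each site maps local aliases to fully
# qualified names of the form site.object.fragment.copy.
alias_catalogue = {
    "S1": {"LocalBranch": "S1.BRANCH.F3.C2"},
}

def resolve(site, name):
    """Return the fully qualified name for an alias, or the name
    unchanged if no alias is registered at this site."""
    return alias_catalogue.get(site, {}).get(name, name)

# A user at S1 uses the alias; a fully qualified name passes through.
assert resolve("S1", "LocalBranch") == "S1.BRANCH.F3.C2"
assert resolve("S2", "S1.BRANCH.F3.C2") == "S1.BRANCH.F3.C2"
```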
45
Transaction Transparency
• Ensures that all distributed transactions maintain
distributed database’s integrity and consistency.
• Distributed transaction accesses data stored at more
than one location.
• Each transaction is divided into number of
subtransactions, one for each site that has to be
accessed.
• DDBMS must ensure the indivisibility of both the
global transaction and each of the subtransactions.
46
Example - Distributed Transaction
• T prints out names of all staff, using schema
defined above as S1, S2, S21, S22, and S23.
Define three subtransactions TS3, TS5, and TS7
to represent agents at sites 3, 5, and 7.
47
Concurrency Transparency
• All transactions must execute
independently and be logically consistent
with results obtained if transactions
executed one at a time, in some arbitrary
serial order.
• Same fundamental principles as for
centralized DBMS.
• DDBMS must ensure both global and local
transactions do not interfere with each
other.
• Similarly, DDBMS must ensure consistency of all copies of replicated data items.
48
Classification of Transactions
• In IBM’s Distributed Relational Database
Architecture (DRDA), four types of
transactions:
– Remote request
– Remote unit of work
– Distributed unit of work
– Distributed request.
49
Classification of Transactions
50
Concurrency Transparency
• Replication makes concurrency more
complex.
• If a copy of a replicated data item is
updated, update must be propagated to all
copies.
• Could propagate changes as part of original
transaction, making it an atomic operation.
• However, if one site holding copy is not
reachable, then transaction is delayed until
site is reachable.
51
Concurrency Transparency
• Could limit update propagation to only
those sites currently available. Remaining
sites updated when they become available
again.
• Could allow updates to copies to happen
asynchronously, sometime after the
original update. Delay in regaining
consistency may range from a few seconds
to several hours.
52
Failure Transparency
• DDBMS must ensure atomicity and
durability of global transaction.
• Means ensuring that subtransactions of
global transaction either all commit or all
abort.
• Thus, DDBMS must synchronize global
transaction to ensure that all
subtransactions have completed successfully
before recording a final COMMIT for global
transaction.
• Must do this in the presence of site and network failures.
53
Performance Transparency
• DDBMS must perform as if it were a
centralized DBMS.
– DDBMS should not suffer any performance
degradation due to distributed architecture.
– DDBMS should determine most cost-effective
strategy to execute a request.
54
Performance Transparency
• Distributed Query Processor (DQP) maps
data request into ordered sequence of
operations on local databases.
• Must consider fragmentation, replication,
and allocation schemas.
• DQP has to decide:
– which fragment to access;
– which copy of a fragment to use;
– which location to use.
55
Performance Transparency
• DQP produces execution strategy optimized
with respect to some cost function.
• Typically, costs associated with a distributed
request include:
– I/O cost;
– CPU cost;
– communication cost.
56
Performance Transparency - Example
Property(propNo, city) 10000 records in
London
Client(clientNo,maxPrice) 100000 records in
Glasgow
Viewing(propNo, clientNo) 1000000 records in
London
57
Performance Transparency - Example
Assume:
• Each tuple in each relation is 100 characters
long.
• 10 renters with maximum price greater
than £200,000.
• 100 000 viewings for properties in
Aberdeen.
• Computation time negligible compared to
communication time.
58
Performance Transparency - Example
Query Processing in Distributed Databases
• Issues
– Cost of transferring data (files and results) over the network.
• This cost is usually high so some optimization is necessary.
• Example relations: Employee at site 1 and Department at Site 2
– Employee at site 1. 10,000 rows. Row size = 100 bytes. Table size =
10^6 (1,000,000) bytes.
– Department at Site 2. 100 rows. Row size = 35 bytes. Table size =
3,500 bytes.
• Q: For each employee, retrieve the employee name and the name of the
department where the employee works.
• Q: π Fname,Lname,Dname (Employee ⋈ Dno=Dnumber Department)
Employee (Fname, Minit, Lname, SSN, Bdate, Address, Sex, Salary, Superssn, Dno)
Department (Dname, Dnumber, Mgrssn, Mgrstartdate)
Query Processing in Distributed
Databases
• Result
– The result of this query will have 10,000 tuples,
assuming that every employee is related to a
department.
– Suppose each result tuple is 40 bytes long. The
query is submitted at site 3 and the result is sent
to this site.
– Problem: Employee and Department relations are
not present at site 3.
Query Processing in Distributed
Databases
• Strategies:
1. Transfer Employee and Department to site 3.
• Total transfer bytes = 1,000,000 + 3500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute join at site 2 and send the
result to site 3.
• Query result size = 40 * 10,000 = 400,000 bytes. Total transfer
size = 400,000 + 1,000,000 = 1,400,000 bytes.
3. Transfer Department relation to site 1, execute the join at site 1,
and send the result to site 3.
• Total bytes transferred = 400,000 + 3500 = 403,500 bytes.
• Optimization criteria: minimizing data transfer.
– Preferred approach: strategy 3.
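The transfer-cost arithmetic for the three strategies can be checked with a short script; all sizes are the ones given in the slides:

```python
# Sizes from the example: Employee at site 1, Department at site 2,
# query Q submitted at site 3.
EMP_BYTES      = 10_000 * 100   # 1,000,000 bytes
DEPT_BYTES     = 100 * 35       # 3,500 bytes
RESULT_BYTES_Q = 10_000 * 40    # 400,000 bytes (one 40-byte tuple per employee)

# Strategy 1: ship both relations to site 3 and join there.
s1 = EMP_BYTES + DEPT_BYTES
# Strategy 2: ship Employee to site 2, join there, ship the result to site 3.
s2 = EMP_BYTES + RESULT_BYTES_Q
# Strategy 3: ship Department to site 1, join there, ship the result to site 3.
s3 = DEPT_BYTES + RESULT_BYTES_Q

assert (s1, s2, s3) == (1_003_500, 1_400_000, 403_500)
assert min(s1, s2, s3) == s3   # strategy 3 minimizes data transfer
```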
Query Processing in Distributed
Databases
• Consider the query
– Q’: For each department, retrieve the
department name and the name of the
department manager
• Relational Algebra expression:
– π Fname,Lname,Dname (Employee ⋈ Mgrssn=SSN Department)
Query Processing in Distributed
Databases
• The result of this query will have 100 tuples, assuming that
every department has a manager. The execution strategies
are:
1. Transfer Employee and Department to the result site and
perform the join at site 3.
• Total bytes transferred = 1,000,000 + 3500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute join at site 2 and send the
result to site 3. Query result size = 40 * 100 = 4000 bytes.
• Total transfer size = 4000 + 1,000,000 = 1,004,000 bytes.
3. Transfer Department relation to site 1, execute join at site 1
and send the result to site 3.
• Total transfer size = 4000 + 3500 = 7500 bytes.
• Preferred strategy: Choose strategy 3.
Query Processing in Distributed
Databases
• Now suppose the result site is 2. Possible
strategies :
1. Transfer Employee relation to site 2, execute the
query and present the result to the user at site 2.
• Total transfer size = 1,000,000 bytes for both queries
Q and Q’.
2. Transfer Department relation to site 1, execute
join at site 1 and send the result back to site 2.
• Total transfer size for Q = 400,000 + 3500 = 403,500
bytes and for Q’ = 4000 + 3500 = 7500 bytes.
Query Processing in Distributed Databases
• Semijoin:
– Objective is to reduce the number of tuples in a relation before
transferring it to another site.
• Example execution of Q or Q’:
1. Project the join attributes of Department at site 2, and transfer
them to site 1. For Q, 4 * 100 = 400 bytes are transferred and
for Q’, 9 * 100 = 900 bytes are transferred.
2. Join the transferred file with the Employee relation at site 1,
and transfer the required attributes from the resulting file to
site 2. For Q, 34 * 10,000 = 340,000 bytes are transferred and
for Q’, 39 * 100 = 3900 bytes are transferred.
3. Execute the query by joining the transferred file with
Department and present the result to the user at site 2.
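The semijoin savings can be verified with the same kind of arithmetic; the per-attribute widths (4, 9, 34, and 39 bytes) and tuple counts are the ones used in the slide:

```python
# Semijoin execution of Q (result site 2):
# step 1 ships the join attribute of Department to site 1,
# step 2 ships only the matching Employee attributes back.
step1_q = 4 * 100        # Dnumber (4 bytes) for each of 100 Department tuples
step2_q = 34 * 10_000    # Fname, Lname, Dno (34 bytes) per matching Employee
total_q = step1_q + step2_q
assert total_q == 340_400   # far below the 403,500 bytes of strategy 3

# Semijoin execution of Q' (department managers):
step1_q2 = 9 * 100       # Dnumber and Mgrssn (9 bytes) per Department tuple
step2_q2 = 39 * 100      # Fname, Lname, Mgrssn (39 bytes) per matching manager
total_q2 = step1_q2 + step2_q2
assert total_q2 == 4_800
```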
Concurrency Control and Recovery
• Distributed Databases encounter a number of
concurrency control and recovery problems
which are not present in centralized databases.
• Some of them are listed below:
– Dealing with multiple copies of data items
– Failure of individual sites
– Communication link failure
– Distributed commit
– Distributed deadlock
Concurrency Control and Recovery
• Details
– Dealing with multiple copies of data items:
• The concurrency control must maintain global
consistency. Likewise the recovery mechanism must
recover all copies and maintain consistency after
recovery.
– Failure of individual sites:
• Database availability must not be affected due to the
failure of one or two sites and the recovery scheme
must recover them before they are available for use.
Concurrency Control and Recovery
• Details (contd.)
– Communication link failure:
• This failure may create network partition which would affect
database availability even though all database sites may be
running.
– Distributed commit:
• A transaction may be fragmented into subtransactions that are executed at a
number of sites. This requires a two- or three-phase commit
approach for transaction commit.
– Distributed deadlock:
• Since transactions are processed at multiple sites, two or more
sites may get involved in deadlock. This must be resolved in a
distributed manner.
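The two-phase commit mentioned above can be sketched as a coordinator that first collects votes and only then tells every site to commit. The `Site` class and method names are illustrative, not an API from the slides:

```python
# Minimal two-phase commit sketch: the global transaction commits only
# if every participating site votes to commit; otherwise all sites abort.

class Site:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
        self.state = "active"

    def prepare(self):   # phase 1: site votes on whether it can commit
        return self.can_commit

    def commit(self):    # phase 2: make the subtransaction durable
        self.state = "committed"

    def abort(self):     # phase 2: undo the subtransaction
        self.state = "aborted"

def two_phase_commit(sites):
    if all(s.prepare() for s in sites):  # phase 1: collect votes
        for s in sites:                  # phase 2: global commit
            s.commit()
        return "committed"
    for s in sites:                      # any NO vote aborts everywhere
        s.abort()
    return "aborted"

# One site cannot commit, so the whole global transaction aborts.
sites = [Site("S1"), Site("S2"), Site("S3", can_commit=False)]
assert two_phase_commit(sites) == "aborted"
assert all(s.state == "aborted" for s in sites)
```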
Concurrency Control and Recovery
• Distributed concurrency control based on a
distinguished copy of a data item
– Primary site technique: A single site is designated
as a primary site which serves as a coordinator for
transaction management.
[Figure: Sites 1 to 5 connected by a communications network, with one site designated as the primary site]
Concurrency Control and Recovery
• Transaction management:
– Concurrency control and commit are managed by
this site.
– In two phase locking, this site manages locking
and releasing data items.
– If all transactions follow two-phase policy at all
sites, then serializability is guaranteed.
Concurrency Control and Recovery
• Transaction Management
– Advantages:
• An extension of centralized two-phase locking, so
implementation and management are simple.
• Data items are locked only at one site but they can be accessed at
any site.
– Disadvantages:
• All transaction management activities go to primary site which is
likely to overload the site.
• If the primary site fails, the entire system is inaccessible.
– To aid recovery a backup site is designated which behaves as a
shadow of primary site. In case of primary site failure, backup
site can act as primary site.
Concurrency Control and Recovery
• Primary Copy Technique:
– In this approach, instead of designating a whole site, a copy of each data
item is designated as the primary copy. To lock a data item, just the primary
copy of that data item is locked.
• Advantages:
– Since primary copies are distributed at various sites, a single site
is not overloaded with locking and unlocking requests.
• Disadvantages:
– Identification of a primary copy is complex. A distributed
directory must be maintained, possibly at all sites.
Concurrency Control and Recovery
• Recovery from a coordinator failure
– In both approaches a coordinator site or copy may become
unavailable. This will require the selection of a new coordinator.
• Primary site approach with no backup site:
– Aborts and restarts all active transactions at all sites. Elects a
new coordinator and initiates transaction processing.
• Primary site approach with backup site:
– Suspends all active transactions, designates the backup site as
the primary site and identifies a new back up site. Primary site
receives all transaction management information to resume
processing.
• Primary and backup sites fail or no backup site:
– Use election process to select a new coordinator site.
Concurrency Control and Recovery
• Concurrency control based on voting:
– There is no primary copy or coordinator.
– Send lock request to sites that have data item.
– If majority of sites grant lock then the requesting
transaction gets the data item.
– Locking information (grant or denied) is sent to all
these sites.
– To avoid an unacceptably long wait, a time-out period
is defined. If the requesting transaction does not
get the vote information in time, the transaction is
aborted.
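The majority-voting rule can be sketched as follows; the vote encoding (with time-outs modeled as missing replies) is an illustrative assumption:

```python
# Voting-based locking: a lock on a replicated data item is granted only
# if a majority of the sites holding a copy explicitly grant it.

def request_lock(votes):
    """votes: one entry per site holding a copy. True = grant,
    False = deny, None = no reply before the time-out.
    Only explicit grants count toward the majority; otherwise
    the requesting transaction is aborted."""
    grants = sum(1 for v in votes if v is True)
    return "granted" if grants > len(votes) // 2 else "aborted"

assert request_lock([True, True, False]) == "granted"  # 2 of 3 sites grant
assert request_lock([True, None, None]) == "aborted"   # time-outs count against
```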
 
Query processing
Query processingQuery processing
Query processing
 
Normalization of Data Base
Normalization of Data BaseNormalization of Data Base
Normalization of Data Base
 
Architecture of dbms(lecture 3)
Architecture of dbms(lecture 3)Architecture of dbms(lecture 3)
Architecture of dbms(lecture 3)
 
Sql fundamentals
Sql fundamentalsSql fundamentals
Sql fundamentals
 
Lecture 1&2(rdbms-ii)
Lecture 1&2(rdbms-ii)Lecture 1&2(rdbms-ii)
Lecture 1&2(rdbms-ii)
 
Java script
Java scriptJava script
Java script
 
File Management
File ManagementFile Management
File Management
 
HTML Forms
HTML FormsHTML Forms
HTML Forms
 
DHTML
DHTMLDHTML
DHTML
 
CSA lecture-1
CSA lecture-1CSA lecture-1
CSA lecture-1
 
Relational database management system (rdbms) i
Relational database management system (rdbms) iRelational database management system (rdbms) i
Relational database management system (rdbms) i
 

Último

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
MateoGardella
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
SanaAli374401
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 

Último (20)

Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 

DDBMS

  • 2. 2 Objectives • Concepts. • Functions and architecture for a DDBMS. • Advantages and disadvantages of distributed databases. • Distributed database design. • Levels of transparency. • Comparison criteria for DDBMSs.
  • 3. 3 Concepts Distributed Database A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network. Distributed DBMS Software system that permits the management of the distributed database and makes the distribution transparent to users. © Pearson Education Limited 1995, 2005
  • 4. 4 Introduction A major motivation behind the development of database systems is the desire to integrate the operational data of an organization and to provide controlled access to the data. Although integration and controlled access may imply centralization, this is not the intention. In fact, the development of computer networks promotes a decentralized mode of work. This decentralized approach mirrors the organizational structure of many companies, which are logically distributed into divisions, departments, projects, and so on, and physically distributed into offices, plants, and factories, where each unit maintains its own operational data. Data sharing and the efficiency of data access should be improved by the development of a distributed database system that reflects this organizational structure, makes the data in all units accessible, and stores data close to the location where it is most frequently used.
  • 5. 5 Concepts • Collection of logically-related shared data. • Data split into fragments. • Fragments may be replicated. • Fragments/replicas allocated to sites. • Sites linked by a communications network. • Data at each site is under control of a DBMS. • DBMSs handle local applications autonomously. • Each DBMS participates in at least one global application.
  • 6. 6 Banking Example Using distributed database technology, a bank may implement its database system on a number of separate computer systems rather than a single, centralized mainframe. The computer systems may be located at each local branch office: for example, Amritsar, Patiala, and Qadian. A network linking the computers will enable the branches to communicate with each other, and the DDBMS will enable them to access data stored at another branch office. Thus, a client living in Amritsar can also check his/her account during a stay in Patiala or Qadian.
  • 7. 7 Distributed DBMS The software system that permits the management of the distributed database and makes the distribution transparent to users. A Distributed Database Management System (DDBMS) consists of a single logical database that is split into a number of fragments. Each fragment is stored on one or more computers under the control of a separate DBMS, with the computers connected by a communications network. Each site is capable of independently processing user requests that require access to local data and is also capable of processing data stored on other computers in the network. Users access the distributed database via applications. Applications are classified as those that do not require data from other sites (local applications) and those that do require data from other sites (global applications). We require a DDBMS to have at least one global application.
  • 8. 8 Distributed Relational Database Design In this section we examine the factors that have to be considered for the design of a distributed relational database. More specifically, we examine: Fragmentation A relation may be divided into a number of subrelations, called fragments, which are then distributed. There are two main types of fragmentation: 1) Horizontal fragmentation 2) Vertical fragmentation
  • 10. 10 Distributed Processing A centralized database that can be accessed over a computer network.
  • 11. 11 Parallel DBMS A DBMS running across multiple processors and disks designed to execute operations in parallel, whenever possible, to improve performance. • Based on premise that single processor systems can no longer meet requirements for cost-effective scalability, reliability, and performance. • Parallel DBMSs link multiple, smaller machines to achieve same throughput as single, larger machine, with greater scalability and reliability.
  • 12. 12 Parallel DBMS • Main architectures for parallel DBMSs are: – Shared memory, – Shared disk, – Shared nothing.
  • 13. 13 Parallel DBMS (a) shared memory (b) shared disk (c) shared nothing
  • 14. 14 Advantages of DDBMSs • Reflects organizational structure • Improved shareability and local autonomy • Improved availability • Improved reliability • Improved performance • Economics • Modular growth
  • 15. 15 Disadvantages of DDBMSs • Complexity • Cost • Security • Integrity control more difficult • Lack of standards • Lack of experience • Database design more complex
  • 16. 16 Types of DDBMS • Homogeneous DDBMS • Heterogeneous DDBMS
  • 17. 17 Homogeneous DDBMS • All sites use same DBMS product. • Much easier to design and manage. • Approach provides incremental growth and allows increased performance.
  • 18. 18 Heterogeneous DDBMS • Sites may run different DBMS products, with possibly different underlying data models. • Occurs when sites have implemented their own databases and integration is considered later. • Translations required to allow for: – Different hardware. – Different DBMS products. – Different hardware and different DBMS products. • Typical solution is to use gateways.
  • 19. 19 Distributed Relational Database Design In this section we examine the factors that have to be considered for the design of a distributed relational database. More specifically, we examine: Fragmentation A relation may be divided into a number of subrelations, called fragments, which are then distributed. There are two main types of fragmentation: 1) Horizontal fragmentation 2) Vertical fragmentation
  • 20. 20 Allocation Each fragment is stored at the site with ‘optimal’ distribution. Replication The DDBMS may maintain a copy of a fragment at several different sites. The definition and allocation of fragments must be based on how the database is to be used. This involves analyzing transactions. The design should be based on both quantitative and qualitative information.
  • 21. Quantitative information is used in allocation. Qualitative information is used in fragmentation. The quantitative information may include: • The frequency with which a transaction is run. • The site from which a transaction is run. • The performance criteria for transactions.
  • 22. 22 Qualitative information The qualitative information may include information about the transactions, which is used to meet the following objectives: • Locality of reference • Improved reliability and availability • Acceptable performance • Balanced storage capacities and costs • Minimal communication costs
  • 23. 23 Data Allocation There are four alternative strategies regarding the placement of data: • Centralized • Fragmented • Complete replication • Selective replication We now compare these strategies using the objectives identified above.
  • 24. 24 Centralized • This strategy consists of a single database and DBMS stored at one site with users distributed across the network (we referred to this previously as distributed processing). • Locality of reference is at its lowest as all sites, except the central site, have to use the network for all data accesses. • This also means that communication costs are high. • Reliability and availability are low, as a failure of the central site results in the loss of the entire database system.
  • 25. Fragmented (or partitioned) • This strategy partitions the database into disjoint fragments, with each fragment assigned to one site. • If data items are located at the site where they are used most frequently, locality of reference is high. • As there is no replication, storage costs are low; reliability and availability are also low, although higher than in the centralized case, as the failure of a site results in the loss of only that site's data. • Performance should be good and communication costs low if the distribution is designed properly.
  • 27. 27 Data Fragmentation If relation r is fragmented, r is divided into a number of fragments r1, r2, …, rn. These fragments contain sufficient information to allow reconstruction of the original relation r. As we shall see, this reconstruction can take place through the application of either the union operation or a special type of join operation on the various fragments.
  • 28. Schemes of Fragmentation There are three different schemes for fragmenting a relation: • Horizontal fragmentation • Vertical fragmentation • Mixed fragmentation We shall illustrate these approaches by fragmenting the EMP relation, with schema: EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
  • 29. 29 Horizontal Fragmentation In horizontal fragmentation, the relations (tables) are divided horizontally. That is, some of the tuples of the relation are placed on one computer and the rest are placed on other computers. A horizontal fragment is a subset of the tuples of that relation. To reconstruct the relation R from its horizontal fragments, a UNION operation can be performed on the fragments. A set of fragments that together contain all the tuples of relation R is called a complete horizontal fragmentation.
  • 30. 30 Example Suppose that the relation r is the EMP relation above. This relation can be divided into n different fragments, each of which consists of the tuples of employees belonging to a particular department. The EMP relation has three departments (10, 20, and 30), giving three fragments: EMP1 = σDEPTNO=10(EMP) EMP2 = σDEPTNO=20(EMP) EMP3 = σDEPTNO=30(EMP) Fragment EMP1 is stored at the department 10 site, fragment EMP2 is stored at the department 20 site, and EMP3 is stored at the department 30 site.
  • 31. 31 We obtain the reconstruction of the relation r by taking the union of all fragments; that is, R = r1 ∪ r2 ∪ … ∪ rn
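The horizontal fragmentation and union reconstruction above can be sketched in Python. This is a toy illustration only; the EMP rows are invented sample data, not from the slides:

```python
# Horizontal fragmentation of EMP by DEPTNO, and reconstruction by union.
# The rows below are invented sample data for illustration.
EMP = [
    {"EMPNO": 7369, "ENAME": "SMITH", "SAL": 800,  "DEPTNO": 20},
    {"EMPNO": 7499, "ENAME": "ALLEN", "SAL": 1600, "DEPTNO": 30},
    {"EMPNO": 7782, "ENAME": "CLARK", "SAL": 2450, "DEPTNO": 10},
]

def horizontal_fragment(relation, deptno):
    """Selection sigma_{DEPTNO=deptno}(relation): keep only matching tuples."""
    return [t for t in relation if t["DEPTNO"] == deptno]

EMP1 = horizontal_fragment(EMP, 10)   # stored at the department 10 site
EMP2 = horizontal_fragment(EMP, 20)   # stored at the department 20 site
EMP3 = horizontal_fragment(EMP, 30)   # stored at the department 30 site

# Reconstruction: R = EMP1 U EMP2 U EMP3 (tuple order is immaterial).
reconstructed = EMP1 + EMP2 + EMP3
assert sorted(t["EMPNO"] for t in reconstructed) == sorted(t["EMPNO"] for t in EMP)
```

Because the selection conditions are disjoint and cover every DEPTNO value, the fragmentation is complete and the union loses nothing.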
  • 32. 32 Vertical Fragmentation In vertical fragmentation, some of the columns (attributes) are stored on one computer and the rest are stored on other computers. This is because each site may not need all the attributes of a relation. A vertical fragment keeps only certain attributes of the relation. The fragmentation should be done such that we can reconstruct relation r from the fragments by taking the natural join: r = r1 * r2 * r3 * … * rn
  • 34. 34 Mixed Fragmentation Mixed fragmentation, also known as hybrid fragmentation, intermixes horizontal and vertical fragmentation. The relation r is divided into a number of fragment relations r1, r2, …, rn. Each fragment is obtained as the result of applying either the horizontal or the vertical fragmentation scheme to relation r, or to a fragment of r that was obtained previously. For example, if we combine the horizontal and vertical fragmentation of the EMP relation, the result is a mixed fragmentation. The relation is divided initially into the fragments EMP1 and EMP2 as vertical fragments. We can now further divide fragment EMP1, using the horizontal fragmentation scheme, into the following two fragments: EMP1a = σDEPTNO=10(EMP1) EMP1b = σDEPTNO=20(EMP1)
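The two-step split above (vertical first, then horizontal on one vertical fragment) can be sketched as follows; the sample rows and the choice of which attributes go into EMP1 versus EMP2 are invented for illustration:

```python
# Mixed (hybrid) fragmentation: a vertical split, then a horizontal
# split of one vertical fragment by DEPTNO. Sample data is invented.
EMP = [
    {"EMPNO": 7782, "ENAME": "CLARK", "SAL": 2450, "DEPTNO": 10},
    {"EMPNO": 7369, "ENAME": "SMITH", "SAL": 800,  "DEPTNO": 20},
]

# Vertical step: EMP1 keeps identification attributes, EMP2 keeps pay data.
EMP1 = [{"EMPNO": t["EMPNO"], "ENAME": t["ENAME"], "DEPTNO": t["DEPTNO"]} for t in EMP]
EMP2 = [{"EMPNO": t["EMPNO"], "SAL": t["SAL"]} for t in EMP]

# Horizontal step on EMP1: sigma_{DEPTNO=10} and sigma_{DEPTNO=20}.
EMP1a = [t for t in EMP1 if t["DEPTNO"] == 10]
EMP1b = [t for t in EMP1 if t["DEPTNO"] == 20]

# Reconstruction: union the horizontal pieces, then join back on EMPNO.
sal = {t["EMPNO"]: t["SAL"] for t in EMP2}
rebuilt = [{**t, "SAL": sal[t["EMPNO"]]} for t in EMP1a + EMP1b]
assert sorted(rebuilt, key=lambda t: t["EMPNO"]) == sorted(EMP, key=lambda t: t["EMPNO"])
```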
  • 35. 35 Distributed Database Design • Three key issues: – Fragmentation, – Allocation, – Replication.
  • 36. 36 Distributed Database Design Fragmentation Relation may be divided into a number of sub- relations, which are then distributed. Allocation Each fragment is stored at site with “optimal” distribution. Replication Copy of fragment may be maintained at several sites.
  • 37. 37 Data Allocation • Four alternative strategies regarding placement of data: – Centralized, – Partitioned (or Fragmented), – Complete Replication, – Selective Replication.
  • 38. 38 Data Allocation Centralized: Consists of single database and DBMS stored at one site with users distributed across the network. Partitioned: Database partitioned into disjoint fragments, each fragment assigned to one site. Complete Replication: Consists of maintaining complete copy of database at each site. Selective Replication: Combination of partitioning, replication, and centralization.
  • 39. 39 Transparencies in a DDBMS • Distribution Transparency – Fragmentation Transparency – Location Transparency – Replication Transparency – Local Mapping Transparency – Naming Transparency © Pearson Education Limited 1995, 2005
  • 40. 40 Transparencies in a DDBMS • Transaction Transparency – Concurrency Transparency – Failure Transparency • Performance Transparency – DBMS Transparency • DBMS Transparency © Pearson Education Limited 1995, 2005
  • 41. 41 Distribution Transparency • Distribution transparency allows user to perceive database as single, logical entity. • If DDBMS exhibits distribution transparency, user does not need to know: – data is fragmented (fragmentation transparency), – location of data items (location transparency); – if the user must be aware of these, we call this local mapping transparency. • With replication transparency, user is unaware of replication of fragments. © Pearson Education Limited 1995, 2005
  • 42. 42 Naming Transparency • Each item in a DDB must have a unique name. • DDBMS must ensure that no two sites create a database object with same name. • One solution is to create central name server. However, this results in: – loss of some local autonomy; – central site may become a bottleneck; – low availability; if the central site fails, remaining sites cannot create any new objects. © Pearson Education Limited 1995, 2005
  • 43. 43 Naming Transparency • Alternative solution - prefix object with identifier of site that created it. • For example, Branch created at site S1 might be named S1.BRANCH. • Also need to identify each fragment and its copies. • Thus, copy 2 of fragment 3 of Branch created at site S1 might be referred to as S1.BRANCH.F3.C2. • However, this results in loss of distribution transparency. © Pearson Education Limited 1995, 2005
  • 44. 44 Naming Transparency • An approach that resolves these problems uses aliases for each database object. • Thus, S1.BRANCH.F3.C2 might be known as LocalBranch by user at site S1. • DDBMS has task of mapping an alias to appropriate database object. © Pearson Education Limited 1995, 2005
  • 45. 45 Transaction Transparency • Ensures that all distributed transactions maintain distributed database’s integrity and consistency. • Distributed transaction accesses data stored at more than one location. • Each transaction is divided into number of subtransactions, one for each site that has to be accessed. • DDBMS must ensure the indivisibility of both the global transaction and each of the subtransactions. © Pearson Education Limited 1995, 2005
  • 46. 46 Example - Distributed Transaction • T prints out names of all staff, using schema defined above as S1, S2, S21, S22, and S23. Define three subtransactions TS3, TS5, and TS7 to represent agents at sites 3, 5, and 7. © Pearson Education Limited 1995, 2005
  • 47. 47 Concurrency Transparency • All transactions must execute independently and be logically consistent with results obtained if transactions executed one at a time, in some arbitrary serial order. • Same fundamental principles as for centralized DBMS. • DDBMS must ensure both global and local transactions do not interfere with each other. • Similarly, DDBMS must ensure consistency. © Pearson Education Limited 1995, 2005
  • 48. 48 Classification of Transactions • In IBM’s Distributed Relational Database Architecture (DRDA), four types of transactions: – Remote request – Remote unit of work – Distributed unit of work – Distributed request. © Pearson Education Limited 1995, 2005
  • 49. 49 Classification of Transactions © Pearson Education Limited 1995, 2005
  • 50. 50 Concurrency Transparency • Replication makes concurrency more complex. • If a copy of a replicated data item is updated, update must be propagated to all copies. • Could propagate changes as part of original transaction, making it an atomic operation. • However, if one site holding copy is not reachable, then transaction is delayed until site is reachable. © Pearson Education Limited 1995, 2005
  • 51. 51 Concurrency Transparency • Could limit update propagation to only those sites currently available. Remaining sites updated when they become available again. • Could allow updates to copies to happen asynchronously, sometime after the original update. Delay in regaining consistency may range from a few seconds to several hours.
  • 52. 52 Failure Transparency • DDBMS must ensure atomicity and durability of global transaction. • Means ensuring that subtransactions of global transaction either all commit or all abort. • Thus, DDBMS must synchronize global transaction to ensure that all subtransactions have completed successfully before recording a final COMMIT for global transaction. • Must do this in the presence of site and network failures.
  • 53. 53 Performance Transparency • DDBMS must perform as if it were a centralized DBMS. – DDBMS should not suffer any performance degradation due to distributed architecture. – DDBMS should determine most cost-effective strategy to execute a request.
  • 54. 54 Performance Transparency • Distributed Query Processor (DQP) maps data request into ordered sequence of operations on local databases. • Must consider fragmentation, replication, and allocation schemas. • DQP has to decide: – which fragment to access; – which copy of a fragment to use; – which location to use.
  • 55. 55 Performance Transparency • DQP produces execution strategy optimized with respect to some cost function. • Typically, costs associated with a distributed request include: – I/O cost; – CPU cost; – communication cost.
  • 56. 56 Performance Transparency - Example Property(propNo, city): 10,000 records in London. Client(clientNo, maxPrice): 100,000 records in Glasgow. Viewing(propNo, clientNo): 1,000,000 records in London.
  • 57. 57 Performance Transparency - Example Assume: • Each tuple in each relation is 100 characters long. • 10 renters with maximum price greater than £200,000. • 100,000 viewings for properties in Aberdeen. • Computation time negligible compared to communication time.
  • 59. Query Processing in Distributed Databases • Issues – Cost of transferring data (files and results) over the network. • This cost is usually high, so some optimization is necessary. • Example relations: Employee at site 1 and Department at site 2 – Employee at site 1: 10,000 rows, row size = 100 bytes, table size = 1,000,000 bytes. – Department at site 2: 100 rows, row size = 35 bytes, table size = 3,500 bytes. • Q: For each employee, retrieve the employee name and the name of the department where the employee works. • Q: π Fname,Lname,Dname (Employee ⋈ Dno=Dnumber Department) • Schemas: Employee(Fname, Minit, Lname, SSN, Bdate, Address, Sex, Salary, Superssn, Dno); Department(Dname, Dnumber, Mgrssn, Mgrstartdate)
  • 60. Query Processing in Distributed Databases • Result – The result of this query will have 10,000 tuples, assuming that every employee is related to a department. – Suppose each result tuple is 40 bytes long. The query is submitted at site 3 and the result is sent to this site. – Problem: Employee and Department relations are not present at site 3.
  • 62. Query Processing in Distributed Databases • Strategies: 1. Transfer Employee and Department to site 3. • Total transfer bytes = 1,000,000 + 3,500 = 1,003,500 bytes. 2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. • Query result size = 40 * 10,000 = 400,000 bytes. Total transfer size = 400,000 + 1,000,000 = 1,400,000 bytes. 3. Transfer Department to site 1, execute the join at site 1, and send the result to site 3. • Total bytes transferred = 400,000 + 3,500 = 403,500 bytes. • Optimization criterion: minimize data transfer. – Preferred approach: strategy 3.
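The transfer-cost arithmetic for the three strategies of query Q can be checked with a few lines of Python, using exactly the sizes given in the slides:

```python
# Transfer-cost comparison for query Q, using the slide's sizes.
EMPLOYEE_BYTES = 10_000 * 100   # 1,000,000 bytes, stored at site 1
DEPARTMENT_BYTES = 100 * 35     # 3,500 bytes, stored at site 2
RESULT_BYTES = 10_000 * 40      # 400,000 bytes, needed at site 3

strategies = {
    # 1. Ship both relations to the result site and join there.
    "ship both to site 3": EMPLOYEE_BYTES + DEPARTMENT_BYTES,
    # 2. Ship Employee to site 2, join there, ship the result to site 3.
    "join at site 2": EMPLOYEE_BYTES + RESULT_BYTES,
    # 3. Ship Department to site 1, join there, ship the result to site 3.
    "join at site 1": DEPARTMENT_BYTES + RESULT_BYTES,
}

best = min(strategies, key=strategies.get)
assert strategies["ship both to site 3"] == 1_003_500
assert strategies["join at site 2"] == 1_400_000
assert strategies["join at site 1"] == 403_500
assert best == "join at site 1"   # strategy 3, as the slides conclude
```

The same comparison for Q' (result only 4,000 bytes) makes strategy 3 even more attractive, since then both shipped objects are small.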
  • 63. Query Processing in Distributed Databases • Consider the query – Q': For each department, retrieve the department name and the name of the department manager. • Relational algebra expression: – π Fname,Lname,Dname (Employee ⋈ SSN=Mgrssn Department)
  • 65. Query Processing in Distributed Databases • The result of this query will have 100 tuples, assuming that every department has a manager. The execution strategies are: 1. Transfer Employee and Department to the result site and perform the join at site 3. • Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes. 2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. Query result size = 40 * 100 = 4,000 bytes. • Total transfer size = 4,000 + 1,000,000 = 1,004,000 bytes. 3. Transfer Department to site 1, execute the join at site 1, and send the result to site 3. • Total transfer size = 4,000 + 3,500 = 7,500 bytes. • Preferred strategy: strategy 3.
  • 66. Query Processing in Distributed Databases • Now suppose the result site is 2. Possible strategies: 1. Transfer the Employee relation to site 2, execute the query, and present the result to the user at site 2. • Total transfer size = 1,000,000 bytes for both queries Q and Q'. 2. Transfer the Department relation to site 1, execute the join at site 1, and send the result back to site 2. • Total transfer size for Q = 400,000 + 3,500 = 403,500 bytes, and for Q' = 4,000 + 3,500 = 7,500 bytes.
  • 67. Query Processing in Distributed Databases • Semijoin: – Objective is to reduce the number of tuples in a relation before transferring it to another site. • Example execution of Q or Q’: 1. Project the join attributes of Department at site 2, and transfer them to site 1. For Q, 4 * 100 = 400 bytes are transferred and for Q’, 9 * 100 = 900 bytes are transferred. 2. Join the transferred file with the Employee relation at site 1, and transfer the required attributes from the resulting file to site 2. For Q, 34 * 10,000 = 340,000 bytes are transferred and for Q’, 39 * 100 = 3900 bytes are transferred. 3. Execute the query by joining the transferred file with Department and present the result to the user at site 2.
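The semijoin savings for Q can be verified with the slide's own attribute widths (4 bytes for Dnumber, 34 bytes for the result attributes shipped back):

```python
# Semijoin cost for Q, following the slide's numbers: ship only the join
# attribute of Department to site 1, join there, then ship back only the
# attributes needed at site 2.
dept_rows, emp_rows = 100, 10_000

step1 = 4 * dept_rows    # project Dnumber at site 2, send to site 1: 400 bytes
step2 = 34 * emp_rows    # send needed result attributes back to site 2: 340,000 bytes
semijoin_total = step1 + step2

# Best non-semijoin strategy for Q with result at site 2 (from the prior slide).
full_transfer = 400_000 + 3_500   # 403,500 bytes

assert semijoin_total == 340_400
assert semijoin_total < full_transfer
```

The reduction is modest for Q because nearly every Employee tuple joins; for Q' (only 100 managers) the semijoin ships just 3,900 + 900 = 4,800 bytes, a far larger saving.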
  • 68. Concurrency Control and Recovery • Distributed Databases encounter a number of concurrency control and recovery problems which are not present in centralized databases. • Some of them are listed below: – Dealing with multiple copies of data items – Failure of individual sites – Communication link failure – Distributed commit – Distributed deadlock
  • 69. Concurrency Control and Recovery • Details – Dealing with multiple copies of data items: • The concurrency control must maintain global consistency. Likewise the recovery mechanism must recover all copies and maintain consistency after recovery. – Failure of individual sites: • Database availability must not be affected due to the failure of one or two sites and the recovery scheme must recover them before they are available for use.
• 70. Concurrency Control and Recovery
• Details (contd.):
  – Communication link failure:
    • This failure may create a network partition, which can affect database availability even though all database sites may be running.
  – Distributed commit:
    • A transaction may be fragmented into subtransactions that are executed at a number of sites. This requires a two- or three-phase commit approach for transaction commit.
  – Distributed deadlock:
    • Since transactions are processed at multiple sites, two or more sites may become involved in a deadlock. This must be resolved in a distributed manner.
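The two-phase commit idea mentioned under "Distributed commit" can be sketched compactly: in phase 1 the coordinator collects votes from every participating site, and in phase 2 it broadcasts commit only if the vote was unanimous. The class and method names below are illustrative assumptions, not a real protocol implementation.

```python
# Minimal two-phase commit sketch (participants simulated in-process).
class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "active"

    def prepare(self):
        # Phase 1: vote yes (prepared) or no (abort immediately).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        # Phase 2: apply the coordinator's global decision.
        if self.state != "aborted":
            self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    decision = all(votes)                        # commit only if unanimous
    for p in participants:                       # phase 2: broadcast decision
        p.finish(decision)
    return decision
```

A single "no" vote (or, in a real system, a timed-out reply) forces a global abort, which is why three-phase commit is sometimes preferred to reduce blocking on coordinator failure.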
• 71. Concurrency Control and Recovery
• Distributed concurrency control based on a distinguished copy of a data item:
  – Primary site technique: a single site is designated as the primary site, which serves as the coordinator for transaction management.
  [Figure: five sites connected by a communications network, with one site designated as the primary site]
• 72. Concurrency Control and Recovery
• Transaction management:
  – Concurrency control and commit are managed by this site.
  – In two-phase locking, this site manages the locking and releasing of data items.
  – If all transactions follow the two-phase locking policy at all sites, then serializability is guaranteed.
• 73. Concurrency Control and Recovery
• Transaction management:
  – Advantages:
    • An extension of centralized two-phase locking, so implementation and management are simple.
    • Data items are locked at only one site, but they can be accessed at any site.
  – Disadvantages:
    • All transaction management activities go to the primary site, which is likely to become overloaded.
    • If the primary site fails, the entire system is inaccessible.
  – To aid recovery, a backup site can be designated that behaves as a shadow of the primary site. If the primary site fails, the backup site can take over as the primary site.
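The primary-site idea reduces to a single lock table that every site consults. A minimal sketch, assuming exclusive locks only and a dict-based lock table (both simplifications not specified in the slides):

```python
# Sketch of a primary-site lock manager: all sites forward lock/unlock
# requests here; data items are locked only at this one site.
class PrimarySiteLockManager:
    def __init__(self):
        self.lock_table = {}  # data item -> transaction id holding the lock

    def lock(self, item, txn):
        """Grant an exclusive lock if the item is free (or already held
        by the same transaction); otherwise the caller must wait."""
        if self.lock_table.get(item) in (None, txn):
            self.lock_table[item] = txn
            return True
        return False

    def unlock(self, item, txn):
        if self.lock_table.get(item) == txn:
            del self.lock_table[item]

mgr = PrimarySiteLockManager()
assert mgr.lock("x", "T1")       # T1 acquires x
assert not mgr.lock("x", "T2")   # T2 must wait: x is held at the primary site
mgr.unlock("x", "T1")
assert mgr.lock("x", "T2")       # now T2 can proceed
```

The single `lock_table` is precisely the bottleneck and single point of failure the disadvantages above describe.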
• 74. Concurrency Control and Recovery
• Primary copy technique:
  – In this approach, instead of designating a whole site, one copy of each data item is designated as its primary copy. To lock a data item, only its primary copy is locked.
  – Advantages:
    • Since primary copies are distributed across various sites, no single site is overloaded with locking and unlocking requests.
  – Disadvantages:
    • Identifying the primary copy of each data item is more complex. A distributed directory must be maintained, possibly at all sites.
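The distributed directory mentioned above is, at its simplest, a mapping from each data item to the site holding its primary copy. A minimal sketch (item and site names are made up for illustration):

```python
# Sketch of a primary-copy directory: locking load spreads across sites
# because different items have their primary copies at different sites.
primary_copy_site = {"x": "site1", "y": "site3", "z": "site2"}

def lock_site_for(item):
    """Return the one site whose copy must be locked for this item."""
    return primary_copy_site[item]

print(lock_site_for("x"), lock_site_for("y"))
```

Keeping such a directory consistent at every site when primary copies move is exactly the maintenance cost listed as a disadvantage.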
• 75. Concurrency Control and Recovery
• Recovery from a coordinator failure:
  – In both approaches, a coordinator site or copy may become unavailable. This requires the selection of a new coordinator.
  – Primary site approach with no backup site:
    • Abort and restart all active transactions at all sites, elect a new coordinator, and reinitiate transaction processing.
  – Primary site approach with a backup site:
    • Suspend all active transactions, designate the backup site as the new primary site, and identify a new backup site. The new primary site receives all transaction management information so it can resume processing.
  – Both primary and backup sites fail, or there is no backup site:
    • Use an election process to select a new coordinator site.
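The slides do not name a particular election algorithm; one common choice (an assumption here, not stated in the source) is a bully-style election, where the highest-numbered site that is still alive becomes the new coordinator:

```python
# Minimal bully-style election sketch: the live site with the highest id wins.
def elect_coordinator(site_ids, is_alive):
    """Return the highest-id live site, or None if no site responds."""
    live = [s for s in site_ids if is_alive(s)]
    return max(live) if live else None

# Simulated liveness of five sites; sites 3 and 5 are down.
alive = {1: True, 2: True, 3: False, 4: True, 5: False}
print(elect_coordinator([1, 2, 3, 4, 5], lambda s: alive[s]))
```

Here site 4 is elected because site 5, despite having the highest id, does not respond.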
• 76. Concurrency Control and Recovery
• Concurrency control based on voting:
  – There is no primary copy or coordinator.
  – A lock request is sent to all sites that hold a copy of the data item.
  – If a majority of the sites grant the lock, then the requesting transaction gets the data item.
  – The locking decision (granted or denied) is sent to all these sites.
  – To avoid an unacceptably long wait, a time-out period is defined. If the requesting transaction does not receive the vote information within the time-out period, the transaction is aborted.
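The voting rule can be sketched as a pure function over the replies from the copy sites. Here site behaviour is simulated by a list of votes (an assumption for illustration), with `None` standing for a reply that did not arrive before the time-out:

```python
# Majority-vote locking sketch: one vote per site holding a copy.
# True = grant, False = deny, None = no reply before the time-out.
def majority_lock(votes):
    replies = [v for v in votes if v is not None]
    if sum(replies) > len(votes) / 2:
        return "granted"   # a majority of ALL copies said yes
    if all(v is not None for v in votes):
        return "denied"    # everyone replied, but no majority granted
    return "aborted"       # time-out without enough vote information

print(majority_lock([True, True, True, False, None]))
```

Note that the majority is taken over all copies, not just the replies received, so missing votes can never be counted toward a grant.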