Distributed Database Management Systems (Distributed DBMS)
1. Rushdi Shams, Dept of CSE, KUET 1
Database Systems
Distributed Database Systems
Version 1.0

Introduction
A distributed database system is a database system that is fragmented or replicated across several machines
These machines are usually located at different geographical locations of an organization
Fragmentation splits the original database into subsets (fragments)
Replication refers to copying the whole database, or part of the original database, to more than one machine

Idea of Distributed Database Systems
4 sites connected by a communication network
Sites 1, 2 and 4 each run a single database
Site 3 has no database. It accesses the other 3 sites for data manipulation

Fragmentation
There are 2 basic types of fragmentation
1. Horizontal fragmentation
2. Vertical fragmentation

Horizontal Fragmentation
A horizontal fragment is a subset of the rows of a single table
Say we need to manipulate a table that contains information about British people
We have 3 sites
The Edinburgh site holds the rows with information about Scottish people
The Cardiff site holds the rows with information about Welsh people
The London site holds the rows with information about English people
The 3 sites work as distributed processors. Together, they represent information about all the British people

Horizontal Fragmentation (continued)
Horizontal fragmentation is done by restricting the table with a WHERE condition in the query language!
In the previous example, you can fragment the table like this:
1. WHERE LOCATION = 'EDINBURGH'
2. WHERE LOCATION = 'CARDIFF'
3. WHERE LOCATION = 'LONDON'
To get the original table back, you just union all the fragments!
Easy, huh?
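A minimal sketch of the idea, assuming a made-up people table with name and location columns (the schema, the rows and the people_<site> fragment names are illustrative, not from the slides). It uses Python's built-in sqlite3 module with a single in-memory database standing in for the three sites:

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT PRIMARY KEY, location TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ailsa", "EDINBURGH"), ("Gwen", "CARDIFF"), ("Tom", "LONDON"), ("Emma", "LONDON")],
)

sites = ("EDINBURGH", "CARDIFF", "LONDON")

# Build one horizontal fragment per site by restricting the table with WHERE.
for site in sites:
    fragment = f"people_{site.lower()}"
    conn.execute(f"CREATE TABLE {fragment} (name TEXT PRIMARY KEY, location TEXT)")
    conn.execute(
        f"INSERT INTO {fragment} SELECT name, location FROM people WHERE location = ?",
        (site,),
    )

# Rebuild the original table as the union of all fragments.
union_sql = " UNION ".join(f"SELECT name, location FROM people_{s.lower()}" for s in sites)
rebuilt = sorted(conn.execute(union_sql).fetchall())
original = sorted(conn.execute("SELECT name, location FROM people").fetchall())
print(rebuilt == original)
```

In a real distributed DBMS each fragment would live at its own site; here the three fragment tables only imitate that.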

Horizontal Fragmentation (continued)
Consider the horizontal fragmentation of relation Proj according to its BUDGET value.
Tuples with BUDGET > 200000 go into Proj1 and the rest go into Proj2.
Proj1 = σ_{budget > 200000}(Proj)
Proj2 = σ_{budget ≤ 200000}(Proj)
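The two selections can be sketched in plain Python. The sample rows and the PNAME column are assumptions made up for the example; only the relation name Proj, the BUDGET attribute and the 200000 threshold come from the slide:

```python
# Hypothetical sample rows for Proj; the values are invented for illustration.
proj = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000},
    {"PNO": "P2", "PNAME": "Database Development", "BUDGET": 135000},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000},
]


def select(relation, predicate):
    """Relational selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


proj1 = select(proj, lambda row: row["BUDGET"] > 200000)
proj2 = select(proj, lambda row: row["BUDGET"] <= 200000)

print([row["PNO"] for row in proj1])
print([row["PNO"] for row in proj2])
```

Because the two predicates are complementary, every row lands in exactly one fragment.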

Vertical Fragmentation
Vertical fragmentation is a method of fragmenting a table by projecting subsets of its columns, with the primary key kept in every fragment
To get the original table back, you just need to join the newly created tables on the primary key!
Again, it's easy, huh?

Vertical Fragmentation (continued)
The table Proj is fragmented into 2 tables, Proj1 and Proj2
Both tables have the primary key, PNO. Keep an eye on it, fellows!
If you join them on the PNO of both tables, what do you get? Answer: the Proj table again!
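Here is a small sketch of that projection and join in plain Python. The helper names project and join_on, the non-key columns and the sample rows are all assumptions for the example; only Proj and its key PNO come from the slide:

```python
# Hypothetical sample rows; only the key column PNO is taken from the slide.
proj = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Development", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
]


def project(relation, columns):
    """Relational projection: keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in relation]


def join_on(left, right, key):
    """Equi-join two relations on a shared key column."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**l_row, **right_by_key[l_row[key]]}
        for l_row in left
        if l_row[key] in right_by_key
    ]


# Each vertical fragment repeats the primary key PNO.
proj1 = project(proj, ["PNO", "BUDGET"])
proj2 = project(proj, ["PNO", "PNAME", "LOC"])

rebuilt = join_on(proj1, proj2, "PNO")
print(rebuilt == proj)
```

Dropping PNO from either fragment would leave nothing to join on, which is why the key has to be repeated.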

Both Fragmentations at a Glance

Why Fragmentation
Usage:
Applications work with views rather than entire relations
Efficiency:
Data is stored close to where it is most frequently used
Data that is not needed by local applications is not stored

Why Fragmentation (continued)
Parallelism:
A transaction can be divided into several subqueries that operate on fragments
Security:
Data that is not needed by local applications is not stored, and so is not vulnerable to unauthorized users
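To make the parallelism point concrete, here is a rough sketch in which one query (the total salary) is split into one subquery per fragment. The fragments, names and salaries are invented, and threads merely stand in for separate sites:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments: site name -> rows held at that site.
fragments = {
    "EDINBURGH": [{"name": "Ailsa", "salary": 30000}, {"name": "Callum", "salary": 32000}],
    "CARDIFF": [{"name": "Gwen", "salary": 28000}],
    "LONDON": [{"name": "Tom", "salary": 40000}, {"name": "Emma", "salary": 45000}],
}


def local_total(rows):
    """Subquery run against a single fragment."""
    return sum(row["salary"] for row in rows)


# Run one subquery per fragment concurrently, then combine the partial results.
with ThreadPoolExecutor() as pool:
    partial_totals = list(pool.map(local_total, fragments.values()))

total_salary = sum(partial_totals)
print(partial_totals, total_salary)
```

The combining step is trivial here because a sum can be computed from partial sums; other operations need more care.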

Disadvantage of Fragmentation
Performance:
If a query has to fetch data from tables that are on different sites, it requires extra processing time

Correctness of Fragmentation
Well, when I first heard "correctness", I was baffled! Actually it means nothing more than some properties of fragmentation
So, don't worry about that. It is called CORRECTNESS in database jargon, so don't call it a property, a'right?

Correctness of Fragmentation (continued)
There are 3 correctness rules
1. Completeness
2. Reconstruction
3. Disjointness

Correctness of Fragmentation (continued)
1. Completeness:
If relation R is fragmented into fragments R1, R2, R3, …, Rn, each data item that can be found in R must appear in at least one fragment
So why not say it this way: no data item of the original relation R goes missing!
Man, I hate theoretical definitions!

Correctness of Fragmentation (continued)
2. Reconstruction:
There must be a relational operation by which we can reconstruct R from the fragments
We already saw that by unioning (∪) horizontal fragments we get the original R back, and by joining vertical fragments we get R back too!

Correctness of Fragmentation (continued)
3. Disjointness:
If data item Di appears in fragment Ri, then it should not appear in any other fragment
The exception is vertical fragmentation, where the primary key attributes must be repeated to allow reconstruction
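The three rules can be checked mechanically for horizontal fragments. This is only a sketch: the function names are hypothetical, rows are modelled as tuples, and the sample data is invented:

```python
def is_complete(relation, fragments):
    """Completeness: every row of the relation appears in at least one fragment."""
    return all(any(row in frag for frag in fragments) for row in relation)


def is_reconstructible(relation, fragments):
    """Reconstruction: the union of the fragments gives back exactly the relation."""
    union = {row for frag in fragments for row in frag}
    return union == set(relation)


def is_disjoint(fragments):
    """Disjointness: no row appears in more than one fragment."""
    seen = set()
    for frag in fragments:
        rows = set(frag)
        if seen & rows:
            return False
        seen |= rows
    return True


# Hypothetical relation: (name, location) rows.
people = [("Ailsa", "EDINBURGH"), ("Gwen", "CARDIFF"), ("Tom", "LONDON")]

good = [[("Ailsa", "EDINBURGH")], [("Gwen", "CARDIFF")], [("Tom", "LONDON")]]
lossy = [[("Ailsa", "EDINBURGH")], [("Tom", "LONDON")]]  # Gwen went missing
overlapping = good + [[("Tom", "LONDON")]]  # Tom is stored twice

print(is_complete(people, good), is_reconstructible(people, good), is_disjoint(good))
print(is_complete(people, lossy), is_disjoint(overlapping))
```

For vertical fragments the same checks would apply to the non-key attributes, with join taking the place of union.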

Transparency
You have distributed one table to 3 sites just now. The user, when he requires data, should not need to know this!
This process of hiding the fragmentation and the distribution of the fragments to different sites is called transparency

Types of Transparency
1. Location transparency
The user should not be aware of the location of the data. This simplifies the user interface and the user programs that are used to query the table
2. Fragmentation transparency
The user need not know that the data have been fragmented, or how the data have been fragmented
3. Replication transparency
Replication is sometimes necessary because it makes processing faster, but the user should not be aware of it.

Need of Transparency
A manager wishing to find the total number of employees at the Scottish subsidiary need not be aware that he is querying a remote database
A manager running a query in London should not need to be aware that to produce the aggregate salary bill for the company all three sites (London, Cardiff and Edinburgh) need to be interrogated
When data need to be updated periodically, the user need not know directly that three sites are effectively updated
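One way to picture these three situations is a thin facade that answers questions without ever exposing where the rows live. The class name CompanyDatabase, its methods and the sample rows are all hypothetical; a real distributed DBMS does this inside the query processor rather than in application code:

```python
class CompanyDatabase:
    """Hypothetical facade: the caller sees one employee table, not three sites."""

    def __init__(self, sites):
        # site name -> list of employee rows stored at that site
        self._sites = sites

    def _all_rows(self):
        # Visit every site; the caller never learns how many sites there are.
        for rows in self._sites.values():
            yield from rows

    def count_employees(self, subsidiary):
        return sum(1 for row in self._all_rows() if row["subsidiary"] == subsidiary)

    def total_salary(self):
        return sum(row["salary"] for row in self._all_rows())

    def give_raise(self, amount):
        # A single logical update that is applied at every site.
        for row in self._all_rows():
            row["salary"] += amount


db = CompanyDatabase({
    "Edinburgh": [
        {"name": "Ailsa", "subsidiary": "Scotland", "salary": 30000},
        {"name": "Callum", "subsidiary": "Scotland", "salary": 32000},
    ],
    "Cardiff": [{"name": "Gwen", "subsidiary": "Wales", "salary": 28000}],
    "London": [
        {"name": "Tom", "subsidiary": "England", "salary": 40000},
        {"name": "Emma", "subsidiary": "England", "salary": 45000},
    ],
})

scottish_headcount = db.count_employees("Scotland")
salary_bill = db.total_salary()
print(scottish_headcount, salary_bill)
```

None of the three calls mentions Edinburgh, Cardiff or London, which is the whole point.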

Foundation Rule
The foundation rule of distributed database systems states:
"Although the database system is distributed over several sites, it must look like a centralised database system to the user"
Then how do you make this foundation rule true?
Answer: by applying the 3 types of transparency

Advantages of Distributed Database Systems
Reflects organizational structure: database fragments are located in the departments they relate to.
Local autonomy: a department can control the data about itself (as its people are the ones familiar with it).
Improved availability: a fault in one database system will only affect one fragment, instead of the entire database

Advantages of Distributed Database Systems (continued)
Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)
Economics: it costs less to create a network of smaller computers with the power of a single large computer.
Modularity: systems can be modified, added and removed from the distributed database without affecting other modules (systems).

Disadvantages of Distributed Database Systems
Complexity: extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database. For example, joins can become prohibitively expensive when performed across multiple systems.
Economics: increased complexity and a more extensive infrastructure mean extra labour costs.

Disadvantages of Distributed Database Systems (continued)
Security: remote database fragments must be secured, and because they are not centralized the remote sites must be secured as well. The infrastructure must also be secured (e.g. by encrypting the network links between remote sites).
Difficult to maintain integrity: in a distributed database, enforcing integrity over a network may require too many network resources to be feasible.
Inexperience: distributed databases are difficult to work with, and because the field is young there is not much readily available experience of proper practice.

Types of Distributed Database Systems
1. Homogeneous Database Systems
2. Heterogeneous Database Systems
3. Federated Database Systems

Homogeneous Distributed Database Systems
Data is distributed across 2 or more systems
All the systems have to run the same DBMS (e.g. Oracle)
Moreover, the systems should run on the same hardware platform
And the systems should run on the same operating system
Hmm, pretty weird??

Heterogeneous Distributed Database Systems
Data is distributed across 2 or more systems
Those systems' hardware and software configurations are diverse
One site might be running Oracle under Windows NT, another site Informix under UNIX, and yet another site Ingres under Windows NT
Pretty cool, huh?

Heterogeneous Distributed Database Systems (continued)
[Figure: sites running different platforms; surviving labels: UNIX, INFORMIX, INGRES]

Federated Distributed Database Systems
Switzerland is a country made up of several federated political units (cantons)
These units are autonomous political units
National-level decisions are made by combining their own decisions
A federated database system is made up of a number of relatively independent, autonomous databases

Centralized DBMS vs Distributed DBMS
The system catalogue of a distributed database has to be more complex. For instance, it has to store details about the location of fragments and replicas
Concurrency problems are multiplied in distributed systems. The problems of propagating updates to a series of different sites are very involved
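As a rough illustration of what such a catalogue might hold, the sketch below records, for each fragment, the relation it was cut from, the restriction that defines it and every site that stores a copy. The FragmentEntry structure, the fragment names and the site assignments are assumptions for the example, not a real DBMS catalogue format:

```python
from dataclasses import dataclass


@dataclass
class FragmentEntry:
    relation: str   # the global relation the fragment belongs to
    predicate: str  # the restriction that defines the fragment
    sites: list     # every site holding a copy (more than one means replicated)


# Hypothetical catalogue contents.
catalogue = {
    "people_scottish": FragmentEntry("people", "location = 'EDINBURGH'", ["Edinburgh", "London"]),
    "people_welsh": FragmentEntry("people", "location = 'CARDIFF'", ["Cardiff"]),
    "people_english": FragmentEntry("people", "location = 'LONDON'", ["London"]),
}


def sites_for(relation):
    """Every site that may have to be contacted to answer a query on the relation."""
    return sorted({
        site
        for entry in catalogue.values()
        if entry.relation == relation
        for site in entry.sites
    })


def sites_to_update(fragment):
    """An update to a fragment has to be propagated to all of its replicas."""
    return list(catalogue[fragment].sites)


print(sites_for("people"))
print(sites_to_update("people_scottish"))
```

The second lookup hints at why updates are harder than queries: a replicated fragment has to be changed at every site that holds it.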

Centralized DBMS vs Distributed DBMS (continued)
A query optimiser in a true distributed system should be able to utilise information about the structure of the network in deciding how best to satisfy a given query
To ensure a robust system, the distributed DBMS should not be located solely at one site. Software as well as data need to be distributed
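A toy version of the optimiser's decision: when a fragment is replicated, read it from the site that is cheapest to reach from where the query was issued. The link costs, replica placement and the cheapest_site helper are invented for illustration; real optimisers weigh many more factors:

```python
# Hypothetical link costs (say, milliseconds per request) from each querying site.
link_cost = {
    "London": {"London": 0, "Cardiff": 12, "Edinburgh": 25},
    "Cardiff": {"London": 12, "Cardiff": 0, "Edinburgh": 30},
    "Edinburgh": {"London": 25, "Cardiff": 30, "Edinburgh": 0},
}

# Hypothetical replica placement: fragment -> sites holding a copy.
replicas = {
    "people_scottish": ["Edinburgh", "London"],
    "people_welsh": ["Cardiff"],
}


def cheapest_site(fragment, query_site):
    """Pick the replica with the lowest link cost from the querying site."""
    return min(replicas[fragment], key=lambda site: link_cost[query_site][site])


print(cheapest_site("people_scottish", "London"))
print(cheapest_site("people_scottish", "Cardiff"))
print(cheapest_site("people_welsh", "London"))
```

A query issued in London can be served from the local replica, while one from Cardiff goes to whichever copy is nearer on the network.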

Implementation Phases of Distributed DBMS
1. In the first phase we distribute queries between sites but send updates only to a single site
2. In the second phase we not only distribute queries, we also distribute transactions between sites
The latter scenario is clearly the more technically challenging of the two
Most existing distributed database systems are in phase 1
Very few organisations seem to have solved all of the problems associated with phase 2 applications