MC0077 – Advanced Database Systems
Question 1 - List and explain various Normal Forms. How does BCNF differ from the Third
Normal Form and the Fourth Normal Form?
First Normal Form - First normal form (1NF) is a property of a relation in a relational
database. A relation is in first normal form if the domain of each attribute contains only
atomic values, and the value of each attribute contains only a single value from that domain.
First normal form is an essential property of a relation in a relational database. Database
normalization is the process of representing a database in terms of relations in standard
normal forms, where first normal is a minimal requirement. First normal form deals with the
"shape" of a record type. Under first normal form, all occurrences of a record type must
contain the same number of fields. First normal form excludes variable repeating fields and
groups.
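As a small illustration, a record with a variable repeating group can be flattened into 1NF rows, one atomic value per field. The sketch below is a hypothetical Python example (the order/items record is invented for illustration):

```python
# Hypothetical example: an order record with a repeating "items" group
# violates 1NF; flattening yields one atomic-valued row per item.
def to_1nf(order):
    """Expand a record with a repeating group into flat 1NF rows."""
    return [
        {"order_id": order["order_id"], "item": item}
        for item in order["items"]
    ]

order = {"order_id": 1, "items": ["pen", "ink", "paper"]}
rows = to_1nf(order)
print(rows[0])  # {'order_id': 1, 'item': 'pen'}
```

Each resulting row holds a single value per attribute, so the "shape" of every record is the same.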
Second Normal Form - Second normal form (2NF) is a normal form used in database
normalization. A table that is in first normal form (1NF) must meet additional criteria if it is to
qualify for second normal form. Specifically: a table is in 2NF if and only if it is in 1NF and no
non-prime attribute is dependent on any proper subset of any candidate key of the table. A
non-prime attribute of a table is an attribute that is not a part of any candidate key of the
table. Put simply, a table is in 2NF if and only if it is in 1NF and every non-prime attribute of
the table is either dependent on the whole of a candidate key, or on another non-prime
attribute. When a 1NF table has no composite candidate keys (candidate keys consisting of
more than one attribute), the table is automatically in 2NF. Second and third normal forms
deal with the relationship between non-key and key fields.
Third Normal Form - Third normal form is a normal form used in database normalization. A
table is in 3NF if and only if both of the following conditions hold: the relation R (table) is in
second normal form (2NF), and every non-prime attribute of R is non-transitively dependent
(i.e., directly dependent) on every candidate key of R.
Fourth Normal Form - Under fourth normal form, a table cannot have more than one
independent multi-valued dependency. A multi-valued column is one where a single entity
can have more than one value for that attribute.
Fifth Normal Form - Fifth normal form deals with cases where information can be
reconstructed from smaller pieces of information that can be maintained with less
redundancy. Second, third, and fourth normal forms also serve this purpose, but fifth normal
form generalizes to cases not covered by the others. Fifth normal form is achieved by
decomposing a table into smaller tables that can be maintained with less redundancy and
whose join reconstructs the original.
Difference between BCNF and Third Normal Form
Both 3NF and BCNF are normal forms that are used in relational databases to minimize
redundancies in tables. In a table that is in the BCNF normal form, for every non-trivial
functional dependency of the form A → B, A is a super-key whereas, a table that complies
with 3NF should be in the 2NF, and every non-prime attribute should directly depend on
every candidate key of that table. BCNF is considered a stronger normal form than 3NF,
and it was developed to capture some of the anomalies that 3NF cannot. Obtaining a table
that complies with BCNF may require decomposing a table that is in 3NF. This
decomposition will result in additional join operations (or Cartesian
products) when executing queries. This will increase the computational time. On the other
hand, the tables that comply with BCNF would have fewer redundancies than tables that
only comply with 3NF.
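The BCNF condition can be tested mechanically: compute the attribute closure of each dependency's left-hand side and check that it is a superkey. The following is a minimal Python sketch (the student/course/teacher relation and its dependencies are the classic textbook example, used here only for illustration):

```python
def closure(attrs, fds):
    """Compute the attribute closure of `attrs` under functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(attributes, fds):
    """A relation is in BCNF iff the LHS of every non-trivial FD is a superkey."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):  # trivial dependency, ignore
            continue
        if closure(lhs, fds) != set(attributes):
            return False
    return True

# Classic violation: in R(student, course, teacher) with
# {student, course} -> teacher and teacher -> course,
# `teacher` is not a superkey, so R is in 3NF but not in BCNF.
fds = [(("student", "course"), ("teacher",)), (("teacher",), ("course",))]
print(is_bcnf(("student", "course", "teacher"), fds))  # False
```

The same closure routine can also be used to find candidate keys, which is what a 2NF or 3NF test would need.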
Difference between BCNF and 4th Normal Form
● A database must already be in 3NF to take it to BCNF, but it must be in 3NF and
BCNF to reach 4NF.
● In fourth normal form there are no non-trivial multi-valued dependencies in the
tables, whereas in BCNF there can still be multi-valued dependencies in the tables.
Question 2 - What are differences in Centralized and Distributed Database Systems? List
the relative advantages of data distribution.
A distributed database is a database that is under the control of a central database
management system (DBMS) in which storage devices are not all attached to a common
CPU. It may be stored in multiple computers located in the same physical location, or may
be dispersed over a network of interconnected computers. Collections of data (e.g. in a
database) can be distributed across multiple physical locations. A distributed database can
reside on network servers on the Internet, on corporate intranets or extranets, or on other
company networks. The replication and distribution of databases improves database
performance at end-user worksites. To ensure that distributed databases are up to date
and current, two processes are used: replication and duplication. Replication involves using
specialized software that looks for changes in the distributed database; once the changes
have been identified, the replication process makes all the databases look the same. The
replication process can be very complex and time-consuming depending on the size and
number of the distributed databases, and it can also require a lot of time and computer
resources. Duplication, on the other hand, is not as complicated: it identifies one database
as a master and then duplicates that database, normally at a set time after hours, to ensure
that each distributed location has the same data. In the duplication process, changes are
allowed only to the master database, ensuring that local data will not be overwritten. Both
processes can keep the data current in all distributed locations. Besides replication and
fragmentation, there are many other distributed database design technologies, for example
local autonomy and synchronous and asynchronous distributed database technologies.
Their implementation can and does depend on the needs of the business, the
sensitivity/confidentiality of the data to be stored in the database, and hence the price the
business is willing to pay to ensure data security, consistency and integrity.
A database user accesses the distributed database through:
Local applications: applications which do not require data from other sites.
Global applications: applications which do require data from other sites.
A distributed database does not share main memory or disks. A centralized database, by
contrast, keeps all its data in one place, which makes it fundamentally different from a
distributed database, whose data is held at different sites. Because all the data in a
centralized database resides in one place, it can become a bottleneck, and data availability
is not as good as in a distributed database.
Advantages of Data Distribution
The primary advantage of distributed database systems is the ability to share and access
data in a reliable and efficient manner.
1. Data sharing and Distributed Control: If a number of different sites are connected to each
other, then a user at one site may be able to access data that is available at another site. For
example, in the distributed banking system, it is possible for a user in one branch to access
data in another branch. Without this capability, a user wishing to transfer funds from one
branch to another would have to resort to some external mechanism for such a transfer. This
external mechanism would, in effect, be a single centralized database. The primary
advantage to accomplishing data sharing by means of data distribution is that each site is
able to retain a degree of control over data stored locally. In a centralized system, the
database administrator of the central site controls the database. In a distributed system,
there is a global database administrator responsible for the entire system. A part of these
responsibilities is delegated to the local database administrator for each site. Depending
upon the design of the distributed database system, each local administrator may have a
different degree of autonomy which is often a major advantage of distributed databases.
2. Reliability and Availability: If one site fails in a distributed system, the remaining sites may
be able to continue operating. In particular, if data are replicated at several sites, a
transaction needing a particular data item may find it at any of them. Thus, the failure of a site does not
necessarily imply the shutdown of the system. The failure of one site must be detected by
the system, and appropriate action may be needed to recover from the failure. The system
must no longer use the service of the failed site. Finally, when the failed site recovers or is
repaired, mechanisms must be available to integrate it smoothly back into the system.
Although recovery from failure is more complex in distributed systems than in a centralized
system, the ability of most of the systems to continue to operate despite failure of one site,
results in increased availability. Availability is crucial for database systems used for real-time
applications.
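As a rough illustration of this availability argument, a read can be served by trying the replicas of a data item in turn until an operational site is found. The site names and availability flags below are invented for the sketch:

```python
# Hypothetical replicated data item: three sites hold copies,
# but one site is currently down.
sites = [
    {"name": "site_a", "up": False, "data": {"x": 42}},
    {"name": "site_b", "up": True, "data": {"x": 42}},
    {"name": "site_c", "up": True, "data": {"x": 42}},
]

def read_replicated(key):
    """Try each site holding a replica until an available one is found."""
    for site in sites:
        if site["up"]:
            return site["data"][key], site["name"]
    raise RuntimeError("no available replica")

value, served_by = read_replicated("x")
print(value, served_by)  # 42 site_b
```

The failure of site_a does not stop the read; only the loss of every replica would.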
3. Speedup Query Processing: If a query involves data at several sites, it may be possible to
split the query into sub queries that can be executed in parallel by several sites. Such
parallel computation allows for faster processing of a user’s query. In those cases in which
data is replicated, queries may be directed by the system to the least heavily loaded sites.
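The fan-out of a query into parallel sub-queries can be sketched as follows; the in-memory site fragments and thread-based parallelism here are illustrative assumptions, not a real DDBMS interface:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical site "databases": each site holds a fragment of the data.
SITES = {
    "site_a": [("alice", 500)],
    "site_b": [("bob", 750)],
    "site_c": [("carol", 120)],
}

def run_subquery(site, min_balance):
    """Execute one sub-query against a single site's fragment."""
    return [row for row in SITES[site] if row[1] >= min_balance]

def distributed_query(min_balance):
    """Fan the query out to every site in parallel, then merge the results."""
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda s: run_subquery(s, min_balance), SITES)
    return [row for part in parts for row in part]

print(sorted(distributed_query(400)))  # [('alice', 500), ('bob', 750)]
```

Each sub-query touches only its own fragment, so the sites can work concurrently and only the (usually much smaller) partial results are merged.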
Question 3 - Describe the concepts of Structural Semantic Data Model (SSM).
A data model in software engineering is an abstract model that describes how data
are represented and accessed. Data models formally define data elements and relationships
among data elements for a domain of interest. A data model explicitly determines the
structure of data or structured data. Typical applications of data models include database
models, design of information systems, and enabling exchange of data. Usually data models
are specified in a data modeling language. Communication and precision are the two key
benefits that make a data model important to applications that use and exchange data. A
data model is the medium through which project team members from different backgrounds
and with different levels of experience can communicate with one another. Precision means that the
terms and rules on a data model can be interpreted only one way and are not ambiguous. A
data model can be sometimes referred to as a data structure, especially in the context of
programming languages. Data models are often complemented by function models,
especially in the context of enterprise models.
A semantic data model in software engineering is a technique to define the meaning of data
within the context of its interrelationships with other data. A semantic data model is an
abstraction which defines how the stored symbols relate to the real world. A semantic data
model is sometimes called a conceptual data model. The logical data structure of a database
management system (DBMS), whether hierarchical, network, or relational, cannot totally
satisfy the requirements for a conceptual definition of data because it is limited in scope and
biased toward the implementation strategy employed by the DBMS. Therefore, the need to
define data from a conceptual view has led to the development of semantic data modeling
techniques. That is, techniques to define the meaning of data within the context of its
interrelationships with other data. As illustrated in the figure, the real world, in terms of
resources, ideas, events, etc., is symbolically defined within physical data stores. A semantic
data model is an abstraction which defines how the stored symbols relate to the real world.
Thus, the model must be a true representation of the real world.
Data modeling in software engineering is the process of creating a data model by applying
formal data model descriptions using data modeling techniques. Data modeling is a
technique for defining business requirements for a database. It is sometimes called
database modeling because a data model is eventually implemented in a database. Data
architecture is the design of data for use in defining the target state and the subsequent
planning needed to hit the target state. It is usually one of several architecture domains that
form the pillars of an enterprise architecture or solution architecture. Data architecture
describes the data structures used by a business and/or its applications. There are
descriptions of data in storage and data in motion; descriptions of data stores, data groups
and data items; and mappings of those data artifacts to data qualities, applications, locations
etc. Essential to realizing the target state, data architecture describes how data is
processed, stored, and utilized in a given system. It provides criteria for data processing
operations that make it possible to design data flows and also control the flow of data in the
system.
Question 4 - Describe the following with respect to Object Oriented Databases: a) Query
Processing in Object-Oriented Database Systems b) Query Processing Architecture
a. Query Processing in Object-Oriented Database Systems
One of the criticisms of first-generation object-oriented database management systems
(OODBMSs) was their lack of declarative query capabilities. This led some researchers to
brand first generation (network and hierarchical) DBMSs as object-oriented. It was
commonly believed that the application domains that OODBMS technology targets do not
need querying capabilities. This belief no longer holds, and declarative query capability is
accepted as one of the fundamental features of OODBMSs. Indeed, most of the current
prototype systems experiment with powerful query languages and investigate their
optimization. Commercial products have started to include such languages as well, e.g.,
O2 and ObjectStore.
Query optimization techniques are dependent upon the query model and language. For
example, a functional query language lends itself to functional optimization which is quite
different from the algebraic, cost-based optimization techniques employed in relational as
well as a number of object-oriented systems. The query model, in turn, is based on the data
(or object) model since the latter defines the access primitives which are used by the query
model. These primitives, at least partially, determine the power of the query model. Despite
this close relationship, in this unit we do not consider issues related to the design of object
models, query models, or query languages in any detail.
Almost all object query processors proposed to date use optimization techniques developed
for relational systems. However, there are a number of issues that make query processing
more difficult in OODBMSs. The following are some of the more important issues:
Type System - Relational query languages operate on a simple type system consisting of a
single aggregate type: relation. The closure property of relational languages implies that
each relational operator takes one or more relations as operands and produces a relation as
a result. In contrast, object systems have richer type systems. The results of object algebra
operators are usually sets of objects (or collections) whose members may be of different
types. If the object languages are closed under the algebra operators, these heterogeneous
sets of objects can be operands to other operators.
Encapsulation - Relational query optimization depends on knowledge of the physical storage
of data (access paths) which is readily available to the query optimizer. The encapsulation of
methods with the data that they operate on in OODBMSs raises (at least) two issues. First,
estimating the cost of executing methods is considerably more difficult than estimating the
cost of accessing an attribute according to an access path. In fact, optimizers have to worry
about optimizing method execution, which is not an easy problem because methods may be
written using a general-purpose programming language. Second, encapsulation raises
issues related to the accessibility of storage information by the query optimizer. Some
systems overcome this difficulty by treating the query optimizer as a special application that
can break encapsulation and access information directly.
Complex Objects and Inheritance - Objects usually have complex structures where the state
of an object references other objects. Accessing such complex objects involves path
expressions. The optimization of path expressions is a difficult and central issue in object
query languages.
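A path expression simply navigates object references step by step. Here is a minimal Python sketch (the Employee/Department object graph is invented for illustration):

```python
# Hypothetical object graph: an Employee references a Department,
# which references its manager. The query "dept.manager.name" is a
# path expression navigating those references.
class Obj:
    def __init__(self, **attrs):
        self.__dict__.update(attrs)

def eval_path(obj, path):
    """Evaluate a dotted path expression against an object graph."""
    for step in path.split("."):
        obj = getattr(obj, step)
    return obj

boss = Obj(name="Dana")
dept = Obj(name="R&D", manager=boss)
emp = Obj(name="Lee", dept=dept)
print(eval_path(emp, "dept.manager.name"))  # Dana
```

An optimizer must decide how to evaluate such chains, e.g. whether to traverse references object by object or to rewrite the path as joins over the underlying extents.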
Object Models - OODBMSs lack a universally accepted object model definition. Even though
there is some consensus on the basic features that need to be supported by any object
model (e.g., object identity, encapsulation of state and behavior, type inheritance, and typed
collections), how these features are supported differs among models and systems. As a
result, the numerous projects that experiment with object query processing follow quite
different paths and are, to a certain degree, incompatible, making it difficult to amortize on
the experiences of others.
b. Query Processing Architecture
A query processing methodology similar to that of relational DBMSs, but modified to deal
with the difficulties listed above, can be followed.
The steps of the methodology are as follows.
1. Queries are expressed in a declarative language
2. No user knowledge of object implementations, access paths or
processing strategies is required
3. The query is first translated into a calculus expression
4. Calculus optimization
5. Calculus-to-algebra transformation
6. Type checking
7. Algebra optimization
8. Execution plan generation
9. Execution
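The steps above can be sketched as a pipeline in which each phase transforms the query representation and passes it on. The phase functions below are placeholders that only illustrate the flow of representations, not real optimizations:

```python
# A simplified sketch of the query processing pipeline: each phase
# transforms the query representation and hands it to the next phase.
def parse(query_text):
    return {"form": "calculus", "body": query_text}

def optimize_calculus(expr):
    return {**expr, "calculus_optimized": True}

def to_algebra(expr):
    return {**expr, "form": "algebra"}

def optimize_algebra(expr):
    return {**expr, "algebra_optimized": True}

def make_plan(expr):
    return {**expr, "plan": "generated"}

def process(query_text):
    """Run a declarative query through every phase in order."""
    expr = parse(query_text)
    for phase in (optimize_calculus, to_algebra, optimize_algebra, make_plan):
        expr = phase(expr)
    return expr

plan = process("select all employees")
print(plan["form"], plan["plan"])  # algebra generated
```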
Question 5 - Describe the Differences between Distributed & Centralized Databases.
1 Centralized Control vs. Decentralized Control - In centralized control, one "database
administrator" ensures the safety of the data, whereas in distributed control it is possible to
use a hierarchical control structure based on a "global database administrator", who has
central responsibility for the whole database, along with "local database administrators",
who are responsible for their local databases.
2 Data Independence - In central databases it means the actual organization of data is
transparent to the application programmer. The programs are written with "conceptual" view
of the data (called "Conceptual schema"), and the programs are unaffected by physical
organization of data. In distributed databases, another aspect, "distribution transparency",
is added to the notion of data independence as used in centralized databases. Distribution
transparency means programs are written as if the data were not distributed. Thus the
correctness of programs is unaffected by the movement of data from one site to another;
however, their speed of execution is affected.
3 Reduction of Redundancy - In centralized databases, redundancy is reduced for two
reasons: (a) inconsistencies among several copies of the same logical data are avoided, and
(b) storage space is saved. Reduction of redundancy is obtained by data sharing. In distributed
databases data redundancy is desirable as (a) locality of applications can be increased if
data is replicated at all sites where applications need it, (b) the availability of the system can
be increased, because a site failure does not stop the execution of applications at other sites
if the data is replicated. With data replication, retrieval can be performed on any copy, while
updates must be performed consistently on all copies.
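The "retrieval on any copy, updates on all copies" rule can be sketched directly; the site names below are hypothetical:

```python
import random

# Hypothetical fully replicated database: every site holds a full copy.
replicas = {"site_a": {}, "site_b": {}, "site_c": {}}

def write(key, value):
    """Updates must be applied consistently to every copy."""
    for copy in replicas.values():
        copy[key] = value

def read(key):
    """A retrieval can be served by any single copy."""
    copy = random.choice(list(replicas.values()))
    return copy[key]

write("balance:alice", 500)
print(read("balance:alice"))  # 500
```

The trade-off is visible in the code: reads get cheaper (any one site answers), while every update pays the cost of touching all copies.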
4 Complex Physical Structures and Efficient Access - In centralized databases, complex
access structures such as secondary indexes and interfile chains are used; these features
provide efficient access to data. In distributed databases, efficient access requires accessing
data from different sites. For this, an efficient distributed data access plan is required, which
can be generated either by the programmer or produced automatically by an optimizer.
Problems faced in the design of an optimizer can be classified in two categories: a) Global
optimization consists of determining which data must be accessed at which sites and which
data files must consequently be transmitted between sites. b) Local optimization consists of
deciding how to perform the local database accesses at each site.
5 Integrity, Recovery and Concurrency Control - A transaction is an atomic unit of execution,
and atomic transactions are the means to obtain database integrity. Failures and
concurrency are the two main threats to atomicity. Failures may cause the system to stop in
the midst of transaction execution, thus violating the atomicity requirement. Concurrent
execution of different transactions may permit one transaction to observe an inconsistent,
transient state created by another transaction during its execution. Concurrent execution
requires synchronization amongst the transactions, which is much harder in distributed
systems.
6 Privacy and Security - In traditional databases, the database administrator, having
centralized control, can ensure that only authorized access to the data is performed. In
distributed databases, local administrators face the same problem, as well as two new
aspects of it: (a) security (protection) problems intrinsic to communication networks arise in
addition to those of database systems themselves; (b) sites with a high degree of "site
autonomy" may feel more protected because they can enforce their own protections instead
of depending on a central database administrator.
7 Distributed Query Processing - The DDBMS should be capable of gathering and presenting
data from more than one site to answer a single query. In theory a distributed system can
handle queries more quickly than a centralized one, by exploiting parallelism and reducing
disc contention; in practice the main delays (and costs) will be imposed by the
communications network. Routing algorithms must take many factors into account to
determine the location and ordering of operations. Communications costs for each link in the
network are relevant, as are the variable processing capabilities and loadings of different
nodes and (where data fragments are replicated) trade-offs between cost and currency.
8 Distributed Directory (Catalog) Management - Catalogs for distributed databases contain
information like fragmentation description, allocation description, mappings to local names,
access method description, statistics on the database, protection and integrity constraints
(consistency information) which are more detailed as compared to centralized databases.
Question 6 - Describe the following: a) Data Mining Functions b) Data Mining Techniques
a) Data Mining Functions
Data mining refers to a broadly defined set of techniques for finding meaningful
patterns - or information - in large amounts of raw data. At a very high level, data mining is
performed in the following stages (note that the terminology and the steps taken in the data
mining process vary by practitioner):
1. Data collection: gathering the input data you intend to analyze
2. Data scrubbing: removing missing records, filling in missing values where appropriate
3. Pre-testing: determining which variables might be important for inclusion during the
analysis stage.
4. Analysis/Training: analyzing the input data to look for patterns
5. Model building: drawing conclusions from the analysis phase and determining a
mathematical model to be applied to future sets of input data
6. Application: applying the model to new data sets to find meaningful patterns
Data mining can be used to classify or cluster data into groups or to predict likely future
outcomes based upon a set of input variables/data.
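The data scrubbing stage (step 2 above) can be illustrated with a minimal sketch that drops records missing the target field and fills a missing numeric field with the mean; the field names and records are invented for the example:

```python
# A minimal sketch of the "data scrubbing" stage: drop records missing
# the target field and fill a missing numeric field with the mean.
def scrub(records, target, numeric_field):
    kept = [r for r in records if r.get(target) is not None]
    values = [r[numeric_field] for r in kept if r.get(numeric_field) is not None]
    mean = sum(values) / len(values)
    for r in kept:
        if r.get(numeric_field) is None:
            r[numeric_field] = mean
    return kept

raw = [
    {"label": "yes", "age": 30},
    {"label": "no", "age": None},   # missing value -> filled with the mean
    {"label": None, "age": 40},     # missing target -> record dropped
]
clean = scrub(raw, "label", "age")
print(len(clean), clean[1]["age"])  # 2 30.0
```

Real scrubbing policies differ by practitioner (drop vs. impute, median vs. mean); this only illustrates the stage.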
b) Data Mining Techniques
Several major data mining techniques have been developed and are used in data
mining projects.
Association - Association is one of the best-known data mining techniques. In association, a
pattern is discovered based on a relationship between a particular item and other items in
the same transaction. For example, the association technique is used in market basket
analysis to identify which products customers frequently purchase together.
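A minimal market-basket sketch counts how often each pair of products co-occurs in a transaction; the baskets and support threshold are invented for illustration:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Count co-occurring product pairs and keep those meeting min_support."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
print(frequent_pairs(baskets, min_support=2))  # {('bread', 'butter'): 2}
```

Full association-rule algorithms such as Apriori extend this idea to larger itemsets and derive rules with confidence measures.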
Classification - Classification is a classic data mining technique based on machine learning.
Basically, classification is used to assign each item in a set of data to one of a predefined
set of classes or groups.
Clustering - Clustering is a data mining technique that forms meaningful or useful clusters of
objects with similar characteristics, using automatic techniques. Unlike classification,
clustering defines the classes itself and puts objects into them, whereas classification
assigns objects to predefined classes.
Prediction - Prediction, as its name implies, is a data mining technique that discovers
relationships between independent variables and relationships between dependent and
independent variables.
Sequential Patterns - Sequential pattern analysis is a data mining technique that seeks to
discover similar patterns in transaction data over a business period. The uncovered patterns
are used for further business analysis to recognize relationships among the data.
Artificial neural networks - These are non-linear, predictive models that learn through
training. Although they are powerful predictive modeling techniques, some of the power
comes at the expense of ease of use and deployment.
Decision trees - They are tree-shaped structures that represent decision sets. These
decisions generate rules, which then are used to classify data. Decision trees are the
favored technique for building understandable models.
The nearest-neighbor method - This method classifies dataset records based on similar data
in a historical dataset.
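A minimal sketch of the nearest-neighbor method: classify a new record by the label of the closest point in a labeled historical dataset (the two-feature risk data below is invented):

```python
import math

# A minimal 1-nearest-neighbor sketch: classify a record by the label
# of the closest point in a historical (labeled) dataset.
def nearest_neighbor(history, point):
    """Return the label of the historical record closest to `point`."""
    features, label = min(history, key=lambda row: math.dist(row[0], point))
    return label

history = [
    ((1.0, 1.0), "low-risk"),
    ((8.0, 9.0), "high-risk"),
]
print(nearest_neighbor(history, (2.0, 1.5)))  # low-risk
```

Practical variants vote over the k closest records and scale the features first, so that no one attribute dominates the distance.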

 
database management system - overview of entire dbms
database management system - overview of entire dbmsdatabase management system - overview of entire dbms
database management system - overview of entire dbmsvikramkagitapu
 
Relational Theory for Budding Einsteins -- LonestarPHP 2016
Relational Theory for Budding Einsteins -- LonestarPHP 2016Relational Theory for Budding Einsteins -- LonestarPHP 2016
Relational Theory for Budding Einsteins -- LonestarPHP 2016Dave Stokes
 
Database systems assignment 1
Database systems   assignment 1Database systems   assignment 1
Database systems assignment 1Nelson Kimathi
 
Data base management system
Data base management systemData base management system
Data base management systemSuneel Dogra
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 

Semelhante a Database Normal Forms and Distributed Systems (20)

Distributed databases
Distributed databasesDistributed databases
Distributed databases
 
object oriented analysis data.pptx
object oriented analysis data.pptxobject oriented analysis data.pptx
object oriented analysis data.pptx
 
Ddbms1
Ddbms1Ddbms1
Ddbms1
 
1816 1819
1816 18191816 1819
1816 1819
 
1816 1819
1816 18191816 1819
1816 1819
 
Ch09
Ch09Ch09
Ch09
 
Distributed database management system
Distributed database management  systemDistributed database management  system
Distributed database management system
 
Distributed database. pdf
Distributed database. pdfDistributed database. pdf
Distributed database. pdf
 
CIS 515 discussion post responses.There are two discussions he.docx
CIS 515 discussion post responses.There are two discussions he.docxCIS 515 discussion post responses.There are two discussions he.docx
CIS 515 discussion post responses.There are two discussions he.docx
 
Exception & Database
Exception & DatabaseException & Database
Exception & Database
 
Normalization
NormalizationNormalization
Normalization
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
 
database management system - overview of entire dbms
database management system - overview of entire dbmsdatabase management system - overview of entire dbms
database management system - overview of entire dbms
 
Relational Theory for Budding Einsteins -- LonestarPHP 2016
Relational Theory for Budding Einsteins -- LonestarPHP 2016Relational Theory for Budding Einsteins -- LonestarPHP 2016
Relational Theory for Budding Einsteins -- LonestarPHP 2016
 
Database systems assignment 1
Database systems   assignment 1Database systems   assignment 1
Database systems assignment 1
 
Advance DBMS
Advance DBMSAdvance DBMS
Advance DBMS
 
Data warehouse physical design
Data warehouse physical designData warehouse physical design
Data warehouse physical design
 
Data base management system
Data base management systemData base management system
Data base management system
 
INJRV01I10005.pdf
INJRV01I10005.pdfINJRV01I10005.pdf
INJRV01I10005.pdf
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 

Mais de Aravind NC

MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DEMC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DEAravind NC
 
MC0083 – Object Oriented Analysis &. Design using UML - Master of Computer Sc...
MC0083 – Object Oriented Analysis &. Design using UML - Master of Computer Sc...MC0083 – Object Oriented Analysis &. Design using UML - Master of Computer Sc...
MC0083 – Object Oriented Analysis &. Design using UML - Master of Computer Sc...Aravind NC
 
MC0084 – Software Project Management & Quality Assurance - Master of Computer...
MC0084 – Software Project Management & Quality Assurance - Master of Computer...MC0084 – Software Project Management & Quality Assurance - Master of Computer...
MC0084 – Software Project Management & Quality Assurance - Master of Computer...Aravind NC
 
MC0082 –Theory of Computer Science
MC0082 –Theory of Computer ScienceMC0082 –Theory of Computer Science
MC0082 –Theory of Computer ScienceAravind NC
 
Master of Computer Application (MCA) – Semester 4 MC0080
Master of Computer Application (MCA) – Semester 4  MC0080Master of Computer Application (MCA) – Semester 4  MC0080
Master of Computer Application (MCA) – Semester 4 MC0080Aravind NC
 
Master of Computer Application (MCA) – Semester 4 MC0079
Master of Computer Application (MCA) – Semester 4  MC0079Master of Computer Application (MCA) – Semester 4  MC0079
Master of Computer Application (MCA) – Semester 4 MC0079Aravind NC
 
Master of Computer Application (MCA) – Semester 4 MC0078
Master of Computer Application (MCA) – Semester 4  MC0078Master of Computer Application (MCA) – Semester 4  MC0078
Master of Computer Application (MCA) – Semester 4 MC0078Aravind NC
 
Master of Computer Application (MCA) – Semester 4 MC0076
Master of Computer Application (MCA) – Semester 4  MC0076Master of Computer Application (MCA) – Semester 4  MC0076
Master of Computer Application (MCA) – Semester 4 MC0076Aravind NC
 

Mais de Aravind NC (10)

MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DEMC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
 
MC0083 – Object Oriented Analysis &. Design using UML - Master of Computer Sc...
MC0083 – Object Oriented Analysis &. Design using UML - Master of Computer Sc...MC0083 – Object Oriented Analysis &. Design using UML - Master of Computer Sc...
MC0083 – Object Oriented Analysis &. Design using UML - Master of Computer Sc...
 
MC0084 – Software Project Management & Quality Assurance - Master of Computer...
MC0084 – Software Project Management & Quality Assurance - Master of Computer...MC0084 – Software Project Management & Quality Assurance - Master of Computer...
MC0084 – Software Project Management & Quality Assurance - Master of Computer...
 
MC0082 –Theory of Computer Science
MC0082 –Theory of Computer ScienceMC0082 –Theory of Computer Science
MC0082 –Theory of Computer Science
 
Master of Computer Application (MCA) – Semester 4 MC0080
Master of Computer Application (MCA) – Semester 4  MC0080Master of Computer Application (MCA) – Semester 4  MC0080
Master of Computer Application (MCA) – Semester 4 MC0080
 
Master of Computer Application (MCA) – Semester 4 MC0079
Master of Computer Application (MCA) – Semester 4  MC0079Master of Computer Application (MCA) – Semester 4  MC0079
Master of Computer Application (MCA) – Semester 4 MC0079
 
Master of Computer Application (MCA) – Semester 4 MC0078
Master of Computer Application (MCA) – Semester 4  MC0078Master of Computer Application (MCA) – Semester 4  MC0078
Master of Computer Application (MCA) – Semester 4 MC0078
 
Master of Computer Application (MCA) – Semester 4 MC0076
Master of Computer Application (MCA) – Semester 4  MC0076Master of Computer Application (MCA) – Semester 4  MC0076
Master of Computer Application (MCA) – Semester 4 MC0076
 
Time travel
Time travelTime travel
Time travel
 
Google x
Google xGoogle x
Google x
 

Último

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Último (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Database Normal Forms and Distributed Systems

MC0077 – Advanced Database Systems

Question 1 - List and explain the various Normal Forms. How does BCNF differ from the Third Normal Form and the Fourth Normal Form?

First Normal Form - First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain. First normal form is an essential property of a relation in a relational database. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal form is the minimal requirement. First normal form deals with the "shape" of a record type: under first normal form, all occurrences of a record type must contain the same number of fields, and variable repeating fields and groups are excluded.

Second Normal Form - Second normal form (2NF) is a normal form used in database normalization. A table that is in first normal form (1NF) must meet additional criteria to qualify for second normal form. Specifically, a table is in 2NF if and only if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table. A non-prime attribute of a table is an attribute that is not part of any candidate key. Put simply, a table is in 2NF if and only if it is in 1NF and every non-prime attribute is dependent either on the whole of a candidate key or on another non-prime attribute. When a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF. Second and third normal forms deal with the relationship between non-key and key fields.

Third Normal Form - Third normal form (3NF) is a normal form used in database normalization.
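The 2NF condition above (no non-prime attribute dependent on a proper subset of a candidate key) can be tested mechanically from a table's functional dependencies. A minimal sketch follows; the Orders table and all attribute names are hypothetical, not from the course material:

```python
# Minimal sketch of a 2NF check: a partial dependency exists when a
# proper subset of a candidate key determines a non-prime attribute.

def prime_attributes(candidate_keys):
    """Attributes appearing in at least one candidate key."""
    return set().union(*candidate_keys)

def violates_2nf(candidate_keys, fds):
    """fds: list of (determinant frozenset, dependent attribute name)."""
    prime = prime_attributes(candidate_keys)
    for lhs, attr in fds:
        if attr in prime:
            continue  # 2NF constrains non-prime attributes only
        if any(lhs < key for key in candidate_keys):
            return True  # lhs is a proper subset of a candidate key
    return False

# Orders(order_id, product_id, product_name): product_name depends on
# product_id alone, a proper subset of the key {order_id, product_id}.
keys = [frozenset({"order_id", "product_id"})]
fds = [(frozenset({"product_id"}), "product_name")]
print(violates_2nf(keys, fds))  # True
```

Decomposing Orders into Orders(order_id, product_id) and Products(product_id, product_name) removes the partial dependency, after which the check returns False.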
A table is in 3NF if and only if both of the following conditions hold: the relation R (table) is in second normal form (2NF), and every non-prime attribute of R is non-transitively dependent (i.e., directly dependent) on every candidate key of R.

Fourth Normal Form - Under fourth normal form (4NF), a table cannot have more than one multivalued column. A multivalued column is one where a single entity can have more than one attribute value for that column.

Fifth Normal Form - Fifth normal form (5NF) deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy. Second, third, and fourth normal forms also serve this purpose, but fifth normal form generalizes to cases not covered by the others. A table in fifth normal form is obtained by removing any columns that can be reconstructed from smaller pieces of data maintained with less redundancy.

Difference between BCNF and Third Normal Form - Both 3NF and BCNF are normal forms used in relational databases to minimize redundancy in tables. In a table that is in BCNF, for every non-trivial
functional dependency of the form A → B, A is a superkey; whereas a table that complies with 3NF must be in 2NF, and every non-prime attribute must depend directly on every candidate key of that table. BCNF is considered a stronger normal form than 3NF, and it was developed to capture some of the anomalies that 3NF could not. Obtaining a table that complies with BCNF may require decomposing a table that is in 3NF. Such decomposition results in additional join operations (or Cartesian products) when executing queries, which increases computational time. On the other hand, tables that comply with BCNF have fewer redundancies than tables that only comply with 3NF.

Difference between BCNF and Fourth Normal Form
● A database must already be in 3NF to take it to BCNF, but it must be in both 3NF and BCNF to reach 4NF.
● In fourth normal form, the tables have no non-trivial multi-valued dependencies, whereas tables in BCNF can still contain multi-valued dependencies.

Question 2 - What are the differences between Centralized and Distributed Database Systems? List the relative advantages of data distribution.

A distributed database is a database that is under the control of a central database management system (DBMS) in which the storage devices are not all attached to a common CPU. It may be stored on multiple computers located in the same physical location, or dispersed over a network of interconnected computers. Collections of data (e.g., in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. The replication and distribution of databases improves database performance at end-user worksites. To ensure that distributed databases remain up to date and current, there are two processes: replication and duplication.
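Of the two processes just named, duplication is the simpler: one writable master is copied wholesale to each site. A minimal sketch, in which in-memory dicts and invented site names stand in for real databases:

```python
# Hedged sketch of duplication: a single master copy, plus read-only
# replicas refreshed wholesale at a scheduled point (e.g., after hours).

import copy

class MasterReplicaStore:
    def __init__(self):
        self.master = {}
        self.replicas = {"site_a": {}, "site_b": {}}

    def write(self, key, value):
        # changes are allowed against the master only, so replica data
        # is never overwritten by ad-hoc local writes
        self.master[key] = value

    def duplicate(self):
        # the scheduled refresh: every site receives an identical copy
        # of the master, so all locations hold the same data afterwards
        for site in self.replicas:
            self.replicas[site] = copy.deepcopy(self.master)

store = MasterReplicaStore()
store.write("acct:42", 1000)
store.duplicate()
print(store.replicas["site_a"])  # {'acct:42': 1000}
```

Replication, by contrast, would detect and ship individual changes rather than recopying the whole master, which is why it is described below as more complex and resource-intensive.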
Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. The replication process can be very complex and time-consuming depending on the size and number of the distributed databases, and it can require substantial time and computing resources. Duplication, on the other hand, is not as complicated: it identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. In the duplication process, changes are allowed to the master database only, so that local data will not be overwritten. Both processes can keep the data current in all distributed locations.

Besides replication and fragmentation, there are many other distributed database design technologies, for example local autonomy, and synchronous and asynchronous distributed database technologies. Their implementation depends on the needs of the business and on the sensitivity/confidentiality of the data to be stored in the database, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.

A database user accesses the distributed database through:
Local applications: applications which do not require data from other sites.
Global applications: applications which do require data from other sites.

A distributed database does not share main memory or disks. A centralized database has all its data in one place, and is thus quite different from a distributed database, which has data in different places. Because all the data in a centralized database resides in one place, bottlenecks can occur, and data availability is not as efficient as in a distributed database.

Advantages of Data Distribution

The primary advantage of distributed database systems is the ability to share and access data in a reliable and efficient manner.

1. Data Sharing and Distributed Control: If a number of different sites are connected to each other, then a user at one site may be able to access data that is available at another site. For example, in a distributed banking system, it is possible for a user in one branch to access data in another branch. Without this capability, a user wishing to transfer funds from one branch to another would have to resort to some external mechanism for such a transfer; this external mechanism would, in effect, be a single centralized database. The primary advantage of accomplishing data sharing by means of data distribution is that each site is able to retain a degree of control over the data stored locally. In a centralized system, the database administrator of the central site controls the database. In a distributed system, there is a global database administrator responsible for the entire system, and a part of these responsibilities is delegated to the local database administrator of each site. Depending upon the design of the distributed database system, each local administrator may have a different degree of autonomy, which is often a major advantage of distributed databases.

2. Reliability and Availability: If one site fails in a distributed system, the remaining sites may be able to continue operating.
In particular, if data are replicated at several sites, a transaction needing a particular data item may find it at several sites. Thus, the failure of a site does not necessarily imply the shutdown of the system. The failure of a site must be detected by the system, and appropriate action may be needed to recover from the failure; the system must no longer use the services of the failed site. Finally, when the failed site recovers or is repaired, mechanisms must be available to integrate it smoothly back into the system. Although recovery from failure is more complex in distributed systems than in centralized systems, the ability of most such systems to continue operating despite the failure of one site results in increased availability. Availability is crucial for database systems used for real-time applications.

3. Speedup of Query Processing: If a query involves data at several sites, it may be possible to split the query into subqueries that can be executed in parallel by several sites. Such parallel computation allows for faster processing of a user's query. In cases where data is replicated, queries may be directed by the system to the least heavily loaded sites.

Question 3 - Describe the concepts of the Structural Semantic Data Model (SSM).

A data model in software engineering is an abstract model that describes how data are represented and accessed. Data models formally define data elements and the relationships among data elements for a domain of interest. A data model explicitly determines the structure of data, or structured data. Typical applications of data models include database models, the design of information systems, and enabling the exchange of data. Usually, data models are specified in a data modeling language. Communication and precision are the two key benefits that make a data model important to applications that use and exchange data. A
data model is the medium through which project team members from different backgrounds and with different levels of experience can communicate with one another. Precision means that the terms and rules of a data model can be interpreted in only one way and are not ambiguous. A data model is sometimes referred to as a data structure, especially in the context of programming languages. Data models are often complemented by function models, especially in the context of enterprise models.

A semantic data model in software engineering is a technique to define the meaning of data within the context of its interrelationships with other data. A semantic data model is an abstraction which defines how the stored symbols relate to the real world; it is sometimes called a conceptual data model. The logical data structure of a database management system (DBMS), whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data, because it is limited in scope and biased toward the implementation strategy employed by the DBMS. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques, that is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined within physical data stores. A semantic data model is an abstraction which defines how the stored symbols relate to the real world; thus, the model must be a true representation of the real world.

Data modeling in software engineering is the process of creating a data model by applying formal data model descriptions using data modeling techniques. Data modeling is a technique for defining the business requirements for a database. It is sometimes called database modeling because a data model is eventually implemented in a database.
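As a toy illustration of that last point, a small conceptual model (two entities and one relationship) can be implemented as a database schema; the entity and column names here are invented for the example:

```python
# Sketch: a conceptual model with entities Department and Employee and a
# "works in" relationship, implemented as tables with a foreign key.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        dept_id  INTEGER NOT NULL REFERENCES department(dept_id)
    );
""")
conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.execute("INSERT INTO employee VALUES (10, 'Avi', 1)")

# the relationship defined in the model becomes a join in the database
row = conn.execute("""
    SELECT e.name, d.name FROM employee e
    JOIN department d ON e.dept_id = d.dept_id
""").fetchone()
print(row)  # ('Avi', 'Research')
```

The data model states *what* a department and an employee are and how they relate; the schema above is just one possible implementation of it.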
Data architecture is the design of data for use in defining the target state, and the subsequent planning needed to reach that target state. It is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture. Data architecture describes the data structures used by a business and/or its applications: descriptions of data in storage and data in motion; descriptions of data stores, data groups and data items; and mappings of those data artifacts to data qualities, applications, locations, etc. Essential to realizing the target state, data architecture describes how data is processed, stored, and utilized in a given system. It provides criteria for data processing operations that make it possible to design data flows and also to control the flow of data in the system.

Question 4 - Describe the following with respect to Object-Oriented Databases: a) Query Processing in Object-Oriented Database Systems b) Query Processing Architecture

a. Query Processing in Object-Oriented Database Systems

One of the criticisms of first-generation object-oriented database management systems (OODBMSs) was their lack of declarative query capabilities, which led some researchers to liken them to the first-generation (network and hierarchical) DBMSs. It was commonly believed that the application domains that OODBMS technology targets do not need querying capabilities. This belief no longer holds, and declarative query capability is accepted as one of the fundamental features of OODBMSs. Indeed, most of the current prototype systems experiment with powerful query languages and investigate their
optimization. Commercial products have started to include such languages as well, e.g., O2 and ObjectStore.

Query optimization techniques depend upon the query model and language. For example, a functional query language lends itself to functional optimization, which is quite different from the algebraic, cost-based optimization techniques employed in relational as well as a number of object-oriented systems. The query model, in turn, is based on the data (or object) model, since the latter defines the access primitives used by the query model. These primitives, at least partially, determine the power of the query model. Despite this close relationship, we do not consider issues related to the design of object models, query models, or query languages in any detail in this unit.

Almost all object query processors proposed to date use optimization techniques developed for relational systems. However, a number of issues make query processing more difficult in OODBMSs. The following are some of the more important ones:

Type System - Relational query languages operate on a simple type system consisting of a single aggregate type: the relation. The closure property of relational languages implies that each relational operator takes one or more relations as operands and produces a relation as a result. In contrast, object systems have richer type systems. The results of object algebra operators are usually sets of objects (or collections) whose members may be of different types. If the object language is closed under the algebra operators, these heterogeneous sets of objects can be operands to other operators.

Encapsulation - Relational query optimization depends on knowledge of the physical storage of data (access paths), which is readily available to the query optimizer. The encapsulation of methods with the data that they operate on in OODBMSs raises (at least) two issues.
First, estimating the cost of executing a method is considerably more difficult than estimating the cost of accessing an attribute along an access path. In fact, optimizers have to worry about optimizing method execution itself, which is not an easy problem, because methods may be written in a general-purpose programming language. Second, encapsulation raises issues related to the accessibility of storage information by the query optimizer. Some systems overcome this difficulty by treating the query optimizer as a special application that can break encapsulation and access storage information directly.

Complex Objects and Inheritance - Objects usually have complex structures in which the state of one object references other objects. Accessing such complex objects involves path expressions. The optimization of path expressions is a difficult and central issue in object query languages.

Object Models - OODBMSs lack a universally accepted object model definition. Even though there is some consensus on the basic features that any object model needs to support (e.g., object identity, encapsulation of state and behavior, type inheritance, and typed collections), how these features are supported differs among models and systems. As a result, the numerous projects that experiment with object query processing follow quite different paths and are, to a certain degree, incompatible, making it difficult to build on the experiences of others.
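To make the path expression issue concrete, here is a minimal sketch (all class and attribute names are hypothetical) of evaluating a path such as department.manager.name over complex objects. An optimizer must decide whether to chase object references one object at a time, as below, or to rewrite the path as joins over object extents:

```python
from dataclasses import dataclass

# Hypothetical complex-object structure: an Employee references a
# Department, which in turn references a Person (its manager).

@dataclass
class Person:
    name: str

@dataclass
class Department:
    name: str
    manager: Person

@dataclass
class Employee:
    name: str
    department: Department

def eval_path(obj, path):
    """Naive pointer-chasing evaluation of a dotted path expression."""
    for attr in path.split("."):
        obj = getattr(obj, attr)  # dereference one object reference
    return obj

dept = Department("R&D", Person("Carol"))
emp = Employee("Bob", dept)
print(eval_path(emp, "department.manager.name"))  # -> Carol
```

Each dot in the path is one dereference; over large collections, this navigation cost is exactly what path expression optimization tries to reduce.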
b. Query Processing Architecture
A query processing methodology similar to that of relational DBMSs, but modified to deal with the difficulties described above, can be used. The steps of the methodology are as follows:
1. Queries are expressed in a declarative language.
2. No user knowledge of object implementations, access paths or processing strategies is required.
3. The query is first translated into an object calculus expression.
4. Calculus optimization
5. Calculus-to-algebra transformation
6. Type check
7. Algebra optimization
8. Execution plan generation
9. Execution

Question 5 - Describe the Differences between Distributed & Centralized Databases.

1 Centralized Control vs. Decentralized Control - In centralized control, one "database administrator" ensures the safety of the data, whereas in distributed control it is possible to use a hierarchical control structure based on a "global database administrator", who has central responsibility for the whole database, along with "local database administrators", who are responsible for their local databases.

2 Data Independence - In centralized databases, data independence means that the actual organization of data is transparent to the application programmer. Programs are written with a "conceptual" view of the data (the "conceptual schema") and are unaffected by the physical organization of the data. In distributed databases, another aspect, distribution transparency, is added to the notion of data independence as used in centralized databases. Distribution transparency means that programs are written as if the data were not distributed. Thus, the correctness of programs is unaffected by the movement of data from one site to another, although their speed of execution is affected.

3 Reduction of Redundancy - In centralized databases, redundancy was reduced for two reasons: (a) inconsistencies among several copies of the same logical data are avoided, and (b) storage space is saved. Reduction of redundancy is obtained by data sharing.
In distributed databases, data redundancy is desirable because (a) the locality of applications can be increased if data is replicated at all sites where applications need it, and (b) the availability of the system can be increased, because a site failure does not stop the execution of applications at other sites if the data is replicated. With data replication, retrieval can be performed on any copy, while updates must be performed consistently on all copies.

4 Complex Physical Structures and Efficient Access - In centralized databases, complex access structures such as secondary indexes and interfile chains are used; all these features provide efficient access to the data. In distributed databases, efficient access requires accessing
data from different sites. For this, an efficient distributed data access plan is required, which can be written by the programmer or produced automatically by an optimizer. Problems faced in the design of an optimizer fall into two categories:
a) Global optimization consists of determining which data must be accessed at which sites and which data files must consequently be transmitted between sites.
b) Local optimization consists of deciding how to perform the local database accesses at each site.

5 Integrity, Recovery and Concurrency Control - A transaction is an atomic unit of execution, and atomic transactions are the means to obtain database integrity. Failures and concurrency are the two dangers to atomicity. Failures may cause the system to stop in the midst of transaction execution, thus violating the atomicity requirement. Concurrent execution of different transactions may permit one transaction to observe an inconsistent, transient state created by another transaction during its execution. Concurrent execution requires synchronization among the transactions, which is much harder in distributed systems.

6 Privacy and Security - In traditional databases, the database administrator, having centralized control, can ensure that only authorized access to the data is performed. In distributed databases, local administrators face the same problems as well as two new aspects: (a) security (protection) problems intrinsic to the communication network are added to those of the database system itself; (b) owners of databases with a high degree of "site autonomy" may feel more protected, because they can enforce their own protections instead of depending on a central database administrator.

7 Distributed Query Processing - The DDBMS should be capable of gathering and presenting data from more than one site to answer a single query.
In theory, a distributed system can handle queries more quickly than a centralized one by exploiting parallelism and reducing disc contention; in practice, the main delays (and costs) are imposed by the communications network. Routing algorithms must take many factors into account to determine the location and ordering of operations. Communications costs for each link in the network are relevant, as are the varying processing capabilities and loads of different nodes and, where data fragments are replicated, trade-offs between cost and currency.

8 Distributed Directory (Catalog) Management - Catalogs for distributed databases contain information such as fragmentation descriptions, allocation descriptions, mappings to local names, access method descriptions, statistics on the database, and protection and integrity constraints (consistency information), all of which are more detailed than in centralized databases.

Question 6 - Describe the following: a) Data Mining Functions b) Data Mining Techniques

a) Data Mining Functions
Data mining refers to the broadly defined set of techniques for finding meaningful patterns - or information - in large amounts of raw data. At a very high level, data mining is performed in the following stages (note that the terminology and the steps taken in the data mining process vary by practitioner):
1. Data collection: gathering the input data you intend to analyze
2. Data scrubbing: removing missing records, filling in missing values where appropriate
3. Pre-testing: determining which variables might be important for inclusion during the analysis stage
4. Analysis/Training: analyzing the input data to look for patterns
5. Model building: drawing conclusions from the analysis phase and determining a mathematical model to be applied to future sets of input data
6. Application: applying the model to new data sets to find meaningful patterns

Data mining can be used to classify or cluster data into groups, or to predict likely future outcomes based upon a set of input variables/data.

b) Data Mining Techniques
Several major data mining techniques have been developed and are used in data mining projects.

Association - Association is one of the best-known data mining techniques. In association, a pattern is discovered based on a relationship between a particular item and other items in the same transaction. For example, the association technique is used in market basket analysis to identify the products that customers frequently purchase together.

Classification - Classification is a classic data mining technique based on machine learning. Basically, classification is used to assign each item in a set of data to one of a predefined set of classes or groups.

Clustering - Clustering is a data mining technique that forms meaningful or useful clusters of objects with similar characteristics using automatic techniques. Unlike classification, clustering defines the classes itself and then places objects in them, whereas classification assigns objects to predefined classes.

Prediction - Prediction, as its name implies, is a data mining technique that discovers relationships between independent variables and between dependent and independent variables.

Sequential Patterns - Sequential pattern analysis is a data mining technique that seeks to discover similar patterns in transaction data over a business period.
The uncovered patterns are used in further business analysis to recognize relationships among the data.

Artificial neural networks - These are non-linear predictive models that learn through training. Although they are powerful predictive modeling techniques, some of that power comes at the expense of ease of use and deployment.

Decision trees - These are tree-shaped structures that represent sets of decisions. The decisions generate rules, which are then used to classify data. Decision trees are the favored technique for building understandable models.

The nearest-neighbor method - This method classifies each record in a dataset based on similar records in a historical dataset.
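As a minimal illustration of the nearest-neighbor method just described (the feature vectors and class labels below are made up for the example), a new record takes the label of the closest record in the historical dataset:

```python
import math

# Hypothetical historical dataset: (feature vector, class label) pairs.
history = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((8.0, 9.0), "high"),
    ((9.5, 8.5), "high"),
]

def nearest_neighbor(record, dataset):
    """Classify a record by the label of its closest historical record
    under Euclidean distance."""
    _, label = min(dataset, key=lambda pair: math.dist(record, pair[0]))
    return label

print(nearest_neighbor((1.1, 0.9), history))  # -> low
print(nearest_neighbor((9.0, 9.0), history))  # -> high
```

Practical systems typically vote over the k closest records (k-nearest neighbors) rather than the single closest one, which makes the classification less sensitive to noise in the historical data.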