SlideShare uma empresa Scribd logo
1 de 40
Baixar para ler offline
Functional Dependencies: Redundancy Analysis
           and Correcting Violations
                      Solmaz Kolahi
                   solmaz@cs.ubc.ca

             Postdoctoral Research Fellow
            Department of Computer Science
             University of British Columbia

       Joint work with Leonid Libkin and Laks Lakshmanan




                                                   Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 1/20
Motivation

Both relational and XML databases may store redundant data:

                                        title             director      actor         year
                                        The Departed      Scorsese      DiCaprio      2006
                                        The Departed      Scorsese      Nicholson     2006
Functional Dependency:                  Shrek the Third   Miller        Myers         2007
                                        Shrek the Third   Miller        Murphy        2007
title → year
                                        Shrek the Third   Hui           Myers         2007
                                        Shrek the Third   Hui           Murphy        2007




Functional Dependency:
@AreaCode → @City


                         AreaCode                  AreaCode                    AreaCode             AreaCode
                           416                        416                         416                  416

                                      City                      City                                City               City
                                    Toronto                   Toronto                             Toronto            Toronto




                                                                                    Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 2/20
Motivation

Normalization techniques try to remove redundancies:
    BCNF eliminates all redundancies.
      only key dependencies are allowed.
      cannot always be achieved without losing dependencies.

                          AB → C     C→B
        R(A, B, C)

    3NF eliminates some redundancies.
       allows redundancy on prime attributes.
       preserves dependencies.



    XNF eliminates all redundancies w.r.t. XML functional dependencies.
      only XML keys are allowed: if X → p.@l, then X → p.
      introduced by Arenas & Libkin in 2002.

                                                       Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 3/20
Motivation

Traditional normalization theory
    characterizes a database as redundant or non-redundant.
    does not measure redundancy.
    cannot provide guidelines to reduce redundancy.


The more redundant the data, the more prone to update anomalies.

Our goal is
    to show that there is a spectrum of redundancy using an
    information-theoretic tool.
    to choose database designs with low redundancy.
    to handle databases with dependency violations.



                                                      Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 4/20
Outline

Motivation.
Reducing redundancy in relational and XML data:
   Measure of redundancy.
   Redundancy analysis of normal forms and schemas.
Correcting functional dependency violations.
Conclusions.
Future work.




                                               Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 5/20
Measure of Information Content

Proposed by Arenas & Libkin in 2003.
Used to measure the redundancy of a data value in a database
instance with respect to a set of constraints.
Intuitively, RICI (p|Σ) measures the relative information content of
position p in instance I w.r.t. constraints Σ.
Independent of data models and query languages.




                                                    Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 6/20
Measure of Information Content

    Proposed by Arenas & Libkin in 2003.
    Used to measure the redundancy of a data value in a database
    instance with respect to a set of constraints.
    Intuitively, RICI (p|Σ) measures the relative information content of
    position p in instance I w.r.t. constraints Σ.
    Independent of data models and query languages.

Σ = {A → C}            A     B     C    D
 RICI (P |Σ)            1    2     3     4
   0.875                1    2     3     5




                                                        Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 6/20
Measure of Information Content

    Proposed by Arenas & Libkin in 2003.
    Used to measure the redundancy of a data value in a database
    instance with respect to a set of constraints.
    Intuitively, RICI (p|Σ) measures the relative information content of
    position p in instance I w.r.t. constraints Σ.
    Independent of data models and query languages.

Σ = {A → C}            A     B     C    D
 RICI (P |Σ)            1    2     3     4
   0.875                1    2     3     5
   0.781                1    2     3     6




                                                        Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 6/20
Measure of Information Content

    Proposed by Arenas & Libkin in 2003.
    Used to measure the redundancy of a data value in a database
    instance with respect to a set of constraints.
    Intuitively, RICI (p|Σ) measures the relative information content of
    position p in instance I w.r.t. constraints Σ.
    Independent of data models and query languages.

Σ = {A → C}            A     B     C    D
 RICI (P |Σ)            1    2     3     4
   0.875                1    2     3     5
   0.781                1    2     3     6
   0.711                1    2     3     7




                                                        Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 6/20
Measure of Information Content

    Proposed by Arenas & Libkin in 2003.
    Used to measure the redundancy of a data value in a database
    instance with respect to a set of constraints.
    Intuitively, RICI (p|Σ) measures the relative information content of
    position p in instance I w.r.t. constraints Σ.
    Independent of data models and query languages.

Σ = {A → C}            A     B     C    D
 RICI (P |Σ)            1    2     3     4
   0.875                1    2     3     5
   0.781                1    2     3     6
   0.711                1    2     3     7
   0.658                1    2     3     8



                                                        Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 6/20
Measure of Information Content

    Proposed by Arenas & Libkin in 2003.
    Used to measure the redundancy of a data value in a database
    instance with respect to a set of constraints.
    Intuitively, RICI (p|Σ) measures the relative information content of
    position p in instance I w.r.t. constraints Σ.
    Independent of data models and query languages.

Σ = {A → C}                                     Σ = {A → C, B → C}
                       A     B     C    D
 RICI (P |Σ)                                     RICI (P |Σ)
                        1    2     3     4
   0.875                                           0.781
                        1    2     3     5
   0.781                                           0.629
                        1    2     3     6
   0.711                                           0.522
                        1    2     3     7
   0.658                                           0.446
                        1    2     3     8



                                                        Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 6/20
Measure of Information Content


                                          A     B       C
                                          1      2        3
  R(A, B, C) Σ = {A → B}
                                          1      2        4
Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7).
For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for
every a ∈ {1, . . . , k}.




                                                      Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
Measure of Information Content


                                          A     B       C
                                                 2        3
  R(A, B, C) Σ = {A → B}
                                          1      2
Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7).
For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for
every a ∈ {1, . . . , k}.

P (2|X) =




                                                      Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
Measure of Information Content


                                          A     B       C
                                          1      2        3
  R(A, B, C) Σ = {A → B}
                                          1      2        1
Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7).
For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for
every a ∈ {1, . . . , k}.

P (2|X) =




                                                      Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
Measure of Information Content


                                          A     B       C
                                          4      2        3
  R(A, B, C) Σ = {A → B}
                                          1      2        7
Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7).
For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for
every a ∈ {1, . . . , k}.

P (2|X) =




                                                      Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
Measure of Information Content


                                          A     B       C
                                          4      2        3
  R(A, B, C) Σ = {A → B}
                                          1      2        7
Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7).
For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for
every a ∈ {1, . . . , k}.

P (2|X) = 48/




                                                      Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
Measure of Information Content


                                          A     B         C
                                                    a       3
  R(A, B, C) Σ = {A → B}
                                          1         2
Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7).
For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for
every a ∈ {1, . . . , k}.

P (2|X) = 48/(48 + 6 × 42) = 0.16
P (a|X) = 42/(48 + 6 × 42) = 0.14 for every a = 2




                                                        Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
Measure of Information Content


                                          A      B        C
                                             1      2       3
  R(A, B, C) Σ = {A → B}
                                             1      2       4
Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7).
For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for
every a ∈ {1, . . . , k}.

P (2|X) = 48/(48 + 6 × 42) = 0.16
P (a|X) = 42/(48 + 6 × 42) = 0.14 for every a = 2

Conditional entropy : 2.8057
Average over all possible X: RICk = 2.4558
                                I




                                                        Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
Measure of Information Content


                                             A    B       C
                                             1      2       3
  R(A, B, C) Σ = {A → B}
                                             1      2       4
Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7).
For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for
every a ∈ {1, . . . , k}.

P (2|X) = 48/(48 + 6 × 42) = 0.16
P (a|X) = 42/(48 + 6 × 42) = 0.14 for every a = 2

Conditional entropy : 2.8057
Average over all possible X: RICk = 2.4558
                                I


                                       RICk (p| Σ)
                                          I
                  RICI (p|Σ)   = lim               = 0.875
                                          log k
                                 k→∞



                                                        Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
Measure and Database Design

A schema S with constraints Σ is well-designed if for every instance I of
(S, Σ) and every position p in I RICI (p|Σ) = 1.

Known results (Arenas & Libkin, 2003):
    relational databases with FDs: (S, Σ) is well-designed iff it is in BCNF.
    XML documents with FDs: (S, Σ) is well-designed iff it is in XNF.




                                                       Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 8/20
Measure and Database Design

A schema S with constraints Σ is well-designed if for every instance I of
(S, Σ) and every position p in I RICI (p|Σ) = 1.

Known results (Arenas & Libkin, 2003):
    relational databases with FDs: (S, Σ) is well-designed iff it is in BCNF.
    XML documents with FDs: (S, Σ) is well-designed iff it is in XNF.

Well-designed databases cannot always be achieved:
    Performance issues.
    Dependency preservation.




                                                       Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 8/20
Measure and Database Design

A schema S with constraints Σ is well-designed if for every instance I of
(S, Σ) and every position p in I RICI (p|Σ) = 1.

Known results (Arenas & Libkin, 2003):
    relational databases with FDs: (S, Σ) is well-designed iff it is in BCNF.
    XML documents with FDs: (S, Σ) is well-designed iff it is in XNF.

Well-designed databases cannot always be achieved:
    Performance issues.
    Dependency preservation.

General design goal: maximizing information content to the possible
extent by enforcing some design conditions.



                                                       Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 8/20
Guaranteed Information Content

Given a condition C, guaranteed information content (GIC) is the smallest
information content found in instances of schemas satisfying C.

                                               instances of (S, Σ)

     Schema satisfying condition C
               (S, Σ)                =⇒
                                                                           ≥ GIC
                                                       RICI (p|Σ)

More formally,
    we look at the set of all possible values for information content

             POSS C (m) = {RICI (p | Σ) | I is an instance of (R, Σ),
                                          R has m attributes,
                                          (R, Σ) satisfies C},

    then GICC (m), is the infimum of POSS C (m).
                                                         Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 9/20
Price of Dependency Preservation

Design goal: minimizing redundancy while preserving FDs.

For a normal form NF, PRICE(NF ) is the minimum information content that
NF loses to guarantee dependency preservation.

    if c ∈ [0, 1] is the largest information content guaranteed for
    decompositions into NF,

                      NF-decomposition
                                    (R1 , Σ1 )


                                                                                   ≥c
                                                                RICI (p|Σ)
                     (R, Σ)          (R2 , Σ2 )


                                     (R3 , Σ3 )



    then price of dependency preservation, PRICE(NF ), is 1 − c.

                                                        Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 10/20
Price of Dependency Preservation

Theorem
                 = 1/2.
    PRICE(3NF)

                 ≥ 1/2 for any dependency-preserving normal form NF.
    PRICE(NF )


To pay the smallest price for achieving dependency preservation, we
should do a 3NF normalization.




                                                    Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 11/20
Price of Dependency Preservation

Theorem
                 = 1/2.
    PRICE(3NF)

                 ≥ 1/2 for any dependency-preserving normal form NF.
    PRICE(NF )


To pay the smallest price for achieving dependency preservation, we
should do a 3NF normalization.

Not all 3NF normalizations are equal:
    Special subclasses of 3NF exist (old research).
    Only one subclass (3NF+ ) achieves the smallest price.
    We compare normal forms based on guaranteed information content
    or highest redundancy they allow.




                                                      Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 11/20
Comparing Normal Forms

          For every m > 2:
Theorem


                                             21−m
                         GICAll (m)  =
                                             22−m
                         GIC3NF (m)  =
                         GIC3NF+ (m) =       1/2

    3NF is twice as good as doing nothing.
    3NF+ is exponentially better.
    similar results obtained if we compare normal forms based on
    guaranteed average information content.




                                                    Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 12/20
Redundancy of an Arbitrary Schema

    Normalizing into smaller relations is not always desirable.
       losing constraints.
       slowing down query answering.
    Normalization decision can be made based on
       how much redundancy the schema allows; or
       where in the spectrum of redundancy the schema lies; or
       the lowest information content found in all instances of the
       schema.

Design goal: decomposing the schema with the highest potential for
redundancy.




                                                      Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 13/20
Redundancy of an Arbitrary Schema

          Given an arbitrary schema R with FDs Σ, let
Theorem
    ΣA = {X | X → A, X is minimal and non-key};
    #HS = the number of hitting sets of ΣA ;
    l=|            X|.
            X∈ΣA

Then the smallest information content found in column A of instances is

                          GICR (A) = #HS · 2−l .
                             Σ




                                                        Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 14/20
Redundancy of an Arbitrary Schema

          Given an arbitrary schema R with FDs Σ, let
Theorem
    ΣA = {X | X → A, X is minimal and non-key};
    #HS = the number of hitting sets of ΣA ;
    l=|            X|.
            X∈ΣA

Then the smallest information content found in column A of instances is

                          GICR (A) = #HS · 2−l .
                             Σ



           R1 (A, B, C, D, E)        R2 (A, B, C, D, E)
           Σ1 = { AB → E,            Σ2 = { BC → E,
                   D → E}                    AC → E,
                                             BD → E}




                                                        Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 14/20
Redundancy of an Arbitrary Schema

          Given an arbitrary schema R with FDs Σ, let
Theorem
    ΣA = {X | X → A, X is minimal and non-key};
    #HS = the number of hitting sets of ΣA ;
    l=|            X|.
            X∈ΣA

Then the smallest information content found in column A of instances is

                             GICR (A) = #HS · 2−l .
                                Σ



           R1 (A, B, C, D, E)           R2 (A, B, C, D, E)
           Σ1 = { AB → E,               Σ2 = { BC → E,
                   D → E}                       AC → E,
                                                BD → E}
           GICR1 (E) =                  GICR2 (E) =
                         3                            1
                1                            2
              Σ                            Σ
                         8                            2



                                                          Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 14/20
Outline

Motivation.
Reducing redundancy in relational and XML data:
   Measure of redundancy.
   Redundancy analysis of normal forms and schemas.
Correcting functional dependency violations.
Conclusions.
Future work.




                                               Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 15/20
Functional Dependency Violations

Large databases often tend to violate a set of FDs.
Σ = {cnt, arCode → reg, cnt, reg → prov }
                       An inconsistent database
             name       cnt    prov reg arCode            phone
       t1    Smith     CAN      BC     Van      604      123 4567
       t2    Adams     CAN      BC     Van      604      765 4321
       t3 Simpson CAN           BC     Man      604      345 6789
       t4     Rice     CAN      AB      Vic     604      987 6543




                                                  Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 16/20
Functional Dependency Violations

Large databases often tend to violate a set of FDs.
Σ = {cnt, arCode → reg, cnt, reg → prov }
                       An inconsistent database
             name       cnt    prov reg arCode            phone
       t1    Smith     CAN      BC     Van      604      123 4567
       t2    Adams     CAN      BC     Van      604      765 4321
       t3 Simpson CAN           BC     Man      604      345 6789
       t4     Rice     CAN      AB      Vic     604      987 6543
                            A minimal repair
             name       cnt    prov reg arCode           phone
       t1    Smith     CAN     BC     Van    604        123 4567
       t2    Adams     CAN     BC     Van    604        765 4321
       t3   Simpson    CAN     BC     Van    604        345 6789
       t4               v1
              Rice              AB    Vic    604        987 6543


                                                  Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 16/20
Handling Inconsistent Databases

Integrity constraints Σ (FDs, keys, etc.).
Inconsistent database D: does not satisfy Σ.
We can produce a repair R by inserting/deleting tuples or modifying
values in D.
                     ∆(D, R) = number of modifications




                                                    Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 17/20
Handling Inconsistent Databases

Integrity constraints Σ (FDs, keys, etc.).
Inconsistent database D: does not satisfy Σ.
We can produce a repair R by inserting/deleting tuples or modifying
values in D.
                     ∆(D, R) = number of modifications
Handling inconsistency:
    Consistent query answering:
                                    {Q(R) | R is a minimal repair for D}
     certain answer for queryQ =
    Producing an optimum repair Ropt with minimum ∆.
Both approaches are intractable in general.




                                                    Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 17/20
Handling Inconsistent Databases

Integrity constraints Σ (FDs, keys, etc.).
Inconsistent database D: does not satisfy Σ.
We can produce a repair R by inserting/deleting tuples or modifying
values in D.
                     ∆(D, R) = number of modifications
Handling inconsistency:
    Consistent query answering:
                                    {Q(R) | R is a minimal repair for D}
     certain answer for queryQ =
    Producing an optimum repair Ropt with minimum ∆.
Both approaches are intractable in general.

Our approach: producing an approximate solution Rapp for optimum repair.
                     ∆(D, Rapp ) ≤ α · ∆(D, Ropt )



                                                    Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 17/20
Approximating Optimum Repair

Theorem. Finding an optimum solution for FD violations is NP-hard.

Theorem. Finding a constant-factor approximation for all FD violations is
NP-hard.

Theorem. For every fixed set of FDs, there is a polynomial-time algorithm
that approximates optimum repair within a factor of α, where α depends
on FDs.
                                              A    B     C         D          E

                                         t1   a1   b1    c1        d1         e1
                                         t2   a2   b1    c2        d2         e2
                                         t3   a1   b3    c3        d3         e3
                                         t4   a4   b4    c4        d4         e4
                                         t5   a5   b4    c5        d5         e5
                                         t6   a6   b6    c4        d5         e6
Σ = {A → C, B → C, CD → E}
                                                        Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 18/20
Conclusions

We analyze schemas and normal forms based on worst cases of
redundancy.
There is a spectrum of information content (redundancy) for
schemas.

           0                    1                           1
                                2
       poorly-designed                              well-designed

Producing optimum repair for FD violations is hard.
We introduced an approximation framework.




                                                 Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 19/20
Future Work

Comparing quality of schemas with low / high information content in
practice.
Defining normalization concepts for XML such as:
   dependency preserving decomposition.
Finding an equivalent of 3NF for XML as a normal form
   that guarantees an information content of 1 .
                                             2
   to which every XML document is decomposable.
Extending the repair algorithm for other integrity constraints.




                                                   Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 20/20

Mais conteúdo relacionado

Destaque

Fd & Normalization - Database Management System
Fd & Normalization - Database Management SystemFd & Normalization - Database Management System
Fd & Normalization - Database Management SystemDrishti Bhalla
 
Functional dependencies and normalization
Functional dependencies and normalizationFunctional dependencies and normalization
Functional dependencies and normalizationdaxesh chauhan
 
FUNCTION DEPENDENCY AND TYPES & EXAMPLE
FUNCTION DEPENDENCY  AND TYPES & EXAMPLEFUNCTION DEPENDENCY  AND TYPES & EXAMPLE
FUNCTION DEPENDENCY AND TYPES & EXAMPLEVraj Patel
 
Functional dependencies and normalization for relational databases
Functional dependencies and normalization for relational databasesFunctional dependencies and normalization for relational databases
Functional dependencies and normalization for relational databasesJafar Nesargi
 
Binary Search Tree in Data Structure
Binary Search Tree in Data StructureBinary Search Tree in Data Structure
Binary Search Tree in Data StructureDharita Chokshi
 
Normalization
NormalizationNormalization
Normalizationochesing
 
Database Normalization
Database NormalizationDatabase Normalization
Database NormalizationRathan Raj
 
7. Relational Database Design in DBMS
7. Relational Database Design in DBMS7. Relational Database Design in DBMS
7. Relational Database Design in DBMSkoolkampus
 
Database Normalization 1NF, 2NF, 3NF, BCNF, 4NF, 5NF
Database Normalization 1NF, 2NF, 3NF, BCNF, 4NF, 5NFDatabase Normalization 1NF, 2NF, 3NF, BCNF, 4NF, 5NF
Database Normalization 1NF, 2NF, 3NF, BCNF, 4NF, 5NFOum Saokosal
 

Destaque (14)

Normlaization
NormlaizationNormlaization
Normlaization
 
Fd & Normalization - Database Management System
Fd & Normalization - Database Management SystemFd & Normalization - Database Management System
Fd & Normalization - Database Management System
 
Normalization
NormalizationNormalization
Normalization
 
Functional dependencies and normalization
Functional dependencies and normalizationFunctional dependencies and normalization
Functional dependencies and normalization
 
FUNCTION DEPENDENCY AND TYPES & EXAMPLE
FUNCTION DEPENDENCY  AND TYPES & EXAMPLEFUNCTION DEPENDENCY  AND TYPES & EXAMPLE
FUNCTION DEPENDENCY AND TYPES & EXAMPLE
 
Functional dependencies and normalization for relational databases
Functional dependencies and normalization for relational databasesFunctional dependencies and normalization for relational databases
Functional dependencies and normalization for relational databases
 
Binary Search Tree
Binary Search TreeBinary Search Tree
Binary Search Tree
 
Binary Search Tree in Data Structure
Binary Search Tree in Data StructureBinary Search Tree in Data Structure
Binary Search Tree in Data Structure
 
Normalization
NormalizationNormalization
Normalization
 
Trees, Binary Search Tree, AVL Tree in Data Structures
Trees, Binary Search Tree, AVL Tree in Data Structures Trees, Binary Search Tree, AVL Tree in Data Structures
Trees, Binary Search Tree, AVL Tree in Data Structures
 
Database Normalization
Database NormalizationDatabase Normalization
Database Normalization
 
7. Relational Database Design in DBMS
7. Relational Database Design in DBMS7. Relational Database Design in DBMS
7. Relational Database Design in DBMS
 
DBMS - Normalization
DBMS - NormalizationDBMS - Normalization
DBMS - Normalization
 
Database Normalization 1NF, 2NF, 3NF, BCNF, 4NF, 5NF
Database Normalization 1NF, 2NF, 3NF, BCNF, 4NF, 5NFDatabase Normalization 1NF, 2NF, 3NF, BCNF, 4NF, 5NF
Database Normalization 1NF, 2NF, 3NF, BCNF, 4NF, 5NF
 

Semelhante a Somaz Kolahi : Functional Dependencies: Redundancy Analysis and Correcting Violations

SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"
SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"
SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"Boris Glavic
 
Presentation on Graph Clustering (vldb 09)
Presentation on Graph Clustering (vldb 09)Presentation on Graph Clustering (vldb 09)
Presentation on Graph Clustering (vldb 09)Waqas Nawaz
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Ontotext
 
Challenges for a new era
Challenges for a new eraChallenges for a new era
Challenges for a new eraDiane Hillmann
 
A Statistical and Schema Independent Approach to Identify Equivalent Properti...
A Statistical and Schema Independent Approach to Identify Equivalent Properti...A Statistical and Schema Independent Approach to Identify Equivalent Properti...
A Statistical and Schema Independent Approach to Identify Equivalent Properti...Kalpa Gunaratna
 
Sparql semantic information retrieval by
Sparql semantic information retrieval bySparql semantic information retrieval by
Sparql semantic information retrieval byIJNSA Journal
 
Top-N Recommendations from Implicit Feedback leveraging Linked Open Data
Top-N Recommendations from Implicit Feedback leveraging Linked Open DataTop-N Recommendations from Implicit Feedback leveraging Linked Open Data
Top-N Recommendations from Implicit Feedback leveraging Linked Open DataVito Ostuni
 
Ways to Extract Variable Insights when Data is Scarse
Ways to Extract Variable Insights when Data is ScarseWays to Extract Variable Insights when Data is Scarse
Ways to Extract Variable Insights when Data is ScarseZia Babar
 
Spatio textual similarity join
Spatio textual similarity joinSpatio textual similarity join
Spatio textual similarity joinIJDKP
 
10.1.1.70.8789
10.1.1.70.878910.1.1.70.8789
10.1.1.70.8789Hoài Bùi
 
Mapping of extensible markup language-to-ontology representation for effectiv...
Mapping of extensible markup language-to-ontology representation for effectiv...Mapping of extensible markup language-to-ontology representation for effectiv...
Mapping of extensible markup language-to-ontology representation for effectiv...IAESIJAI
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsAndre Freitas
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolLaura Po
 
Image-Based Literal Node Matching for Linked Data Integration
Image-Based Literal Node Matching for Linked Data IntegrationImage-Based Literal Node Matching for Linked Data Integration
Image-Based Literal Node Matching for Linked Data IntegrationIJwest
 
SPARQL: SEMANTIC INFORMATION RETRIEVAL BY EMBEDDING PREPOSITIONS
SPARQL: SEMANTIC INFORMATION RETRIEVAL BY EMBEDDING PREPOSITIONSSPARQL: SEMANTIC INFORMATION RETRIEVAL BY EMBEDDING PREPOSITIONS
SPARQL: SEMANTIC INFORMATION RETRIEVAL BY EMBEDDING PREPOSITIONSIJNSA Journal
 
Collaborative Similarity Measure for Intra-Graph Clustering
Collaborative Similarity Measure for Intra-Graph ClusteringCollaborative Similarity Measure for Intra-Graph Clustering
Collaborative Similarity Measure for Intra-Graph ClusteringWaqas Nawaz
 
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...semanticsconference
 

Semelhante a Somaz Kolahi : Functional Dependencies: Redundancy Analysis and Correcting Violations (20)

SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"
SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"
SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"
 
Role of Semantic Web in Health Informatics
Role of Semantic Web in Health InformaticsRole of Semantic Web in Health Informatics
Role of Semantic Web in Health Informatics
 
Presentation on Graph Clustering (vldb 09)
Presentation on Graph Clustering (vldb 09)Presentation on Graph Clustering (vldb 09)
Presentation on Graph Clustering (vldb 09)
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020
 
Challenges for a new era
Challenges for a new eraChallenges for a new era
Challenges for a new era
 
A Statistical and Schema Independent Approach to Identify Equivalent Properti...
A Statistical and Schema Independent Approach to Identify Equivalent Properti...A Statistical and Schema Independent Approach to Identify Equivalent Properti...
A Statistical and Schema Independent Approach to Identify Equivalent Properti...
 
Presentation at MTSR 2012
Presentation at MTSR 2012Presentation at MTSR 2012
Presentation at MTSR 2012
 
Sparql semantic information retrieval by
Sparql semantic information retrieval bySparql semantic information retrieval by
Sparql semantic information retrieval by
 
Top-N Recommendations from Implicit Feedback leveraging Linked Open Data
Top-N Recommendations from Implicit Feedback leveraging Linked Open DataTop-N Recommendations from Implicit Feedback leveraging Linked Open Data
Top-N Recommendations from Implicit Feedback leveraging Linked Open Data
 
Ways to Extract Variable Insights when Data is Scarse
Ways to Extract Variable Insights when Data is ScarseWays to Extract Variable Insights when Data is Scarse
Ways to Extract Variable Insights when Data is Scarse
 
Spatio textual similarity join
Spatio textual similarity joinSpatio textual similarity join
Spatio textual similarity join
 
10.1.1.70.8789
10.1.1.70.878910.1.1.70.8789
10.1.1.70.8789
 
Mapping of extensible markup language-to-ontology representation for effectiv...
Mapping of extensible markup language-to-ontology representation for effectiv...Mapping of extensible markup language-to-ontology representation for effectiv...
Mapping of extensible markup language-to-ontology representation for effectiv...
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP Systems
 
Relational Databases_RDF Graphs_and_Constraints.pdf
Relational Databases_RDF Graphs_and_Constraints.pdfRelational Databases_RDF Graphs_and_Constraints.pdf
Relational Databases_RDF Graphs_and_Constraints.pdf
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX tool
 
Image-Based Literal Node Matching for Linked Data Integration
Image-Based Literal Node Matching for Linked Data IntegrationImage-Based Literal Node Matching for Linked Data Integration
Image-Based Literal Node Matching for Linked Data Integration
 
SPARQL: SEMANTIC INFORMATION RETRIEVAL BY EMBEDDING PREPOSITIONS
SPARQL: SEMANTIC INFORMATION RETRIEVAL BY EMBEDDING PREPOSITIONSSPARQL: SEMANTIC INFORMATION RETRIEVAL BY EMBEDDING PREPOSITIONS
SPARQL: SEMANTIC INFORMATION RETRIEVAL BY EMBEDDING PREPOSITIONS
 
Collaborative Similarity Measure for Intra-Graph Clustering
Collaborative Similarity Measure for Intra-Graph ClusteringCollaborative Similarity Measure for Intra-Graph Clustering
Collaborative Similarity Measure for Intra-Graph Clustering
 
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
 

Mais de knowdiff

Ut talk feb 2017
Ut talk   feb 2017Ut talk   feb 2017
Ut talk feb 2017knowdiff
 
Ali khalili: Towards an Open Linked Data-based Infrastructure for Studying Sc...
Ali khalili: Towards an Open Linked Data-based Infrastructure for Studying Sc...Ali khalili: Towards an Open Linked Data-based Infrastructure for Studying Sc...
Ali khalili: Towards an Open Linked Data-based Infrastructure for Studying Sc...knowdiff
 
Scheduling for cloud systems with multi level data locality
Scheduling for cloud systems with multi level data localityScheduling for cloud systems with multi level data locality
Scheduling for cloud systems with multi level data localityknowdiff
 
Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...
Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...
Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...knowdiff
 
Knowledge based economy and power of crowd sourcing
Knowledge based economy and power of crowd sourcing Knowledge based economy and power of crowd sourcing
Knowledge based economy and power of crowd sourcing knowdiff
 
Amin tayyebi: Big Data and Land Use Change Science
Amin tayyebi: Big Data and Land Use Change ScienceAmin tayyebi: Big Data and Land Use Change Science
Amin tayyebi: Big Data and Land Use Change Scienceknowdiff
 
Mehdi Rezagholizadeh: Image Sensor Modeling: Color Measurement at Low Light L...
Mehdi Rezagholizadeh: Image Sensor Modeling: Color Measurement at Low Light L...Mehdi Rezagholizadeh: Image Sensor Modeling: Color Measurement at Low Light L...
Mehdi Rezagholizadeh: Image Sensor Modeling: Color Measurement at Low Light L...knowdiff
 
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time SystemsSara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systemsknowdiff
 
Seyed Mehdi mohaghegh: Modelling material use within the low carbon energy pa...
Seyed Mehdi mohaghegh: Modelling material use within the low carbon energy pa...Seyed Mehdi mohaghegh: Modelling material use within the low carbon energy pa...
Seyed Mehdi mohaghegh: Modelling material use within the low carbon energy pa...knowdiff
 
Narjess Afzaly: Model Your Problem with Graphs and Generate your objects
Narjess Afzaly: Model Your Problem with Graphs and Generate your objectsNarjess Afzaly: Model Your Problem with Graphs and Generate your objects
Narjess Afzaly: Model Your Problem with Graphs and Generate your objectsknowdiff
 
Computational methods applications in air pollution modeling (Dr. Yadghar)
Computational methods applications in air pollution modeling (Dr. Yadghar)Computational methods applications in air pollution modeling (Dr. Yadghar)
Computational methods applications in air pollution modeling (Dr. Yadghar)knowdiff
 
Uncalibrated Image-Based Robotic Visual Servoing (knowdiff.net)
Uncalibrated Image-Based Robotic Visual Servoing (knowdiff.net)Uncalibrated Image-Based Robotic Visual Servoing (knowdiff.net)
Uncalibrated Image-Based Robotic Visual Servoing (knowdiff.net)knowdiff
 
Knowdiff visiting lecturer 140 (Azad Shademan): Uncalibrated Image-Based Robo...
Knowdiff visiting lecturer 140 (Azad Shademan): Uncalibrated Image-Based Robo...Knowdiff visiting lecturer 140 (Azad Shademan): Uncalibrated Image-Based Robo...
Knowdiff visiting lecturer 140 (Azad Shademan): Uncalibrated Image-Based Robo...knowdiff
 
Mehran Shaghaghi: Quantum Mechanics Dilemmas
Mehran Shaghaghi: Quantum Mechanics DilemmasMehran Shaghaghi: Quantum Mechanics Dilemmas
Mehran Shaghaghi: Quantum Mechanics Dilemmasknowdiff
 
Hossein Taghavi : Codes on Graphs
Hossein Taghavi : Codes on GraphsHossein Taghavi : Codes on Graphs
Hossein Taghavi : Codes on Graphsknowdiff
 
Dr. Amir Nejat
Dr. Amir NejatDr. Amir Nejat
Dr. Amir Nejatknowdiff
 

Mais de knowdiff (17)

Ut talk feb 2017
Ut talk   feb 2017Ut talk   feb 2017
Ut talk feb 2017
 
Ali khalili: Towards an Open Linked Data-based Infrastructure for Studying Sc...
Ali khalili: Towards an Open Linked Data-based Infrastructure for Studying Sc...Ali khalili: Towards an Open Linked Data-based Infrastructure for Studying Sc...
Ali khalili: Towards an Open Linked Data-based Infrastructure for Studying Sc...
 
Scheduling for cloud systems with multi level data locality
Scheduling for cloud systems with multi level data localityScheduling for cloud systems with multi level data locality
Scheduling for cloud systems with multi level data locality
 
Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...
Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...
Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...
 
Knowledge based economy and power of crowd sourcing
Knowledge based economy and power of crowd sourcing Knowledge based economy and power of crowd sourcing
Knowledge based economy and power of crowd sourcing
 
Amin tayyebi: Big Data and Land Use Change Science
Amin tayyebi: Big Data and Land Use Change ScienceAmin tayyebi: Big Data and Land Use Change Science
Amin tayyebi: Big Data and Land Use Change Science
 
Mehdi Rezagholizadeh: Image Sensor Modeling: Color Measurement at Low Light L...
Mehdi Rezagholizadeh: Image Sensor Modeling: Color Measurement at Low Light L...Mehdi Rezagholizadeh: Image Sensor Modeling: Color Measurement at Low Light L...
Mehdi Rezagholizadeh: Image Sensor Modeling: Color Measurement at Low Light L...
 
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time SystemsSara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
 
Seyed Mehdi mohaghegh: Modelling material use within the low carbon energy pa...
Seyed Mehdi mohaghegh: Modelling material use within the low carbon energy pa...Seyed Mehdi mohaghegh: Modelling material use within the low carbon energy pa...
Seyed Mehdi mohaghegh: Modelling material use within the low carbon energy pa...
 
Narjess Afzaly: Model Your Problem with Graphs and Generate your objects
Narjess Afzaly: Model Your Problem with Graphs and Generate your objectsNarjess Afzaly: Model Your Problem with Graphs and Generate your objects
Narjess Afzaly: Model Your Problem with Graphs and Generate your objects
 
Computational methods applications in air pollution modeling (Dr. Yadghar)
Computational methods applications in air pollution modeling (Dr. Yadghar)Computational methods applications in air pollution modeling (Dr. Yadghar)
Computational methods applications in air pollution modeling (Dr. Yadghar)
 
Uncalibrated Image-Based Robotic Visual Servoing (knowdiff.net)
Uncalibrated Image-Based Robotic Visual Servoing (knowdiff.net)Uncalibrated Image-Based Robotic Visual Servoing (knowdiff.net)
Uncalibrated Image-Based Robotic Visual Servoing (knowdiff.net)
 
Knowdiff visiting lecturer 140 (Azad Shademan): Uncalibrated Image-Based Robo...
Knowdiff visiting lecturer 140 (Azad Shademan): Uncalibrated Image-Based Robo...Knowdiff visiting lecturer 140 (Azad Shademan): Uncalibrated Image-Based Robo...
Knowdiff visiting lecturer 140 (Azad Shademan): Uncalibrated Image-Based Robo...
 
Mehran Shaghaghi: Quantum Mechanics Dilemmas
Mehran Shaghaghi: Quantum Mechanics DilemmasMehran Shaghaghi: Quantum Mechanics Dilemmas
Mehran Shaghaghi: Quantum Mechanics Dilemmas
 
Hossein Taghavi : Codes on Graphs
Hossein Taghavi : Codes on GraphsHossein Taghavi : Codes on Graphs
Hossein Taghavi : Codes on Graphs
 
Dr. Amir Nejat
Dr. Amir NejatDr. Amir Nejat
Dr. Amir Nejat
 
Alborz
AlborzAlborz
Alborz
 

Último

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Último (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Somaz Kolahi : Functional Dependencies: Redundancy Analysis and Correcting Violations

  • 1. Functional Dependencies: Redundancy Analysis and Correcting Violations Solmaz Kolahi solmaz@cs.ubc.ca Postdoctoral Research Fellow Department of Computer Science University of British Columbia Joint work with Leonid Libkin and Laks Lakshmanan Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 1/20
  • 2. Motivation Both relational and XML databases may store redundant data: title director actor year The Departed Scorsese DiCaprio 2006 The Departed Scorsese Nicholson 2006 Functional Dependency: Shrek the Third Miller Myers 2007 Shrek the Third Miller Murphy 2007 title → year Shrek the Third Hui Myers 2007 Shrek the Third Hui Murphy 2007 Functional Dependency: @AreaCode → @City AreaCode AreaCode AreaCode AreaCode 416 416 416 416 City City City City Toronto Toronto Toronto Toronto Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 2/20
  • 3. Motivation Normalization techniques try to remove redundancies: BCNF eliminates all redundancies. only key dependencies are allowed. cannot always be achieved without losing dependencies. AB → C C→B R(A, B, C) 3NF eliminates some redundancies. allows redundancy on prime attributes. preserves dependencies. XNF eliminates all redundancies w.r.t. XML functional dependencies. only XML keys are allowed: if X → p.@l, then X → p. introduced by Arenas & Libkin in 2002. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 3/20
  • 4. Motivation Traditional normalization theory characterizes a database as redundant or non-redundant. does not measure redundancy. cannot provide guidelines to reduce redundancy. The more redundant the data, the more prone to update anomalies. Our goal is to show that there is a spectrum of redundancy using an information-theoretic tool. to choose database designs with low redundancy. to handle databases with dependency violations. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 4/20
  • 5. Outline Motivation. Reducing redundancy in relational and XML data: Measure of redundancy. Redundancy analysis of normal forms and schemas. Correcting functional dependency violations. Conclusions. Future work. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 5/20
  • 6. Measure of Information Content Proposed by Arenas & Libkin in 2003. Used to measure the redundancy of a data value in a database instance with respect to a set of constraints. Intuitively, RICI (p|Σ) measures the relative information content of position p in instance I w.r.t. constraints Σ. Independent of data models and query languages. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 6/20
  • 7. Measure of Information Content Proposed by Arenas & Libkin in 2003. Used to measure the redundancy of a data value in a database instance with respect to a set of constraints. Intuitively, RICI (p|Σ) measures the relative information content of position p in instance I w.r.t. constraints Σ. Independent of data models and query languages. Σ = {A → C} A B C D RICI (P |Σ) 1 2 3 4 0.875 1 2 3 5 Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 6/20
  • 8. Measure of Information Content Proposed by Arenas & Libkin in 2003. Used to measure the redundancy of a data value in a database instance with respect to a set of constraints. Intuitively, RICI (p|Σ) measures the relative information content of position p in instance I w.r.t. constraints Σ. Independent of data models and query languages. Σ = {A → C} A B C D RICI (P |Σ) 1 2 3 4 0.875 1 2 3 5 0.781 1 2 3 6 Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 6/20
  • 9. Measure of Information Content Proposed by Arenas & Libkin in 2003. Used to measure the redundancy of a data value in a database instance with respect to a set of constraints. Intuitively, RICI (p|Σ) measures the relative information content of position p in instance I w.r.t. constraints Σ. Independent of data models and query languages. Σ = {A → C} A B C D RICI (P |Σ) 1 2 3 4 0.875 1 2 3 5 0.781 1 2 3 6 0.711 1 2 3 7 Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 6/20
  • 10. Measure of Information Content Proposed by Arenas & Libkin in 2003. Used to measure the redundancy of a data value in a database instance with respect to a set of constraints. Intuitively, RICI (p|Σ) measures the relative information content of position p in instance I w.r.t. constraints Σ. Independent of data models and query languages. Σ = {A → C} A B C D RICI (P |Σ) 1 2 3 4 0.875 1 2 3 5 0.781 1 2 3 6 0.711 1 2 3 7 0.658 1 2 3 8 Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 6/20
  • 11. Measure of Information Content Proposed by Arenas & Libkin in 2003. Used to measure the redundancy of a data value in a database instance with respect to a set of constraints. Intuitively, RICI (p|Σ) measures the relative information content of position p in instance I w.r.t. constraints Σ. Independent of data models and query languages. Σ = {A → C} Σ = {A → C, B → C} A B C D RICI (P |Σ) RICI (P |Σ) 1 2 3 4 0.875 0.781 1 2 3 5 0.781 0.629 1 2 3 6 0.711 0.522 1 2 3 7 0.658 0.446 1 2 3 8 Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 6/20
  • 12. Measure of Information Content A B C 1 2 3 R(A, B, C) Σ = {A → B} 1 2 4 Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7). For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for every a ∈ {1, . . . , k}. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
  • 13. Measure of Information Content A B C 2 3 R(A, B, C) Σ = {A → B} 1 2 Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7). For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for every a ∈ {1, . . . , k}. P (2|X) = Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
  • 14. Measure of Information Content A B C 1 2 3 R(A, B, C) Σ = {A → B} 1 2 1 Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7). For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for every a ∈ {1, . . . , k}. P (2|X) = Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
  • 15. Measure of Information Content A B C 4 2 3 R(A, B, C) Σ = {A → B} 1 2 7 Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7). For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for every a ∈ {1, . . . , k}. P (2|X) = Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
  • 16. Measure of Information Content A B C 4 2 3 R(A, B, C) Σ = {A → B} 1 2 7 Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7). For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for every a ∈ {1, . . . , k}. P (2|X) = 48/ Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
  • 17. Measure of Information Content A B C a 3 R(A, B, C) Σ = {A → B} 1 2 Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7). For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for every a ∈ {1, . . . , k}. P (2|X) = 48/(48 + 6 × 42) = 0.16 P (a|X) = 42/(48 + 6 × 42) = 0.14 for every a = 2 Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
  • 18. Measure of Information Content A B C 1 2 3 R(A, B, C) Σ = {A → B} 1 2 4 Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7). For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for every a ∈ {1, . . . , k}. P (2|X) = 48/(48 + 6 × 42) = 0.16 P (a|X) = 42/(48 + 6 × 42) = 0.14 for every a = 2 Conditional entropy : 2.8057 Average over all possible X: RICk = 2.4558 I Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
  • 19. Measure of Information Content A B C 1 2 3 R(A, B, C) Σ = {A → B} 1 2 4 Pick k such that adom(I) ⊆ {1, . . . , k} (k = 7). For every X ⊆ Pos(I) − {p} compute probability distribution P (a|X) for every a ∈ {1, . . . , k}. P (2|X) = 48/(48 + 6 × 42) = 0.16 P (a|X) = 42/(48 + 6 × 42) = 0.14 for every a = 2 Conditional entropy : 2.8057 Average over all possible X: RICk = 2.4558 I RICk (p| Σ) I RICI (p|Σ) = lim = 0.875 log k k→∞ Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 7/20
  • 20. Measure and Database Design A schema S with constraints Σ is well-designed if for every instance I of (S, Σ) and every position p in I RICI (p|Σ) = 1. Known results (Arenas & Libkin, 2003): relational databases with FDs: (S, Σ) is well-designed iff it is in BCNF. XML documents with FDs: (S, Σ) is well-designed iff it is in XNF. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 8/20
  • 21. Measure and Database Design A schema S with constraints Σ is well-designed if for every instance I of (S, Σ) and every position p in I RICI (p|Σ) = 1. Known results (Arenas & Libkin, 2003): relational databases with FDs: (S, Σ) is well-designed iff it is in BCNF. XML documents with FDs: (S, Σ) is well-designed iff it is in XNF. Well-designed databases cannot always be achieved: Performance issues. Dependency preservation. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 8/20
  • 22. Measure and Database Design A schema S with constraints Σ is well-designed if for every instance I of (S, Σ) and every position p in I RICI (p|Σ) = 1. Known results (Arenas & Libkin, 2003): relational databases with FDs: (S, Σ) is well-designed iff it is in BCNF. XML documents with FDs: (S, Σ) is well-designed iff it is in XNF. Well-designed databases cannot always be achieved: Performance issues. Dependency preservation. General design goal: maximizing information content to the possible extent by enforcing some design conditions. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 8/20
  • 23. Guaranteed Information Content Given a condition C, guaranteed information content (GIC) is the smallest information content found in instances of schemas satisfying C. instances of (S, Σ) Schema satisfying condition C (S, Σ) =⇒ ≥ GIC RICI (p|Σ) More formally, we look at the set of all possible values for information content POSS C (m) = {RICI (p | Σ) | I is an instance of (R, Σ), R has m attributes, (R, Σ) satisfies C}, then GICC (m), is the infimum of POSS C (m). Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 9/20
  • 24. Price of Dependency Preservation Design goal: minimizing redundancy while preserving FDs. For a normal form NF, PRICE(NF ) is the minimum information content that NF loses to guarantee dependency preservation. if c ∈ [0, 1] is the largest information content guaranteed for decompositions into NF, NF-decomposition (R1 , Σ1 ) ≥c RICI (p|Σ) (R, Σ) (R2 , Σ2 ) (R3 , Σ3 ) then price of dependency preservation, PRICE(NF ), is 1 − c. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 10/20
  • 25. Price of Dependency Preservation Theorem = 1/2. PRICE(3NF) ≥ 1/2 for any dependency-preserving normal form NF. PRICE(NF ) To pay the smallest price for achieving dependency preservation, we should do a 3NF normalization. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 11/20
  • 26. Price of Dependency Preservation Theorem = 1/2. PRICE(3NF) ≥ 1/2 for any dependency-preserving normal form NF. PRICE(NF ) To pay the smallest price for achieving dependency preservation, we should do a 3NF normalization. Not all 3NF normalizations are equal: Special subclasses of 3NF exist (old research). Only one subclass (3NF+ ) achieves the smallest price. We compare normal forms based on guaranteed information content or highest redundancy they allow. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 11/20
  • 27. Comparing Normal Forms For every m > 2: Theorem 21−m GICAll (m) = 22−m GIC3NF (m) = GIC3NF+ (m) = 1/2 3NF is twice as good as doing nothing. 3NF+ is exponentially better. similar results obtained if we compare normal forms based on guaranteed average information content. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 12/20
  • 28. Redundancy of an Arbitrary Schema Normalizing into smaller relations is not always desirable. losing constraints. slowing down query answering. Normalization decision can be made based on how much redundancy the schema allows; or where in the spectrum of redundancy the schema lies; or the lowest information content found in all instances of the schema. Design goal: decomposing the schema with the highest potential for redundancy. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 13/20
  • 29. Redundancy of an Arbitrary Schema Given an arbitrary schema R with FDs Σ, let Theorem ΣA = {X | X → A, X is minimal and non-key}; #HS = the number of hitting sets of ΣA ; l=| X|. X∈ΣA Then the smallest information content found in column A of instances is GICR (A) = #HS · 2−l . Σ Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 14/20
  • 30. Redundancy of an Arbitrary Schema Given an arbitrary schema R with FDs Σ, let Theorem ΣA = {X | X → A, X is minimal and non-key}; #HS = the number of hitting sets of ΣA ; l=| X|. X∈ΣA Then the smallest information content found in column A of instances is GICR (A) = #HS · 2−l . Σ R1 (A, B, C, D, E) R2 (A, B, C, D, E) Σ1 = { AB → E, Σ2 = { BC → E, D → E} AC → E, BD → E} Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 14/20
  • 31. Redundancy of an Arbitrary Schema Given an arbitrary schema R with FDs Σ, let Theorem ΣA = {X | X → A, X is minimal and non-key}; #HS = the number of hitting sets of ΣA ; l=| X|. X∈ΣA Then the smallest information content found in column A of instances is GICR (A) = #HS · 2−l . Σ R1 (A, B, C, D, E) R2 (A, B, C, D, E) Σ1 = { AB → E, Σ2 = { BC → E, D → E} AC → E, BD → E} GICR1 (E) = GICR2 (E) = 3 1 1 2 Σ Σ 8 2 Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 14/20
  • 32. Outline Motivation. Reducing redundancy in relational and XML data: Measure of redundancy. Redundancy analysis of normal forms and schemas. Correcting functional dependency violations. Conclusions. Future work. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 15/20
  • 33. Functional Dependency Violations Large databases often tend to violate a set of FDs. Σ = {cnt, arCode → reg, cnt, reg → prov } An inconsistent database name cnt prov reg arCode phone t1 Smith CAN BC Van 604 123 4567 t2 Adams CAN BC Van 604 765 4321 t3 Simpson CAN BC Man 604 345 6789 t4 Rice CAN AB Vic 604 987 6543 Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 16/20
  • 34. Functional Dependency Violations Large databases often tend to violate a set of FDs. Σ = {cnt, arCode → reg, cnt, reg → prov } An inconsistent database name cnt prov reg arCode phone t1 Smith CAN BC Van 604 123 4567 t2 Adams CAN BC Van 604 765 4321 t3 Simpson CAN BC Man 604 345 6789 t4 Rice CAN AB Vic 604 987 6543 A minimal repair name cnt prov reg arCode phone t1 Smith CAN BC Van 604 123 4567 t2 Adams CAN BC Van 604 765 4321 t3 Simpson CAN BC Van 604 345 6789 t4 v1 Rice AB Vic 604 987 6543 Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 16/20
  • 35. Handling Inconsistent Databases Integrity constraints Σ (FDs, keys, etc.). Inconsistent database D: does not satisfy Σ. We can produce a repair R by inserting/deleting tuples or modifying values in D. ∆(D, R) = number of modifications Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 17/20
  • 36. Handling Inconsistent Databases Integrity constraints Σ (FDs, keys, etc.). Inconsistent database D: does not satisfy Σ. We can produce a repair R by inserting/deleting tuples or modifying values in D. ∆(D, R) = number of modifications Handling inconsistency: Consistent query answering: {Q(R) | R is a minimal repair for D} certain answer for queryQ = Producing an optimum repair Ropt with minimum ∆. Both approaches are intractable in general. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 17/20
  • 37. Handling Inconsistent Databases Integrity constraints Σ (FDs, keys, etc.). Inconsistent database D: does not satisfy Σ. We can produce a repair R by inserting/deleting tuples or modifying values in D. ∆(D, R) = number of modifications Handling inconsistency: Consistent query answering: {Q(R) | R is a minimal repair for D} certain answer for queryQ = Producing an optimum repair Ropt with minimum ∆. Both approaches are intractable in general. Our approach: producing an approximate solution Rapp for optimum repair. ∆(D, Rapp ) ≤ α · ∆(D, Ropt ) Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 17/20
  • 38. Approximating Optimum Repair Theorem. Finding an optimum solution for FD violations is NP-hard. Theorem. Finding a constant-factor approximation for all FD violations is NP-hard. Theorem. For every fixed set of FDs, there is a polynomial-time algorithm that approximates optimum repair within a factor of α, where α depends on FDs. A B C D E t1 a1 b1 c1 d1 e1 t2 a2 b1 c2 d2 e2 t3 a1 b3 c3 d3 e3 t4 a4 b4 c4 d4 e4 t5 a5 b4 c5 d5 e5 t6 a6 b6 c4 d5 e6 Σ = {A → C, B → C, CD → E} Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 18/20
  • 39. Conclusions We analyze schemas and normal forms based on worst cases of redundancy. There is a spectrum of information content (redundancy) for schemas. 0 1 1 2 poorly-designed well-designed Producing optimum repair for FD violations is hard. We introduced an approximation framework. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 19/20
  • 40. Future Work Comparing quality of schemas with low / high information content in practice. Defining normalization concepts for XML such as: dependency preserving decomposition. Finding an equivalent of 3NF for XML as a normal form that guarantees an information content of 1 . 2 to which every XML document is decomposable. Extending the repair algorithm for other integrity constraints. Solmaz Kolahi, Sharif U. of Tech., December 2008 – p. 20/20