Value Invention in Data Exchange
Patricia Arocena¹, Boris Glavic², Renée J. Miller¹
¹ University of Toronto, DBGroup
² Illinois Institute of Technology, DBGroup
SIGMOD 2013 - June 25, 2013 - New York, USA
Outline
1 Introduction
2 Linearization
3 Exploiting Source Constraints
4 Experiments
5 Conclusions
The Data Exchange Problem¹
Schema Mappings M = (S, T, Σ)
• Source Schema S and Target Schema T
• High-level specification Σ
  • models the relationship between S and T
Data Exchange
• Given an instance of S
• How to materialize a target instance of T?
[Figure: the mapping M relates Source Schema S to Target Schema T; Source Data is exchanged into Target Data]
¹ R. Fagin et al., Theor. Comput. Sci. 336 (2005).
Slide 1 of 26 Arocena, Glavic, and Miller - Value Invention in Data Exchange: Introduction
Example
Source Schema S: WorksOn(Department, Project, City)
Target Schema T: Projects(PId, City, ManagerId)
Source Data (WorksOn):
  IT     Web       Toronto
  IT     Big Data  Chicago
  Sales  Mobile    New York
Target Data (Projects):
  NULL   Toronto   NULL
  NULL   Chicago   NULL
  NULL   New York  NULL
We usually create values to represent incomplete information!
Value Invention
Source Schema S: WorksOn(Department, Project, City)
Target Schema T: Projects(PId, City, ManagerId)
Source Data (WorksOn):
  IT     Web       Toronto
  IT     Big Data  Chicago
  Sales  Mobile    New York
Target Data (Projects):
  f(Web)       Toronto   g(IT)
  f(Big Data)  Chicago   g(IT)
  f(Mobile)    New York  g(Sales)
We usually create values to represent incomplete information!
∃f ∃g (WorksOn(d, p, c) → Projects(f(p), c, g(d)))
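The value invention on this slide can be sketched in a few lines of Python. This is only an illustration: the `Skolem` class and the `exchange` function are hypothetical names, and Skolem terms are modeled as uninterpreted (function name, arguments) pairs rather than as concrete identifiers.

```python
from typing import NamedTuple


class Skolem(NamedTuple):
    """An invented value: a function symbol applied to source values (assumed model)."""
    fn: str
    args: tuple

    def __str__(self) -> str:
        return f"{self.fn}({', '.join(map(str, self.args))})"


def exchange(works_on):
    """Apply WorksOn(d, p, c) -> Projects(f(p), c, g(d)) to a list of source tuples."""
    target = set()
    for d, p, c in works_on:
        # f depends only on the project, g only on the department
        target.add((Skolem("f", (p,)), c, Skolem("g", (d,))))
    return target


works_on = [
    ("IT", "Web", "Toronto"),
    ("IT", "Big Data", "Chicago"),
    ("Sales", "Mobile", "New York"),
]
projects = exchange(works_on)
for pid, city, manager in sorted(projects, key=lambda row: row[1]):
    print(pid, city, manager)
```

Because g takes only the department as its argument, both IT rows receive the same invented manager value, which plain NULLs cannot express.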
Our Goal
• Understand when schema mappings specified by SO tgds
  • Flexible and precise value invention
• . . . can be rewritten into nested GLAV mappings
  • Desirable computational and programmatic properties
Skolem Functions
• Introduced by Thoralf A. Skolem (1920s)
• Widely used in Mathematical Logic and Computer Science
Many important uses in Information Integration
• to model object identifier (OID) inventionᵃ
ᵃ R. Hull, M. Yoshikawa, In VLDB (1990).
• to express correlation semantics (e.g., grouping and data merging)ᵃᵇᶜᵈ
ᵃ L. Popa et al., In VLDB (2002).
ᵇ A. Fuxman et al., In VLDB (2006).
ᶜ L. Libkin, C. Sirangelo, J. Comput. Syst. Sci. 77 (2011).
ᵈ B. Alexe et al., VLDB J. 21 (2012).
• to provide a precise representation of missing and incomplete informationᵃᵇᶜ
ᵃ Y. Papakonstantinou et al., In VLDB (1996).
ᵇ L. Popa et al., In VLDB (2002).
ᶜ R. Fagin et al., TODS 30 (2005).
Schema Mapping Languages
Various logical mapping formalisms
• s-t tgds (also known as GLAV)ᵃ
• Nested s-t tgds (nested GLAV)ᵇ
• Second-Order (SO) tgdsᶜ
ᵃ R. Fagin et al., Theor. Comput. Sci. 336 (2005).
ᵇ A. Fuxman et al., In VLDB (2006).
ᶜ R. Fagin et al., TODS 30 (2005).
Expressiveness
• SO tgds permit arbitrary Skolems!ᵃ
• FO mapping languages have more desirable programmatic and computational propertiesᵇ
ᵃ R. Fagin et al., TODS 30 (2005).
ᵇ B. ten Cate, P. Kolaitis, In ICDT (2009).
Characterization of Mapping Languages²³⁴
Property             | GLAV           | nested GLAV        | SO tgds
Composition          | Not closed     | Not closed         | Closed
Value Invention      | No correlation | Linear correlation | Fully customized correlation
Target Homomorphisms | Closed         | Closed             | Not closed
Model Checking       | PTIME          | PTIME              | NP-Complete
² R. Fagin et al., Theor. Comput. Sci. 336 (2005).
³ R. Fagin et al., TODS 30 (2005).
⁴ B. ten Cate, P. Kolaitis, In ICDT (2009).
The Quest for FO Rewritability
Rewritability
• Many SO tgds are equivalent to FO mappings!
• We call this FO/GLAV/nested GLAV rewritable
• Some SO tgds are not FO rewritableᵃ
• . . . Even testing for FO rewritability is undecidableᵇ
ᵃ R. Fagin et al., TODS 30 (2005).
ᵇ I. Feinerer et al., In AMW (2011).
Nash, Bernstein and Melnik
• First sufficient condition for GLAV rewritabilityᶜ
• Tailored to consider SO tgds produced by mapping composition
ᶜ A. Nash et al., TODS 32 (2007).
Our Contributions
1 Sufficient condition for nested GLAV rewritability of SO tgds
2 Linearize:
  • PTIME algorithm for rewriting SO tgds
3 Equivalence preserving transformation of SO tgds using source semantics
4 LinearizeFDs:
  • PTIME algorithm for rewriting SO tgds using source FDs
5 Extensive experimental evaluation
  • STBenchmark 2.0ᵃ
  • Real-life mapping scenarios
ᵃ P. C. Arocena et al., “STBenchmark 2.0”, tech. rep. (Uni. of Toronto, 2013).
Intuition of Rewriting
Rewrite SO tgds into nested GLAV
• Replace second-order existentials with first-order existentials
  • ∃f(x) → ∃v_f
• Apply logical equivalence of Skolemization in reverse direction
  • May have to reorder universal quantifiers to create ∀x
Skolemization Equivalence
∀x ∃v_f δ(x, v_f) ≡ ∃f ∀x δ(x, v_f)[v_f ← f(x)]
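Over finite domains the Skolemization equivalence can be checked by brute force. The sketch below is not part of the talk; the helper names `forall_exists` and `exists_function` are made up for illustration, and the check enumerates every candidate function f, so it is only feasible for tiny domains.

```python
from itertools import product


def forall_exists(delta, xs, vs):
    """Evaluate  forall x exists v. delta(x, v)  over finite domains xs and vs."""
    return all(any(delta(x, v) for v in vs) for x in xs)


def exists_function(delta, xs, vs):
    """Evaluate  exists f forall x. delta(x, f(x))  by enumerating every f: xs -> vs."""
    for images in product(vs, repeat=len(xs)):
        f = dict(zip(xs, images))
        if all(delta(x, f[x]) for x in xs):
            return True
    return False


xs = vs = [0, 1, 2]
successor = lambda x, v: v == (x + 1) % 3   # every x has a witness
smaller = lambda x, v: v < x                # x = 0 has no witness

print(forall_exists(successor, xs, vs), exists_function(successor, xs, vs))
print(forall_exists(smaller, xs, vs), exists_function(smaller, xs, vs))
```

Both sides agree on each formula: a Skolem function exists exactly when every x has at least one witness value.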
UnSkolemization Revisited
Example: Key Invention
Source Schema
WorksOn (Department, Project, BudgetId)
Audit (BudgetId, Auditor)
City (Department, City)
Target Schema
Project (PId, BudgetId)
Dept (Dept, Year, Project, NumEmp)
Location (Department, DepId, City, State)
Budget (Project, Leader, Size)
∃f (∀d ∀p ∀b WorksOn(d, p, b) → Project(f(d, p), b))
We need to introduce ∃v_f nested within the scope of d and p
∀d ∀p ∃v_f ∀b WorksOn(d, p, b) → Project(v_f, b)
Sufficient Rewriting Condition
Approach
• When can Unskolemization be applied to all Skolems of an SO tgd?
• Adapt notions from SO quantifier elimination methodsᵃ
• Consistency:
  • OK: . . . f(a) . . . f(a) → ∀a ∃v_f
  • NOT OK: . . . f(a) . . . f(b) → ∀a ∃v_f ∀b ∃v_f
• Linearity:
  • OK: . . . f(a) . . . g(a, b) → ∀a ∃v_f ∀b ∃v_g
  • NOT OK: . . . f(a, b) . . . g(b, c) → ∀a ∀b ∃v_f ∀c ∃v_g
• Partitioning scheme for multi-clause SO tgds
ᵃ D. Gabbay et al., Second Order Quantifier Elimination: Foundations, Computational Aspects and Applications (College Publications, 2008).
Theorem: Linearity
Given an SO tgd θ without equalities between or with Skolem terms
• Consistent
• Linear
⇒ θ can be rewritten as nested GLAV
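The two conditions of the theorem can be sketched as simple checks over the Skolem terms of one partition. This is one reading of the OK / NOT OK examples on the slide, not the authors' implementation: consistency is taken to mean that every occurrence of a Skolem function has the same argument list, and linearity that the argument sets form a chain under set inclusion. The function names are hypothetical.

```python
def is_consistent(terms):
    """True if every occurrence of a Skolem function uses the same argument tuple."""
    seen = {}
    for fn, args in terms:
        if seen.setdefault(fn, args) != args:
            return False
    return True


def is_linear(terms):
    """True if the distinct argument sets form a chain under set inclusion."""
    arg_sets = sorted({frozenset(args) for _, args in terms}, key=len)
    # after sorting by size, a chain only needs adjacent sets to be nested
    return all(a <= b for a, b in zip(arg_sets, arg_sets[1:]))


# Skolem terms written as (function name, argument tuple)
print(is_consistent([("f", ("a",)), ("f", ("a",))]))          # slide: OK
print(is_consistent([("f", ("a",)), ("f", ("b",))]))          # slide: NOT OK
print(is_linear([("f", ("a",)), ("g", ("a", "b"))]))          # slide: OK
print(is_linear([("f", ("a", "b")), ("g", ("b", "c"))]))      # slide: NOT OK
```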
Linearize Algorithm
Properties of the Algorithm
• Rewrites an SO tgd into nested GLAV
• PTIME
• Size of resulting formula is linear in the size of the input
Linearize(θ)
1 Partition θ into independent sub-formulas (maximal partitioning Π)
2 For each partition
• Check consistency and linearity
3 If all partitions are linear and consistent then
• Rewrite θ into Ω
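The rewriting step for a single partition can be sketched as building a quantifier prefix. This is a simplified illustration under stated assumptions, not the Linearize algorithm itself: the hypothetical `linearize_prefix` takes the universally quantified variables of one clause and a dict from Skolem function names to argument tuples (so consistency is assumed already), and it returns None when the argument sets do not form a chain.

```python
def linearize_prefix(variables, skolems):
    """Return a nested-GLAV quantifier prefix as a string, or None if not linear.

    variables: universally quantified variables, in their original order.
    skolems:   dict mapping each Skolem function name to its argument tuple.
    """
    ordered = sorted(skolems.items(), key=lambda kv: len(set(kv[1])))
    prefix, bound = [], set()
    for fn, args in ordered:
        if not bound <= set(args):       # argument sets must form a chain
            return None
        for v in variables:              # quantify the new arguments first
            if v in args and v not in bound:
                prefix.append(f"∀{v}")
                bound.add(v)
        prefix.append(f"∃v_{fn}")        # then the variable replacing fn(args)
    prefix += [f"∀{v}" for v in variables if v not in bound]
    return " ".join(prefix)


print(linearize_prefix(["d", "p", "b"], {"f": ("d", "p")}))
print(linearize_prefix(["d", "p", "b"], {"f": ("d",), "g": ("d", "p")}))
print(linearize_prefix(["a", "b", "c"], {"f": ("a", "b"), "g": ("b", "c")}))
```

The first call reproduces the prefix of the key invention example, where the existential variable is placed after d and p but before b.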
A Note on Linearity
• Linearity is a syntactic but not a semantic condition
• ⇒ There is hope that an equivalent mapping exists that is linear
• ⇒ Approach: Find an equivalent mapping that is linear
• Modify Skolem arguments?
Non-Linear SO tgd θ → [Equivalence Preserving Transformation] → Linear SO tgd θ → [Linearize] → nested GLAV Ω
Using Source Functional Dependencies
So far
• Only considered an SO tgd θ
• Have not considered additional knowledge that may be available
Source constraints
• Functional dependencies (FDs) ΣS that hold over the source
• Primary keys (and other FDs if available)
• FDs imply dependencies between the arguments of Skolem terms
Source Schema
WorksOn (Department, Project, BudgetId)
Audit (BudgetId, Auditor)
City (Department, City)
Target Schema
Project (PId, BudgetId)
Dept (Dept, Year, Project, NumEmp)
Location (Department, DepId, City, State)
Budget (Project, Leader, Size)
WorksOn: Department, Project → BudgetId    Audit: BudgetId → Auditor
∃f ∃g WorksOn(d, p, b) ∧ Audit(b, a) → Budget(p, f(d, p), g(b, a))
Implied FD1: d, p → b, a    Implied FD2: b → a
• An FD x → y can be used to augment Skolem arguments: f(x, z) → f(x, z, y)
After augmentation: ∃f ∃g WorksOn(d, p, b) ∧ Audit(b, a) → Budget(p, f(d, p, b, a), g(b, a))
Equivalence Preserving Transformation
Approach
• Augment Skolem arguments using implied FDs (Re-Skolemization)
• Result: an SO tgd that is equivalent to θ as long as the FDs hold.
Non-Linear SO tgd θ and source FDs ΣS → [Re-Skolemize using implied FDs] → Linear SO tgd θ and source FDs ΣS → [Linearize] → nested GLAV Ω
Theorem: Re-Skolemization with FDs preserves equivalence
Given an implied source FD x → y valid over θ:
θ[f(x) ← f(x, y)] ∪ ΣS ≡ θ ∪ ΣS
Why Augmentation?
Does Re-Skolemization affect Linearity?
• Augmentation (θ[f(x) ← f(x, y)])
  • θ^aug: Result of applying augmentation until no longer possible
• Minimization (θ[f(x, y) ← f(x)])ᵃ
  • θ^min: Result of applying minimization until no longer possible
ᵃ B. Marnette et al., PVLDB 3 (2010).
Theorem: Only augmentation preserves Linearity
Linear(θ) ⇒ Linear(θ^aug)
Linear(θ) ⇏ Linear(θ^min)
LinearizeFDs Algorithm
Properties of the Algorithm
• Rewrites SO tgd into nested GLAV
• PTIME
• Size of resulting formula is linear in the size of the input
LinearizeFDs(θ, ΣS)
1 Compute implied FDs
2 Augment arguments of each Skolem term based on FDs
  • Using attribute closure
  • Result: θ^aug
3 Return Linearize(θ^aug)
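Step 2 can be sketched with a standard attribute-closure computation. The sketch assumes step 1 has already produced implied FDs over the variables of the clause, written as (left-hand side, right-hand side) tuples; `closure` and `augment` are hypothetical helper names, and the final call to Linearize is left out.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, keeping the order of discovery."""
    result = list(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= set(result):
                for attr in rhs:
                    if attr not in result:
                        result.append(attr)
                        changed = True
    return tuple(result)


def augment(skolems, fds):
    """Extend each Skolem's argument tuple to the closure of its arguments."""
    return {fn: closure(args, fds) for fn, args in skolems.items()}


# Implied FDs from the running example: d, p -> b and b -> a
implied_fds = [(("d", "p"), ("b",)), (("b",), ("a",))]
theta_skolems = {"f": ("d", "p"), "g": ("b", "a")}

theta_aug = augment(theta_skolems, implied_fds)
print(theta_aug)
```

On the running example this yields f(d, p, b, a) and g(b, a), whose argument sets are nested, so the augmented mapping satisfies the linearity condition.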
Mapping Generator and Experiments
STBenchmark
• Generator for data exchange scenariosᵃ
  • Schemas, Data and Mappings
• Construct complex mappings from simple primitives
  • e.g., Horizontal Partitioning (HP)
• Parameterized and randomized (e.g., join path length)
ᵃ B. Alexe et al., PVLDB 1 (2008).
Extensions
• Arbitrary Skolem terms (SO tgds)
• New primitives (e.g., Adding and Deleting Attributes, etc.)
• Combining primitives into more complex mappings
• e.g., simulating composition and complex correlations
• Primary Keys (PKs) and Functional dependencies (FDs)
Random Scenarios
• 12,500,000 randomly generated mapping scenarios
• Measure success rate
• Compare NBM (Nash, Bernstein and Melnik), Linearize, LinearizeFDs, LinearizeMin
• NBM only rewrites into GLAV!
Effect of Primary Keys
• Activate/Deactivate source PKs
• Vary amount of non-PK FDs
[Figure: bar chart of Success Rate (0% to 100%) for Linearize and LinearizeFDs, with and without source PKs, at source FDs = 0%, 25%, and 50%]
Conclusions
Rewriting SO tgds → nested GLAV
• Linearization
  • SO tgd is linear → can be rewritten
• Equivalence preserving Re-Skolemization
  • Using source FDs to augment Skolem arguments
Experimental and Theoretical Results
• Using FDs improves chance to rewrite
  • 78% increased success rate
• Primary keys are most effective
  • > 75% increased success rate
• Augmentation is better than minimization
  • about 16% increased success rate
Future Work
Integrate insights on Re-Skolemization into . . .
• Mapping operators such as
• Composition
• MapMerge
• Mapping generation
FO Rewritability of SO tgds
• Combine our sufficient condition with that of [NBM07]ᵃ
  • we know how to do it!
• Exploit Augmentation and Minimization together
  • to simplify and optimize SO mappings
• Use target FDs
ᵃ A. Nash et al., TODS 32 (2007).
Questions?
Appendix
6 References
7 Additional Notation
8 A Note on Complexity
9 Partitioning Scheme
10 Additional Experiments
11 STBenchmark 2.0
12 Additional Linearization Examples
13 Example Augmentation
R. Fagin, P. Kolaitis, R. J. Miller, L. Popa, Data Exchange: Semantics and Query Answering. Theor. Comput. Sci. 336 (2005).
R. Hull, M. Yoshikawa, ILOG: Declarative Creation and Manipulation of Object Identifiers. In VLDB (1990).
L. Popa, Y. Velegrakis, R. J. Miller, M. A. Hernández, R. Fagin, Translating Web Data. In VLDB (2002).
A. Fuxman et al., Nested Mappings: Schema Mapping Reloaded. In VLDB (2006).
L. Libkin, C. Sirangelo, Data Exchange and Schema Mappings in Open and Closed Worlds. J. Comput. Syst. Sci. 77 (2011).
B. Alexe, M. A. Hernández, L. Popa, W. C. Tan, MapMerge: Correlating Independent Schema Mappings. VLDB J. 21 (2012).
Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina, Object Fusion in Mediator Systems. In VLDB (1996).
R. Fagin, P. Kolaitis, L. Popa, W.-C. Tan, Composing Schema Mappings: Second-Order Dependencies to the Rescue. TODS 30 (2005).
B. ten Cate, P. Kolaitis, Structural Characterizations of Schema-Mapping Languages. In ICDT (2009).
I. Feinerer, R. Pichler, E. Sallinger, V. Savenkov, On the Undecidability of the Equivalence of Second-Order Tuple Generating Dependencies. In AMW (2011).
A. Nash, P. Bernstein, S. Melnik, Composition of Mappings Given by Embedded Dependencies. TODS 32 (2007).
P. C. Arocena, M. D’Angelo, B. Glavic, R. J. Miller, “STBenchmark 2.0”, tech. rep. (Uni. of Toronto, 2013).
D. Gabbay, R. Schmidt, A. Szalas, Second Order Quantifier Elimination: Foundations, Computational Aspects and Applications (College Publications, 2008).
B. Marnette, G. Mecca, P. Papotti, Scalable Data Exchange with Functional Dependencies. PVLDB 3 (2010).
B. Alexe, W. C. Tan, Y. Velegrakis, STBenchmark: Towards a Benchmark for Mapping Systems. PVLDB 1 (2008).
Notation
GLAV (s-t tgds): ∀z, x (φ(z, x) → ∃y ψ(x, y))
  ∀d ∀p ∀b Works(d, p, b) → ∃y1 ∃y2 Project(y1, b, y2)
nested GLAV: Q(x, y)((φ1(x) → ψ1(x, y)) ∧ . . . ∧ (φn(x) → ψn(x, y))),
where Q(x, y) is a sequence of quantifiers, that is, ∀ for x and ∃ for y
  ∀d ∃y1 ∀p ∃y2 ∀b Works(d, p, b) → Project(y1, b, y2)
SO tgds: ∃f ((∀x1 (φ1 → ψ1)) ∧ · · · ∧ (∀xn (φn → ψn)))
Note: we usually omit universal quantifiers
  ∃f ∃g (Works(d, p, b) → Project(f(p), b, g(d)))
Model Checking
Complexity
• NP-complete for SO tgds vs. P for nested GLAV
• Are we only solving the simple cases?
Approach
• Find an SO tgd for which model checking is hard
• But can be rewritten using (implied) source FDs
Model Checking: 3-colorability
Schema Mappings
θ = ∀X, Y: E(X, Y) → C(f(X), g(Y))
           V(X, Y) → S(f(X), g(Y))
Not linear!
Instance
• The source relations encode an undirected graph G
  • For each edge (x, y) we create two tuples E(x, y) and E(y, x)
  • For each vertex x we create a tuple V(x, x)
• The target relations represent a coloring of the vertices of G using three colors r, g, and b
  • C: (r, g), (r, b), (g, r), (g, b), (b, r), (b, g) - colors of adjacent nodes
  • S: (r, r), (g, g), (b, b) - colors of vertices
Theorem: Model Checking is 3-colorability
G is 3-colorable if θ holds over I
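The reduction can be tried out with a brute-force model checker for the non-linear version of θ. This is an illustrative sketch only: `theta_holds` is a hypothetical name, and the search over all candidate Skolem functions f and g is exponential in the number of vertices, which is consistent with the NP-completeness claim rather than a practical algorithm.

```python
from itertools import product

COLORS = "rgb"
C = {(x, y) for x in COLORS for y in COLORS if x != y}   # colors of adjacent nodes
S = {(x, x) for x in COLORS}                              # colors of a single vertex


def theta_holds(vertices, edges):
    """Search for Skolem functions f, g that satisfy both clauses of theta."""
    E = {(x, y) for x, y in edges} | {(y, x) for x, y in edges}
    V = {(x, x) for x in vertices}
    assignments = product(COLORS, repeat=len(vertices))
    for f_vals, g_vals in product(assignments, repeat=2):
        f = dict(zip(vertices, f_vals))
        g = dict(zip(vertices, g_vals))
        if (all((f[x], g[y]) in C for x, y in E)
                and all((f[x], g[y]) in S for x, y in V)):
            return True
    return False


triangle = ([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
k4 = ([1, 2, 3, 4], [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)])
print(theta_holds(*triangle))   # a triangle is 3-colorable
print(theta_holds(*k4))         # the complete graph on 4 vertices is not
```

The V clause forces f and g to agree on every vertex, and the E clause then requires adjacent vertices to receive different colors.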
Schema Mappings (Skolem arguments augmented using FD: X → Y)
θ = ∀X, Y: E(X, Y) → C(f(X, Y), g(Y))
           V(X, Y) → S(f(X, Y), g(Y))
Linear!
SO tgds are closed under conjunction
• Every SO tgd θ can be written as a set of clauses (φi → ψi)
• Splitting this set of clauses to form new SO tgds θ1, . . . , θn is equivalence preserving
• If θi and θj share no Skolems (they are uncorrelated)
  • then θi and θj can be rewritten independently
Maximal Partition
• Given an SO tgd θ
• Partition clauses into Π = π1, . . . , πn such that
  1 No πi and πj (i ≠ j) share any Skolems
  2 There is no partition Π′ with more elements than Π that fulfills condition 1
Theorem: Rewritability and Maximal Partitions
Rewritable(θ) ⇔ ∀i: Rewritable(πi)
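A maximal partition can be computed as the connected components of the "shares a Skolem" relation between clauses. The sketch below uses a small union-find structure; the representation (each clause reduced to the set of Skolem function names it uses) and the name `maximal_partition` are assumptions made for illustration.

```python
def maximal_partition(clauses):
    """Group clause indices so that clauses sharing a Skolem end up together.

    clauses: list of sets, each holding the Skolem function names of one clause.
    """
    parent = list(range(len(clauses)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    owner = {}                              # Skolem name -> first clause using it
    for i, skolems in enumerate(clauses):
        for fn in skolems:
            if fn in owner:
                parent[find(i)] = find(owner[fn])
            else:
                owner[fn] = i

    groups = {}
    for i in range(len(clauses)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


clauses = [{"f"}, {"f", "g"}, {"h"}, {"g"}, set()]
print(maximal_partition(clauses))
```

Clauses 0, 1 and 3 are linked through f and g, so they must be rewritten together, while the clause using h and the clause without Skolems each form their own partition.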
Real-Life Mappings
• Three real-life mapping scenarios from the literature
• Created SO tgds based on
• Semantics of the schemas
• Documented data transformations
• Compared all rewriting techniques
Towards STBenchmark 2.0
Noteworthy Features
• Support for arbitrary Skolem Functions (SO tgds) and various
Skolemization modes (e.g., Key, All and Random)
• Simulating some cases of composition using Skolem Noise
• Reuse of source schema elements using Source Reuse
• PKs and random multi-attribute FDs over the source
Usability Case?
• Thinking about comparing different notions of mapping inverse
• Any suggestions?
Nesting and Linearization
Example: Skolems with Overlapping Arguments
Source Schema
WorksOn (Department, Project, BudgetId)
Audit (BudgetId, Auditor)
City (Department, City)
Target Schema
Project (PId, BudgetId)
Dept (Dept, Year, Project, NumEmp)
Location (Department, DepId, City, State)
Budget (Project, Leader, Size)
∃f ∃g (WorksOn(d, p, b) → Dept(d, f(d), p, g(d, p)))
We need to introduce two ∃ quantifiers without violating the dependencies modeled by f and g
∀d ∃v_f ∀p ∃v_g ∀b WorksOn(d, p, b) → Dept(d, v_f, p, v_g)
Augmentation is better than Minimization
Source Schema
WorksOn (Department, Project, BudgetId)
Audit (BudgetId, Auditor)
City (Department, City)
Target Schema
Project (PId, BudgetId)
Dept (Dept, Year, Project, NumEmp)
Location (Department, DepId, City, State)
Budget (Project, Leader, Size)
WorksOn: Department, Project → BudgetId    Audit: BudgetId → Auditor
θ = ∃f ∃g WorksOn(d, p, b) ∧ Audit(b, a) → Budget(p, f(d, p), g(b, a))
θ^aug = ∃f ∃g . . . → Budget(p, f(d, p, b, a), g(b, a))
θ^min = ∃f ∃g . . . → Budget(p, f(d, p), g(b))
FD1: d, p → b, a    FD2: b → a
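The three variants above can be compared mechanically. In this sketch, `minimize` drops every Skolem argument that is implied by the remaining ones, which is one plausible reading of minimization and not necessarily the exact procedure of Marnette et al.; `forms_chain` uses the chain-under-inclusion reading of linearity. All function names are hypothetical, and θ^aug is copied from the slide rather than computed.

```python
def closure(attrs, fds):
    """Set of attributes implied by attrs under fds (pairs of tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimize(args, fds):
    """Drop each argument that the remaining arguments already determine."""
    kept = list(args)
    for y in args:
        rest = [a for a in kept if a != y]
        if y in closure(rest, fds):
            kept = rest
    return tuple(kept)


def forms_chain(skolems):
    """True if the Skolem argument sets are totally ordered by inclusion."""
    sets = sorted({frozenset(a) for a in skolems.values()}, key=len)
    return all(a <= b for a, b in zip(sets, sets[1:]))


fds = [(("d", "p"), ("b",)), (("b",), ("a",))]
theta = {"f": ("d", "p"), "g": ("b", "a")}
theta_aug = {"f": ("d", "p", "b", "a"), "g": ("b", "a")}   # as on the slide
theta_min = {fn: minimize(args, fds) for fn, args in theta.items()}

for name, variant in [("theta", theta), ("aug", theta_aug), ("min", theta_min)]:
    print(name, variant, forms_chain(variant))
```

Only the augmented variant has nested argument sets; minimization shrinks g to g(b) but leaves the argument sets of f and g incomparable.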
Slide 13 of 4 Arocena, Glavic, and Miller - Value Invention in Data Exchange: Example Augmentation

Mais conteúdo relacionado

Mais procurados

R and Visualization: A match made in Heaven
R and Visualization: A match made in HeavenR and Visualization: A match made in Heaven
R and Visualization: A match made in HeavenEdureka!
 
Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2izahn
 
Introduction to ggplot2
Introduction to ggplot2Introduction to ggplot2
Introduction to ggplot2maikroeder
 
Incremental View Maintenance for openCypher Queries
Incremental View Maintenance for openCypher QueriesIncremental View Maintenance for openCypher Queries
Incremental View Maintenance for openCypher QueriesGábor Szárnyas
 
Gremlin Queries with DataStax Enterprise Graph
Gremlin Queries with DataStax Enterprise GraphGremlin Queries with DataStax Enterprise Graph
Gremlin Queries with DataStax Enterprise GraphStephen Mallette
 
Algebraic Property Graphs
Algebraic Property GraphsAlgebraic Property Graphs
Algebraic Property GraphsAdrian Wilke
 
Why functional programming and category theory strongly matters
Why functional programming and category theory strongly mattersWhy functional programming and category theory strongly matters
Why functional programming and category theory strongly mattersPiotr Paradziński
 
An R primer for SQL folks
An R primer for SQL folksAn R primer for SQL folks
An R primer for SQL folksThomas Hütter
 
Executing SPARQL Queries over Mapped Document Stores with SparqlMap-M
Executing SPARQL Queries over Mapped Document Stores with SparqlMap-MExecuting SPARQL Queries over Mapped Document Stores with SparqlMap-M
Executing SPARQL Queries over Mapped Document Stores with SparqlMap-MLinked Enterprise Date Services
 
BigML Summer 2016 Release
BigML Summer 2016 ReleaseBigML Summer 2016 Release
BigML Summer 2016 ReleaseBigML, Inc
 
Data Structures, Graphs
Data Structures, GraphsData Structures, Graphs
Data Structures, GraphsJibrael Jos
 
Power BI Computing Languages
Power BI Computing LanguagesPower BI Computing Languages
Power BI Computing LanguagesCarlos J. Costa
 
Introduction into R for historians (part 4: data manipulation)
Introduction into R for historians (part 4: data manipulation)Introduction into R for historians (part 4: data manipulation)
Introduction into R for historians (part 4: data manipulation)Richard Zijdeman
 
Introduction into R for historians (part 3: examine and import data)
Introduction into R for historians (part 3: examine and import data)Introduction into R for historians (part 3: examine and import data)
Introduction into R for historians (part 3: examine and import data)Richard Zijdeman
 
Tech talk ggplot2
Tech talk   ggplot2Tech talk   ggplot2
Tech talk ggplot2jalle6
 

SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"

  • 1. Value Invention in Data Exchange
       Patricia Arocena¹, Boris Glavic², Renée J. Miller¹
       ¹ University of Toronto, DBGroup
       ² Illinois Institute of Technology, DBGroup
       SIGMOD 2013 - June 25, 2013 - New York, USA
  • 2. Outline
       1 Introduction
       2 Linearization
       3 Exploiting Source Constraints
       4 Experiments
       5 Conclusions
  • 3. The Data Exchange Problem¹
       Schema mappings M = (S, T, Σ)
       • Source schema S and target schema T
       • High-level specification Σ
           • models the relationship between S and T
       [Figure: the mapping M connects source schema S to target schema T, and source data to target data.]
       ¹ R. Fagin et al., Theor. Comput. Sci. 336 (2005).
       Slide 1 of 26. Arocena, Glavic, and Miller - Value Invention in Data Exchange: Introduction
  • 4. The Data Exchange Problem¹
       Schema mappings M = (S, T, Σ)
       • Source schema S and target schema T
       • High-level specification Σ
           • models the relationship between S and T
       Data exchange
       • Given an instance of S
       [Figure: the mapping M connects source schema S to target schema T, and source data to target data.]
       ¹ R. Fagin et al., Theor. Comput. Sci. 336 (2005).
  • 5. The Data Exchange Problem¹
       Schema mappings M = (S, T, Σ)
       • Source schema S and target schema T
       • High-level specification Σ
           • models the relationship between S and T
       Data exchange
       • Given an instance of S
       • How to materialize a target instance of T?
       [Figure: the mapping M connects source schema S to target schema T, and source data to target data.]
       ¹ R. Fagin et al., Theor. Comput. Sci. 336 (2005).
  • 6. Example
       Source schema S: WorksOn(Department, Project, City)
       Target schema T: Projects(PId, City, ManagerId)
       Source data: (IT, Web, Toronto), (IT, Big Data, Chicago), (Sales, Mobile, New York)
       Target data: (NULL, Toronto, NULL), (NULL, Chicago, NULL), (NULL, New York, NULL)
       We usually create values to represent incomplete information!
  • 8. Value Invention. Source schema: WorksOn(Department, Project, City). Target schema: Projects(PId, City, ManagerId). Source data: (IT, Web, Toronto), (IT, Big Data, Chicago), (Sales, Mobile, New York). Target data: (f(Web), Toronto, g(IT)), (f(Big Data), Chicago, g(IT)), (f(Mobile), New York, g(Sales)). We usually create values to represent incomplete information! Mapping: ∃f ∃g (WorksOn(d, p, c) → Project(f(p), c, g(d))).
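To make the value-invention example concrete, here is a minimal Python sketch that applies the mapping WorksOn(d, p, c) → Project(f(p), c, g(d)) to the three source tuples. The `Skolem` class and the `exchange` function are hypothetical names introduced only for this illustration; Skolem terms are modelled as uninterpreted (function name, arguments) pairs, which is one common way to represent invented values and not necessarily how any particular data exchange system does it.

```python
from typing import NamedTuple


class Skolem(NamedTuple):
    """An uninterpreted Skolem term such as f(Web) (illustrative helper)."""
    fn: str
    args: tuple

    def __str__(self) -> str:
        return f"{self.fn}({', '.join(map(str, self.args))})"


def exchange(works_on):
    """Apply WorksOn(d, p, c) -> Project(f(p), c, g(d)) to every source tuple."""
    target = set()
    for d, p, c in works_on:
        # f depends only on the project, g only on the department.
        target.add((Skolem("f", (p,)), c, Skolem("g", (d,))))
    return target


works_on = [
    ("IT", "Web", "Toronto"),
    ("IT", "Big Data", "Chicago"),
    ("Sales", "Mobile", "New York"),
]
projects = exchange(works_on)
```

Because g takes only the department as its argument, both IT projects receive the same invented manager value g(IT), which is exactly the correlation that plain NULLs cannot express.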
  • 9. Our Goal. Understand when schema mappings specified by SO tgds (flexible and precise value invention) can be rewritten into nested GLAV mappings (desirable computational and programmatic properties).
  • 12. Skolem Functions. Introduced by Thoralf A. Skolem (1920s); widely used in mathematical logic and computer science. Many important uses in information integration: to model object identifier (OID) invention (R. Hull, M. Yoshikawa, In VLDB (1990)); to express correlation semantics, e.g., grouping and data merging (L. Popa et al., In VLDB (2002); A. Fuxman et al., In VLDB (2006); L. Libkin, C. Sirangelo, J. Comput. Syst. Sci. 77 (2011); B. Alexe et al., VLDB J. 21 (2012)); to provide a precise representation of missing and incomplete information (Y. Papakonstantinou et al., In VLDB (1996); L. Popa et al., In VLDB (2002); R. Fagin et al., TODS 30 (2005)).
  • 14. Schema Mapping Languages. Various logical mapping formalisms: s-t tgds, also known as GLAV (R. Fagin et al., Theor. Comput. Sci. 336 (2005)); nested s-t tgds, i.e., nested GLAV (A. Fuxman et al., In VLDB (2006)); second-order (SO) tgds (R. Fagin et al., TODS 30 (2005)). Expressiveness: SO tgds permit arbitrary Skolems! (R. Fagin et al., TODS 30 (2005)); FO mapping languages have more desirable programmatic and computational properties (B. ten Cate, P. Kolaitis, In ICDT (2009)).
  • 15. Characterization of Mapping Languages (R. Fagin et al., Theor. Comput. Sci. 336 (2005); R. Fagin et al., TODS 30 (2005); B. ten Cate, P. Kolaitis, In ICDT (2009)).
      Property | GLAV | nested GLAV | SO tgds
      Composition | Not closed | Not closed | Closed
      Value Invention | No correlation | Linear correlation | Fully customized correlation
      Target Homomorphisms | Closed | Closed | Not closed
      Model Checking | PTIME | PTIME | NP-complete
  • 16. The Quest for FO Rewritability. Rewritability: many SO tgds are equivalent to FO mappings; we call such SO tgds FO/GLAV/nested GLAV rewritable. Some SO tgds are not FO rewritable (R. Fagin et al., TODS 30 (2005)), and even testing for FO rewritability is undecidable (I. Feinerer et al., In AMW (2011)). Nash, Bernstein and Melnik (A. Nash et al., TODS 32 (2007)): first sufficient condition for GLAV rewritability, tailored to SO tgds produced by mapping composition.
  • 17. Our Contributions. (1) A sufficient condition for nested GLAV rewritability of SO tgds. (2) Linearize: a PTIME algorithm for rewriting SO tgds. (3) An equivalence-preserving transformation of SO tgds using source semantics. (4) LinearizeFDs: a PTIME algorithm for rewriting SO tgds using source FDs. (5) An extensive experimental evaluation on STBenchmark 2.0 (P. C. Arocena et al., “STBenchmark 2.0”, tech. rep. (Uni. of Toronto, 2013)) and on real-life mapping scenarios.
  • 18. Outline: 1 Introduction; 2 Linearization (next); 3 Exploiting Source Constraints; 4 Experiments; 5 Conclusions
  • 19. Intuition of Rewriting. Rewrite SO tgds into nested GLAV: replace second-order existentials with first-order existentials, ∃f (x) → ∃v_f, by applying the logical equivalence of Skolemization in the reverse direction; we may have to reorder universal quantifiers to create ∀x. Skolemization equivalence: ∀x ∃v_f δ(x, v_f) ≡ ∃f ∀x δ(x, v_f)[v_f ← f(x)].
  • 22. UnSkolemization Revisited. Example: key invention. Source schema: WorksOn(Department, Project, BudgetId), Audit(BudgetId, Auditor), City(Department, City). Target schema: Project(PId, BudgetId), Dept(Dept, Year, Project, NumEmp), Location(Department, DepId, City, State), Budget(Project, Leader, Size). SO tgd: ∃f (∀d ∀p ∀b WorksOn(d, p, b) → Project(f(d, p), b)). We need to introduce ∃v_f nested within the scope of d and p: ∀d ∀p ∃v_f ∀b WorksOn(d, p, b) → Project(v_f, b).
  • 24. Sufficient Rewriting Condition. Approach: when can unskolemization be applied to all Skolems of an SO tgd? Adapt notions from SO quantifier elimination methods (D. Gabbay et al., Second Order Quantifier Elimination: Foundations, Computational Aspects and Applications, (College Publications, 2008)). Consistency: OK: … f(a) … f(a) → ∀a ∃v_f; NOT OK: … f(a) … f(b) → ∀a ∃v_f ∀b ∃v_f. Linearity: OK: … f(a) … g(a, b) → ∀a ∃v_f ∀b ∃v_g; NOT OK: … f(a, b) … g(b, c) → ∀a ∀b ∃v_f ∀c ∃v_g. Multi-clause SO tgds are handled by a partitioning scheme. Theorem (Linearity): an SO tgd θ without equalities between or with Skolem terms that is consistent and linear can be rewritten as nested GLAV.
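The two conditions can be sketched as simple checks over the Skolem terms that occur in a formula. This is a simplified reading based only on the OK / NOT OK examples on the slide, not the paper's formal definition: consistency is taken to mean that every occurrence of a Skolem function uses the same argument list, and linearity is taken to mean that the argument sets of all Skolem terms form a chain under set inclusion. The function names are assumptions made for this illustration.

```python
def is_consistent(occurrences):
    """occurrences: list of (function_name, args_tuple), one per Skolem term.

    Assumed reading: each function must always appear with the same arguments.
    """
    seen = {}
    for fn, args in occurrences:
        # setdefault stores the first argument list seen for fn and returns it.
        if seen.setdefault(fn, args) != args:
            return False
    return True


def is_linear(occurrences):
    """Assumed reading: the distinct argument sets are totally ordered by inclusion."""
    arg_sets = sorted({frozenset(args) for _, args in occurrences}, key=len)
    # After sorting by size, a chain requires each set to contain its predecessor.
    return all(a <= b for a, b in zip(arg_sets, arg_sets[1:]))


ok_consistent = is_consistent([("f", ("a",)), ("f", ("a",))])
bad_consistent = is_consistent([("f", ("a",)), ("f", ("b",))])
ok_linear = is_linear([("f", ("a",)), ("g", ("a", "b"))])
bad_linear = is_linear([("f", ("a", "b")), ("g", ("b", "c"))])
```

The four calls at the end mirror the four examples on the slide, in the same order.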
  • 25. Linearize Algorithm. Properties: rewrites an SO tgd into nested GLAV; runs in PTIME; the size of the resulting formula is linear in the size of the input. Linearize(θ): (1) partition θ into independent sub-formulas (maximal partitioning Π); (2) for each partition, check consistency and linearity; (3) if all partitions are linear and consistent, rewrite θ into Ω.
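The rewrite in step (3) can be illustrated for a single partition by constructing the nested quantifier prefix. The sketch below assumes the Skolems are already known to be consistent, so each function maps to one argument tuple, and it raises an error when the argument sets do not form a chain. `quantifier_prefix` is a hypothetical helper written for this illustration and covers only the prefix, not the full Linearize algorithm of the paper.

```python
def quantifier_prefix(skolems, universals):
    """Build a nested-GLAV quantifier prefix for one partition.

    skolems: dict mapping a Skolem function name to its tuple of argument
             variables (assumed consistent).
    universals: all universally quantified variables of the clause.
    """
    prefix, bound = [], set()
    # Process Skolems from the smallest argument set to the largest.
    for fn, args in sorted(skolems.items(), key=lambda kv: (len(set(kv[1])), kv[0])):
        for v in args:
            if v not in bound:
                prefix.append(f"∀{v}")
                bound.add(v)
        # v_fn may depend on exactly the arguments of fn, nothing more.
        if set(args) != bound:
            raise ValueError(f"argument sets are not linear at {fn}")
        prefix.append(f"∃v_{fn}")
    # Remaining universal variables go innermost.
    for v in universals:
        if v not in bound:
            prefix.append(f"∀{v}")
            bound.add(v)
    return " ".join(prefix)


key_invention = quantifier_prefix({"f": ("d", "p")}, ["d", "p", "b"])
overlapping = quantifier_prefix({"f": ("d",), "g": ("d", "p")}, ["d", "p", "b"])
```

For the key-invention mapping with f(d, p) this yields the prefix ∀d ∀p ∃v_f ∀b, and for Skolems f(d) and g(d, p) it yields ∀d ∃v_f ∀p ∃v_g ∀b, matching the rewritten formulas shown in the slides.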
  • 26. Outline: 1 Introduction; 2 Linearization; 3 Exploiting Source Constraints (next); 4 Experiments; 5 Conclusions
  • 27. A Note on Linearity. Linearity is a syntactic, not a semantic, condition. ⇒ There is hope that an equivalent mapping exists that is linear. ⇒ Approach: find an equivalent mapping that is linear. Modify Skolem arguments? [Figure: non-linear SO tgd θ → (equivalence-preserving transformation) → linear SO tgd → (Linearize) → nested GLAV Ω.]
  • 28. Using Source Functional Dependencies. So far we have only considered an SO tgd θ and have not considered additional knowledge that may be available. Source schema: WorksOn(Department, Project, BudgetId), Audit(BudgetId, Auditor), City(Department, City). Target schema: Project(PId, BudgetId), Dept(Dept, Year, Project, NumEmp), Location(Department, DepId, City, State), Budget(Project, Leader, Size). SO tgd: ∃f ∃g WorksOn(d, p, b) ∧ Audit(b, a) → Budget(p, f(d, p), g(b, a)).
  • 30. Using Source Functional Dependencies. Source constraints: functional dependencies (FDs) Σ_S that hold over the source, i.e., primary keys (and other FDs if available). FDs imply dependencies between the arguments of Skolem terms. An FD x → y can be used to augment Skolem arguments: f(x, z) → f(x, z, y). Source schema: WorksOn(Department, Project, BudgetId), Audit(BudgetId, Auditor), City(Department, City). Target schema: Project(PId, BudgetId), Dept(Dept, Year, Project, NumEmp), Location(Department, DepId, City, State), Budget(Project, Leader, Size). Source FDs: WorksOn: Department, Project → BudgetId; Audit: BudgetId → Auditor. Implied FDs over the variables of the clause: FD1: d, p → b, a; FD2: b → a. Before augmentation: ∃f ∃g WorksOn(d, p, b) ∧ Audit(b, a) → Budget(p, f(d, p), g(b, a)). After augmentation: ∃f ∃g WorksOn(d, p, b) ∧ Audit(b, a) → Budget(p, f(d, p, b, a), g(b, a)).
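The augmentation step can be sketched with the textbook attribute-closure fixpoint, applied to the variables of the clause. The sketch assumes the implied FDs over variables are already given (here d, p → b and b → a); `closure` and `augment` are illustrative names, and the order in which new arguments are appended is an arbitrary choice of this sketch.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs.

    fds: list of (lhs_set, rhs_set) pairs over clause variables.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def augment(skolems, fds):
    """Append to each Skolem's arguments every variable they determine."""
    return {
        fn: tuple(args) + tuple(sorted(closure(args, fds) - set(args)))
        for fn, args in skolems.items()
    }


# Implied FDs from the slide, written over the variables of the clause.
fds = [({"d", "p"}, {"b"}), ({"b"}, {"a"})]
theta = {"f": ("d", "p"), "g": ("b", "a")}
theta_aug = augment(theta, fds)
```

With these inputs f gains the arguments b and a while g is unchanged, which corresponds to f(d, p, b, a) and g(b, a) on the slide up to argument order.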
  • 31. Equivalence Preserving Transformation. Approach: augment Skolem arguments using implied FDs (Re-Skolemization). The result is equivalent to θ as long as the FDs hold. [Figure: non-linear SO tgd θ and source FDs Σ_S → (Re-Skolemize using implied FDs) → linear SO tgd and source FDs Σ_S → (Linearize) → nested GLAV Ω.] Theorem (Re-Skolemization with FDs preserves equivalence): given an implied source FD x → y valid over θ, θ[f(x) ← f(x, y)] ∪ Σ_S ≡ θ ∪ Σ_S.
  • 32. Why Augmentation? Does Re-Skolemization affect linearity? Augmentation (θ[f(x) ← f(x, y)]): θ_aug is the result of applying augmentation until no longer possible. Minimization (θ[f(x, y) ← f(x)]) (B. Marnette et al., PVLDB 3 (2010)): θ_min is the result of applying minimization until no longer possible. Theorem (only augmentation preserves linearity): Linear(θ) ⇒ Linear(θ_aug), but Linear(θ) ⇏ Linear(θ_min).
  • 33. LinearizeFDs Algorithm. Properties: rewrites an SO tgd into nested GLAV; runs in PTIME; the size of the resulting formula is linear in the size of the input. LinearizeFDs(θ, Σ_S): (1) compute implied FDs; (2) augment the arguments of each Skolem term based on the FDs, using attribute closure, which yields θ_aug; (3) return Linearize(θ_aug).
  • 34. Outline: 1 Introduction; 2 Linearization; 3 Exploiting Source Constraints; 4 Experiments (next); 5 Conclusions
  • 35. Mapping Generator and Experiments. STBenchmark (B. Alexe et al., PVLDB 1 (2008)): a generator for data exchange scenarios (schemas, data and mappings); constructs complex mappings from simple primitives, e.g., Horizontal Partitioning (HP); parameterized and randomized (e.g., join path length). Extensions: arbitrary Skolem terms (SO tgds); new primitives (e.g., adding and deleting attributes); combining primitives into more complex mappings, e.g., simulating composition and complex correlations; primary keys (PKs) and functional dependencies (FDs).
  • 36. Random Scenarios. 12,500,000 randomly generated mapping scenarios. Measure the success rate. Compare NBM, Linearize, LinearizeFDs and LinearizeMin. Note that NBM only rewrites into GLAV!
  • 37. Effect of Primary Keys. Activate/deactivate source PKs; vary the amount of non-PK FDs. [Figure: success rate (0% to 100%) of Linearize and LinearizeFDs, without and with PKs, for source FDs = 0%, 25% and 50%.]
  • 38. Outline: 1 Introduction; 2 Linearization; 3 Exploiting Source Constraints; 4 Experiments; 5 Conclusions (next)
  • 39. Conclusions. Rewriting SO tgds → nested GLAV. Linearization: if an SO tgd is linear, it can be rewritten. Equivalence-preserving Re-Skolemization: use source FDs to augment Skolem arguments. Experimental and theoretical results: using FDs improves the chance to rewrite (78% increased success rate); primary keys are most effective (> 75% increased success rate); augmentation is better than minimization (about 16% increased success rate).
  • 40. Future Work. Integrate insights on Re-Skolemization into mapping operators (such as composition and MapMerge) and into mapping generation. FO rewritability of SO tgds: combine our sufficient condition with that of [NBM07] (A. Nash et al., TODS 32 (2007)), which we know how to do; exploit augmentation and minimization together to simplify and optimize SO mappings; use target FDs.
  • 41. Questions?
  • 42. Appendix: 6 References (next); 7 Additional Notation; 8 A Note on Complexity; 9 Partitioning Scheme; 10 Additional Experiments; 11 STBenchmark 2.0; 12 Additional Linearization Examples; 13 Example Augmentation
  • 43. References (1 of 3).
      R. Fagin, P. Kolaitis, R. J. Miller, L. Popa, Data Exchange: Semantics and Query Answering. Theor. Comput. Sci. 336 (2005).
      R. Hull, M. Yoshikawa, ILOG: Declarative Creation and Manipulation of Object Identifiers. In VLDB (1990).
      L. Popa, Y. Velegrakis, R. J. Miller, M. A. Hernández, R. Fagin, Translating Web Data. In VLDB (2002).
      A. Fuxman et al., Nested Mappings: Schema Mapping Reloaded. In VLDB (2006).
      L. Libkin, C. Sirangelo, Data Exchange and Schema Mappings in Open and Closed Worlds. J. Comput. Syst. Sci. 77 (2011).
      B. Alexe, M. A. Hernández, L. Popa, W. C. Tan, MapMerge: Correlating Independent Schema Mappings. VLDB J. 21 (2012).
  • 44. References (2 of 3).
      Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina, Object Fusion in Mediator Systems. In VLDB (1996).
      R. Fagin, P. Kolaitis, L. Popa, W.-C. Tan, Composing Schema Mappings: Second-Order Dependencies to the Rescu[...]. TODS 30 (2005).
      B. ten Cate, P. Kolaitis, Structural Characterizations of Schema-Mapping Languages. In ICDT (2009).
      I. Feinerer, R. Pichler, E. Sallinger, V. Savenkov, On the Undecidability of the Equivalence of Second-Order Tuple Genera[...]. In AMW (2011).
      A. Nash, P. Bernstein, S. Melnik, Composition of Mappings Given by Embedded Dependencies. TODS 32 (2007).
      P. C. Arocena, M. D’Angelo, B. Glavic, R. J. Miller, “STBenchmark 2.0”, tech. rep. (Uni. of Toronto, 2013).
  • 45. References (3 of 3).
      D. Gabbay, R. Schmidt, A. Szalas, Second Order Quantifier Elimination: Foundations, Computational Aspects and Applications (College Publications, 2008).
      B. Marnette, G. Mecca, P. Papotti, Scalable Data Exchange with Functional Dependencies. PVLDB 3 (2010).
      B. Alexe, W. C. Tan, Y. Velegrakis, STBenchmark: Towards a Benchmark for Mapping Systems. PVLDB 1 (2008).
  • 46. Appendix: 6 References; 7 Additional Notation (next); 8 A Note on Complexity; 9 Partitioning Scheme; 10 Additional Experiments; 11 STBenchmark 2.0; 12 Additional Linearization Examples; 13 Example Augmentation
  • 47. Notation. GLAV (s-t tgds): ∀z, x (φ(z, x) → ∃y ψ(x, y)), e.g., ∀d ∀p ∀b Works(d, p, b) → ∃y1 ∃y2 Project(y1, b, y2). Nested GLAV: Q(x, y)((φ1(x) → ψ1(x, y)) ∧ … ∧ (φn(x) → ψn(x, y))), where Q(x, y) is a sequence of quantifiers, that is, ∀ for x and ∃ for y, e.g., ∀d ∃y1 ∀p ∃y2 ∀b Works(d, p, b) → Project(y1, b, y2). SO tgds: ∃f ((∀x1 (φ1 → ψ1)) ∧ ⋯ ∧ (∀xn (φn → ψn))), e.g., ∃f ∃g (Works(d, p, b) → Project(f(p), b, g(d))); note that we usually omit universal quantifiers.
  • 48. Appendix: 6 References; 7 Additional Notation; 8 A Note on Complexity (next); 9 Partitioning Scheme; 10 Additional Experiments; 11 STBenchmark 2.0; 12 Additional Linearization Examples; 13 Example Augmentation
  • 49. Model Checking. Complexity: NP-complete for SO tgds vs. P for nested GLAV. Are we only solving the simple cases? Approach: find an SO tgd for which model checking is hard but which can be rewritten using (implied) source FDs.
  • 50. Model Checking: 3-colorability. Schema mapping (not linear!): θ = ∀X, Y: (E(X, Y) → C(f(X), g(Y))) ∧ (V(X, Y) → S(f(X), g(Y))). Instance I: the source relations encode an undirected graph G; for each edge (x, y) we create two tuples E(x, y) and E(y, x); for each vertex x we create a tuple V(x, x). The target relations represent a coloring of the vertices of G using three colors r, g, and b: C = {(r, g), (r, b), (g, r), (g, b), (b, r), (b, g)} holds the colors of adjacent nodes, and S = {(r, r), (g, g), (b, b)} holds the colors of vertices. Theorem (model checking is 3-colorability): G is 3-colorable if θ holds over I.
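To see why model checking for this mapping amounts to 3-colorability, the following brute-force sketch enumerates every interpretation of the Skolem functions f and g over the vertices and tests both clauses against the fixed target relations C and S. It is exponential by design and only meant for tiny graphs; `theta_holds` is a hypothetical name used for this illustration.

```python
from itertools import product

COLORS = "rgb"
# Target relation C: allowed color pairs for adjacent nodes.
C = {(x, y) for x in COLORS for y in COLORS if x != y}
# Target relation S: forces f and g to agree on each vertex.
S = {(x, x) for x in COLORS}


def theta_holds(vertices, edges):
    """Check whether some f, g satisfy both clauses of theta on the instance."""
    E = {(x, y) for x, y in edges} | {(y, x) for x, y in edges}
    V = {(x, x) for x in vertices}
    vs = list(vertices)
    assignments = product(COLORS, repeat=len(vs))
    # Enumerate every pair of interpretations (f, g): vertices -> colors.
    for f_vals, g_vals in product(assignments, repeat=2):
        f, g = dict(zip(vs, f_vals)), dict(zip(vs, g_vals))
        clause_e = all((f[x], g[y]) in C for x, y in E)
        clause_v = all((f[x], g[y]) in S for x, y in V)
        if clause_e and clause_v:
            return True
    return False


triangle = theta_holds("abc", [("a", "b"), ("b", "c"), ("a", "c")])
k4 = theta_holds("abcd", [("a", "b"), ("a", "c"), ("a", "d"),
                          ("b", "c"), ("b", "d"), ("c", "d")])
```

The V clause forces f(x) = g(x) for every vertex, so the E clause then requires adjacent vertices to receive different colors; the triangle satisfies θ while the complete graph on four vertices does not.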
  • 51. Model Checking: 3-colorability. Schema mapping (linear! using the FD X → Y): θ = ∀X, Y: (E(X, Y) → C(f(X, Y), g(Y))) ∧ (V(X, Y) → S(f(X, Y), g(Y))). The instance I and the target relations C and S are the same as on the previous slide. Theorem (model checking is 3-colorability): G is 3-colorable if θ holds over I.
  • 52. Appendix: 6 References; 7 Additional Notation; 8 A Note on Complexity; 9 Partitioning Scheme (next); 10 Additional Experiments; 11 STBenchmark 2.0; 12 Additional Linearization Examples; 13 Example Augmentation
  • 53. SO tgds are closed under conjunction. Every SO tgd θ can be written as a set of clauses (φi → ψi). Splitting this set of clauses to form new SO tgds θ1, …, θn is equivalence preserving. If θi and θj share no Skolems (they are uncorrelated), then θi and θj can be rewritten independently.
  • 54. Maximal Partition. Given an SO tgd θ, partition its clauses into Π = π1, …, πn such that (1) no πi and πj (i ≠ j) share any Skolems, and (2) there is no partition Π′ with more elements than Π that fulfills condition (1). Theorem (rewritability and maximal partitions): Rewritable(θ) ⇔ ∀i: Rewritable(πi).
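One way to compute such a partition is to treat clauses as nodes, connect two clauses whenever they share a Skolem function, and take connected components. The sketch below does this with a small union-find structure; representing each clause only by the set of Skolem names it mentions is an assumption of this illustration, and `maximal_partition` is a hypothetical name.

```python
def maximal_partition(clauses):
    """Group clause indexes so that different groups share no Skolem function.

    clauses: list of sets of Skolem function names, one set per clause.
    Returns a sorted list of groups, each a list of clause indexes.
    """
    parent = list(range(len(clauses)))

    def find(i):
        # Find the representative of i, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}  # Skolem name -> first clause index that used it
    for i, skolems in enumerate(clauses):
        for s in skolems:
            if s in owner:
                parent[find(i)] = find(owner[s])  # merge the two groups
            else:
                owner[s] = i

    groups = {}
    for i in range(len(clauses)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


partition = maximal_partition([{"f"}, {"f", "g"}, {"h"}, set()])
```

Connected components give the finest grouping in which no two groups share a Skolem, so no valid partition can have more elements. In the example, clauses 0 and 1 share f and end up together, while the clause using only h and the clause without Skolems each form their own group.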
  • 55. Appendix: 6 References; 7 Additional Notation; 8 A Note on Complexity; 9 Partitioning Scheme; 10 Additional Experiments (next); 11 STBenchmark 2.0; 12 Additional Linearization Examples; 13 Example Augmentation
  • 56. Real-Life Mappings. Three real-life mapping scenarios from the literature. We created SO tgds based on the semantics of the schemas and the documented data transformations, and compared all rewriting techniques.
  • 57. Appendix: 6 References; 7 Additional Notation; 8 A Note on Complexity; 9 Partitioning Scheme; 10 Additional Experiments; 11 STBenchmark 2.0 (next); 12 Additional Linearization Examples; 13 Example Augmentation
  • 58. Towards STBenchmark 2.0. Noteworthy features: support for arbitrary Skolem functions (SO tgds) and various Skolemization modes (e.g., Key, All and Random); simulating some cases of composition using Skolem Noise; reuse of source schema elements using Source Reuse; PKs and random multi-attribute FDs over the source. Usability case? We are thinking about comparing different notions of mapping inverse. Any suggestions?
  • 59. Appendix: 6 References; 7 Additional Notation; 8 A Note on Complexity; 9 Partitioning Scheme; 10 Additional Experiments; 11 STBenchmark 2.0; 12 Additional Linearization Examples (next); 13 Example Augmentation
  • 63. Nesting and Linearization. Example: Skolems with overlapping arguments. Source schema: WorksOn(Department, Project, BudgetId), Audit(BudgetId, Auditor), City(Department, City). Target schema: Project(PId, BudgetId), Dept(Dept, Year, Project, NumEmp), Location(Department, DepId, City, State), Budget(Project, Leader, Size). SO tgd: ∃f ∃g (WorksOn(d, p, b) → Dept(d, f(d), p, g(d, p))). We need to introduce two ∃ quantifiers without violating the dependencies modeled by f and g: ∀d ∃v_f ∀p ∃v_g ∀b WorksOn(d, p, b) → Dept(d, v_f, p, v_g).
  • 64. Appendix: 6 References; 7 Additional Notation; 8 A Note on Complexity; 9 Partitioning Scheme; 10 Additional Experiments; 11 STBenchmark 2.0; 12 Additional Linearization Examples; 13 Example Augmentation (next)
  • 65. Augmentation is better than Minimization. Source schema: WorksOn(Department, Project, BudgetId), Audit(BudgetId, Auditor), City(Department, City). Target schema: Project(PId, BudgetId), Dept(Dept, Year, Project, NumEmp), Location(Department, DepId, City, State), Budget(Project, Leader, Size). Source FDs: WorksOn: Department, Project → BudgetId; Audit: BudgetId → Auditor. Implied FDs: FD1: d, p → b, a; FD2: b → a. θ = ∃f ∃g WorksOn(d, p, b) ∧ Audit(b, a) → Budget(p, f(d, p), g(b, a)). θ_aug = ∃f ∃g … → Budget(p, f(d, p, b, a), g(b, a)). θ_min = ∃f ∃g … → Budget(p, f(d, p), g(b)).
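The contrast between the three variants can be checked mechanically. The sketch below implements a simple minimization (drop an argument whenever the remaining arguments determine it) and tests a simplified notion of linearity, namely that the Skolem argument sets form a chain under inclusion. Both are assumptions made for this illustration and not the exact definitions from the paper or from Marnette et al.; θ_aug is copied from the slide instead of being recomputed.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimize(skolems, fds):
    """Drop every argument that the remaining arguments already determine."""
    out = {}
    for fn, args in skolems.items():
        kept = list(args)
        for a in args:
            rest = [x for x in kept if x != a]
            if a in closure(rest, fds):
                kept = rest
        out[fn] = tuple(kept)
    return out


def is_linear(skolems):
    """Assumed reading of linearity: argument sets are ordered by inclusion."""
    sets = sorted({frozenset(a) for a in skolems.values()}, key=len)
    return all(s <= t for s, t in zip(sets, sets[1:]))


fds = [({"d", "p"}, {"b"}), ({"b"}, {"a"})]
theta = {"f": ("d", "p"), "g": ("b", "a")}
theta_aug = {"f": ("d", "p", "b", "a"), "g": ("b", "a")}  # as on the slide
theta_min = minimize(theta, fds)
```

Under these assumptions only θ_aug passes the linearity check: in θ and θ_min the argument sets of f and g are incomparable, whereas in θ_aug the arguments of g are contained in those of f.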