ANSI SQL Transparent Multipath Hierarchical Structured Data Processing for Relational, XML & Legacy Data
1. ANSI SQL Transparent Dynamic Multipath Hierarchical Structured Data Processing For Relational, XML, & Legacy Data
Michael M David and Lee Fesperman
Advanced Data Access Technologies, Inc.
www.adatinc.com
The example results shown in this presentation can be reproduced
on our online interactive prototype at www.adatinc.com/demo.html
2. Previous Database History
Hierarchical processing popular 40 years ago
Used user navigational processing
Multipath hierarchical query processing used
Used nonprocedural navigationless processing
Made possible by automatic semantic processing
Relational processing replaces hierarchical
Also uses nonprocedural navigationless processing
But more flexibility with data independence from join
Hierarchical processing technology forgotten
No Internet to store hierarchical processing knowledge
Hierarchical structures popular again with XML
User procedural navigation is back again
Where is navigationless access for structured data?
3. SQL/XML Industry Problems
Only proprietary vendor solutions
All vendor solutions incompatible with each other
Integration solutions are XML centric and procedural
No satisfactory XML integration solution
Hierarchical data value loss
XML not fully integrated into SQL
XML structured data processed as semistructured
No hierarchical processing standard
Invalid hierarchical processing result is possible
Invalid hierarchical structure result is possible
Markup and structured data processed the same
Structured data needs to be processed differently!
4. SQL/XML Structured Data Processing Problems
Limited to linear single path processing
Query data selection limitations
Relational join single path mindset
Requires user database navigation
Loss of dynamic hierarchical processing
Limits complex hierarchical processing
Prevents seamless & transparent processing
Requires specific vendor user training
Requires XML and vendor trained users
5. A Working ANSI SQL XML Solution
Limits processing to only structured data
Allows unambiguous nonprocedural querying
Enables navigationless schema-free operation
Only performs hierarchical operations
Only uses hierarchical Left Outer Join operation
Naturally supports full hierarchical structures
Allows hierarchical structure-aware operation
Supports inherent hierarchical processing
This solves relational/XML data integration
This also solves SQL to XML system mapping
Transparent multipath hierarchical processing
6. Structured Data Automatic Processing Benefits
SQL transparent XML integration
ANSI SQL-92 syntax and semantics, not XML centric
Nonprocedural and navigationless, Interactive
No knowledge of structure necessary, schema-free
Multipath nonlinear hierarchical processing
Hierarchically accurate results automatically
Dynamic hierarchical processing optimization
Dynamic output uses hierarchical result structure
SQL hierarchical views fully functional
Global views with no overhead and maximum reuse
Global hierarchical queries now possible
Hierarchical data filtering & XML keyword search
7. Current XML Hierarchical Data
XML Was Created as a Markup Language
Unstructured Mapped to Semistructured
Structured Vs. Semistructured Data
Same as Fixed Vs. Fuzzy Meaning
Why Semistructured Requires Navigation
[Figure: two query structures. Structured data (unambiguous): levels A; B, C; D, E. Semistructured data (ambiguous): levels A; B, C; B, D.]
The multiple B nodes in this structure make it ambiguous for querying without user navigation.
8. Why Hierarchical Data Structures are Powerful
Automatic Data and Path Reuse
Extending Path Increases Data Value
Adding Paths Increases Data Value Further
Hierarchical XML Becoming Ubiquitous
Data Structures are Unambiguous
Data/Path:
1) A/B
2) A/C
3) A/C/D
4) A/C/E
[Figure: structure with A over B and C; C over D and E]
Creating more value than is collected, automatic nonlinear data value increase.
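The four data paths listed on this slide can be derived mechanically from the structure. The following is a minimal sketch, assuming the tree is held as a simple parent-to-children mapping; the names `TREE` and `root_paths` are illustrative only and do not come from the presentation.

```python
# Hypothetical in-memory form of the slide's structure:
# A is the root, B and C are its children, D and E are children of C.
TREE = {"A": ["B", "C"], "B": [], "C": ["D", "E"], "D": [], "E": []}


def root_paths(tree, root):
    """Return every path from the root to a lower node, in depth-first order."""
    paths = []

    def walk(node, prefix):
        for child in tree[node]:
            path = prefix + [child]
            paths.append("/".join(path))
            walk(child, path)

    walk(root, [root])
    return paths


PATHS = root_paths(TREE, "A")
print(PATHS)
```

Extending a path (adding a node under D, for example) or adding a new path under A adds entries to this list without changing the existing ones, which is the reuse the slide refers to.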
9. Why Hierarchical Structured Data Processing is Powerful
Dynamic Multipath Query Combinations
Multipath Data Value Grows Continually
Every Node is Related to Every Other Node
No. of Possible Queries Becomes Unlimited
Automatic Processing = Unlimited Complexity
Increasingly Complex Multipath Queries
Naturally utilizes the inherent hierarchical semantics in multipath queries.
[Figure: Mview structure with levels A; B, C; D, E]
1) SELECT B,C FROM Mview
2) SELECT B,D FROM Mview WHERE A=2
3) SELECT A,B FROM Mview WHERE E=5
4) SELECT B,C FROM Mview WHERE D=1 AND E=5
5) SELECT D,E FROM Mview WHERE B=3 OR C=4
10. Hierarchical Data Structure Types
XML is self-defining, contiguous and nested
No foreign key relation needed to make structure
IBM’s IMS database is discontinuous
Internally linked
Structured VSAM is contiguous
With hierarchical level & data occurrence counts
Flat tabular objects hierarchically modeled
Relational tables, flat files, spreadsheets
All support the same hierarchical operations and principles
11. Hierarchical Structured Data Makeup
Multiple Node Types
Nodes Support Multiple Data Occurrences
Naturally Built With 1 to M Relationships
Hierarchical Data Preservation Operation
Naturally Support Variable Length Paths
[Figure: node types Co, Emp, Dpnd and Proj. Co is 1 to M with Emp; Emp is 1 to M with Dpnd and 1 to M with Proj. Data occurrences: Co1 holds Emp1 (with Dpnd1) and Emp2 (with Proj2); Co2 holds Emp3 (with Proj3 and Proj4).]
Not to be confused with function driven single node external hierarchical data structure processing.
Suitable for IMS, Structured VSAM, XML, COBOL FD and hierarchically modeled Relational and flat data.
12. SQL Hierarchical Structured Data Can Be Stored in Relational Rowsets
Multiple Node Types in Relational Rowset
Node Multiple Data Occurrences in Rowset
Multiple Pathways in Rowset
Variable Length Pathways in Rowset
[Figure: data occurrences Co1 over Emp1 (with Dpnd1) and Emp2 (with Proj2); Co2 over Emp3 (with Proj3 and Proj4)]
Relational Rowset
Co   Emp   Dpnd   Proj
Co1  Emp1  Dpnd1
Co1  Emp2         Proj2
Co2  Emp3         Proj3
Co2  Emp3         Proj4
Hierarchical structure preserved in rowset, in and back out again.
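The rowset on this slide can be reproduced on an ordinary relational engine with chained LEFT JOINs. The sketch below uses Python's built-in sqlite3 module purely as a stand-in engine; the foreign key columns `EmpCoID`, `DpndEmpID` and `ProjEmpID` are assumed names that do not appear on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and key names are assumptions made for this illustration.
conn.executescript("""
CREATE TABLE Co   (CoID TEXT);
CREATE TABLE Emp  (EmpID TEXT, EmpCoID TEXT);
CREATE TABLE Dpnd (DpndID TEXT, DpndEmpID TEXT);
CREATE TABLE Proj (ProjID TEXT, ProjEmpID TEXT);
INSERT INTO Co   VALUES ('Co1'), ('Co2');
INSERT INTO Emp  VALUES ('Emp1', 'Co1'), ('Emp2', 'Co1'), ('Emp3', 'Co2');
INSERT INTO Dpnd VALUES ('Dpnd1', 'Emp1');
INSERT INTO Proj VALUES ('Proj2', 'Emp2'), ('Proj3', 'Emp3'), ('Proj4', 'Emp3');
""")

# Each LEFT JOIN adds one child node type under its parent; missing children show as NULL.
ROWS = conn.execute("""
SELECT CoID, EmpID, DpndID, ProjID
FROM Co
LEFT JOIN Emp  ON CoID  = EmpCoID
LEFT JOIN Dpnd ON EmpID = DpndEmpID
LEFT JOIN Proj ON EmpID = ProjEmpID
ORDER BY CoID, EmpID, DpndID, ProjID
""").fetchall()

for row in ROWS:
    print(row)
```

The NULLs (Python `None`) mark paths that stop early, which is how the variable length pathways survive in a fixed-width rowset.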
13. Mapping Relational SQL to Hierarchical XML
Query:
SELECT A.a, D.d, E.e
FROM GlobalView
WHERE B.b='B1'
[Figure: input view structure over nodes A, B, C, D and E; conceptual hierarchical result is A over D and E, produced as XML output]
Working Set
A   B   C   D   E
A1  B1  C1  D1  E1
A1  B1  C1  D2  E1
A1  B1  C1  D1  E2
A1  B1  C1  D2  E2
A1  B2  C1  D1  E1
Result Set
A   D   E
A1  D1  E1
A1  D2  E1
A1  D1  E2
A1  D2  E2
Shown on this slide:
1. Conceptual Level
2. Multipath Processing
3. Hierarchical Filtering
4. Dynamic Data Select
5. Access Optimization
6. Global View Support
7. Schema-free Query
8. Node Promotion
9. Auto Output Format
14. Relational Logical Hierarchical View
CREATE VIEW EmpView AS
SELECT * FROM Emp
LEFT JOIN Dpnd
ON EmpID=DpndEmpID
AND DpndCode='D'
LEFT JOIN Eaddr ON EmpCustID=EaddrCustID;
[Figure: Emp over Dpnd and Eaddr]
Hierarchical SQL data modeling and processing semantics defined.
SELECT EmpID, DpndID,
EaddrID
FROM EmpView
<root>
<emp empid="Emp01">
<dpnd dpndid="Dpnd01"/>
<eaddr eaddrid="Addr01"/>
</emp>
<emp empid="Emp02">
<eaddr eaddrid="Addr03"/>
</emp>
</root>
Logical hierarchical SQL view and semantics processed directly by relational engine making it operate fully hierarchically.
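The EmpView definition on this slide is plain outer join SQL, so it can be tried on any relational engine. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in; the sample rows are taken from the XML shown in this deck, and `EmpStatus`/`EaddrState` values for Emp02 are assumptions where the slides do not state them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emp   (EmpID TEXT, EmpCustID TEXT, EmpStatus TEXT);
CREATE TABLE Dpnd  (DpndID TEXT, DpndEmpID TEXT, DpndCode TEXT);
CREATE TABLE Eaddr (EaddrID TEXT, EaddrCustID TEXT, EaddrState TEXT);
INSERT INTO Emp   VALUES ('Emp01', 'Cust01', 'F'), ('Emp02', 'Cust03', 'P');
INSERT INTO Dpnd  VALUES ('Dpnd01', 'Emp01', 'D');
INSERT INTO Eaddr VALUES ('Addr01', 'Cust01', 'CA'), ('Addr03', 'Cust03', 'NV');

-- The view text follows the slide: Emp is the root, Dpnd and Eaddr are its children.
CREATE VIEW EmpView AS
SELECT * FROM Emp
LEFT JOIN Dpnd  ON EmpID = DpndEmpID AND DpndCode = 'D'
LEFT JOIN Eaddr ON EmpCustID = EaddrCustID;
""")

EMPVIEW_ROWS = conn.execute(
    "SELECT EmpID, DpndID, EaddrID FROM EmpView ORDER BY EmpID"
).fetchall()
print(EMPVIEW_ROWS)
```

A stock engine returns the flat rowset only; turning it into the nested XML shown on the slide is the job of the hierarchical processing the presentation describes.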
15. XML Physical Hierarchical View
CREATE XML CustView
Cust(
CustID Char(8),
CustStoreID Char(8)),
Invoice(
InvID Char(8),
InvCustID Char(8),
InvStatus Char(8)) Parent Cust,
Addr(
AddrID Char(8),
AddrCustID Char(8),
AddrState Char(8)) Parent Cust
Converted to SQL CustView:
CREATE VIEW CustView AS
SELECT * FROM Cust
LEFT JOIN Invoice
ON CustID=InvCustID
LEFT JOIN Addr
ON CustID=AddrCustID;
SELECT Cust, Invoice, Addr
FROM CustView
<root>
<cust custid="Cust01">
<invoice invid="Inv01"/>
<invoice invid="Inv02"/>
<addr addrid="Addr01"/>
[Figure: Cust over Invoice and Addr]
This SQL created outer join view syntax is used as a hierarchical map of the physical IMS CustView for seamless operation.
16. Joining Heterogeneous Views
SELECT EmpID, DpndID, EaddrID,
CustID, InvID, AddrID
FROM EmpView
LEFT JOIN CustView
ON EmpCustID=CustID
<root>
<emp empid="Emp01">
<dpnd dpndid="Dpnd01"/>
<eaddr eaddrid="Addr01"/>
<cust custid="Cust01">
<invoice invid="Inv01"/>
<invoice invid="Inv02"/>
<addr addrid="Addr01"/>
</cust>
</emp>
<emp empid="Emp02">
<eaddr eaddrid="Addr03"/>
<cust custid="Cust03">
<addr addrid="Addr03"/>
</cust>
</emp>
</root>
[Figure: Hierarchical Heterogeneous Structure Join: Emp over Dpnd and Eaddr, joined to Cust over Invoice and Addr. Logical View of Result: Emp over Dpnd, Eaddr and Cust; Cust over Invoice and Addr.]
Emp logical and Cust physical views are both hierarchically modeled in the same way in SQL making for a seamless, unified and logical hierarchical join.
17. Logical Hierarchical Structures Offer Flexible Relational/XML Integration
Logical: Tables
Physical: XML
Common Modeling
Log. and Phy. Structures have same hierarchy op principles
Data Model:
Data Type Mapping
Common Features:
Outer Join Abstraction
Data Modeling Consistency
Seamless Capabilities:
Data Integration
Solves Rel and Hier Problems
Separates Structure from Data
Hierarchical Structure Flexibility
Offers Flexibility to Fixed Structures
Logical natural hierarchical structures solve hierarchical fixed structure problems.
[Figure: Heterogeneous Logical Structure: Emp over Dpnd and Eaddr, joined to Cust over Invoice and Addr]
18. Heterogeneous Data Structure Mashup Uses Linking Below Root
SELECT EmpID, DpndID, InvID,
AddrID, EaddrID, CustID
FROM EmpView LEFT JOIN
CustView ON EaddrID=AddrID
WHERE CustID <> 'Cust02'
<root>
<emp empid="Emp01">
<dpnd dpndid="Dpnd01"/>
<eaddr eaddrid="Addr01">
<cust custid="Cust01">
<invoice invid="Inv01"/>
<invoice invid="Inv02"/>
<addr addrid="Addr01"/>
</cust>
</eaddr>
</emp> ...</root>
[Figure: Link Below Root: Emp over Dpnd and Eaddr, linked to Cust over Invoice and Addr. Result Structure: Emp over Dpnd and Eaddr; Eaddr over Cust; Cust over Invoice and Addr. Dashed arrow is linkage. Solid arrow is new structure.]
Mashups allow linking anywhere into the lower level structure. We determined this was valid and what the new semantic structure is.
19. Association Table Use
SELECT EmpID, DpndID, EaddrID, CustID, InvoiceID,
AddrID, IntersectData
FROM EmpView
LEFT JOIN AssociationTable ON DpndID=DpndX
LEFT JOIN CustView ON CustID=CustX
[Figure: EmpView (Emp over Dpnd and Eaddr) is linked through the Association Table (DpndX, CustX, IntersectData) to CustView (Cust over Invoice and Addr). Result Structure: Emp, Dpnd, Eaddr, IntersectData, Cust, Invoice and Addr in one hierarchy.]
Association Table Capabilities added:
• External Relationships
• M to M Relationships
• Intersecting Data
• Can be Transparent
• Retains Hier Structure
20. Hierarchical Data Filtering, Auto Output & Structure-free Processing
Query:
SELECT A.a, D.d, E.e
FROM GlobalView
WHERE B.b='B1'
[Figure: input view structure over nodes A, B, C, D and E; conceptual hierarchical result is A over D and E, with automatic structured XML output formatting]
Hierarchical WHERE clause global data filtering
Cousin nodes like node C above are filtered from B node
This makes this a more internally complex multipath query
Structure-free processing
Navigationless XML access, no need to know structure
Automatic structure-aware output formatting
Result structure known from outer join syntax modeling
21. Optimization, Global Views and Node Promotion
Query:
SELECT A.a, D.d, E.e
FROM GlobalView
WHERE B.b='B1'
[Figure: input view structure over nodes A, B, C, D and E; conceptual hierarchical result is A over D and E]
This is a conceptual query where the input structure is not known and the output adapts to the dynamic result.
Hierarchical optimization can remove from access:
Unreferenced data nodes not on path to referenced data
Dynamically controlled by SQL's variable SELECT list
Optimization makes all views global views
Because they have no overhead for unreferenced fields
Node promotion closes around unselected nodes
This happens in relational processing too
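The access optimization rule on this slide (skip unreferenced nodes that are not on a path to referenced data) can be illustrated with a small sketch. The parent map below is one assumed reading of the slide's figure, with an extra node X added under B only to give the rule something to prune; it is not taken from the presentation, and `nodes_to_access` is an illustrative name.

```python
# Assumed structure for illustration: A over B and C, X under B, D and E under C.
PARENT = {"A": None, "B": "A", "C": "A", "X": "B", "D": "C", "E": "C"}


def nodes_to_access(parent, referenced):
    """Keep each referenced node plus every ancestor on its path to the root."""
    needed = set()
    for node in referenced:
        while node is not None and node not in needed:
            needed.add(node)
            node = parent[node]
    return needed


# The query references A, D and E in its SELECT list and B in its WHERE clause.
ACCESSED = nodes_to_access(PARENT, {"A", "B", "D", "E"})
SKIPPED = set(PARENT) - ACCESSED
print(sorted(ACCESSED), sorted(SKIPPED))
```

Node C is never referenced but stays in the access set because D and E are reached through it; X is neither referenced nor on such a path, so it can be skipped.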
22. Global Query Uses Global Views
Global Query:
SELECT *
FROM EmpCust
WHERE InvStatus='O'
[Figure: EmpCust structure: Emp over Dpnd, Eaddr and Cust; Cust over Invoice and Addr. The WHERE clause acts as the Global Data Filter.]
Global views allow entire structures to be defined and queried with no overhead, more user friendly. Global query allows hierarchical filtering of entire view.
<root>
<emp empid="Emp01" empstoreid="Store01" empcustid="Cust01" empstatus="F">
<dpnd dpndid="Dpnd01" dpndempid="Emp01" dpndcode="D"/>
<eaddr eaddrid="Addr01" eaddrcustid="Cust01" eaddrstate="CA"/>
<cust custid="Cust01" custstoreid="Store01">
<invoice invid="Inv02" invcustid="Cust01" invstatus="O"/>
<addr addrid="Addr01" addrcustid="Cust01" addrstate="CA"/>
</cust>
</emp>
</root>
23. Node Promotion and Nested View
CREATE VIEW EmpCust AS
SELECT *
FROM EmpView LEFT JOIN CustView
ON EmpCustID=CustID
Hierarchical views can be nested. Views can contain views.
SELECT EmpID, DpndID,
InvID, AddrID
FROM EmpCust
<root>
<emp empid="Emp01">
<dpnd dpndid="Dpnd01"/>
<invoice invid="Inv01"/>
<invoice invid="Inv02"/>
<addr addrid="Addr01"/>
</emp>
<emp empid="Emp02">
<addr addrid="Addr03"/>
</emp>
</root>
[Figure: EmpCust is Emp over Dpnd, Eaddr and Cust, with Cust over Invoice and Addr; Eaddr and Cust are not SELECTed. After node promotion the result is Emp over Dpnd, Invoice and Addr.]
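Node promotion as shown on this slide can be sketched as a small tree rewrite: when a node is not selected, its selected descendants are lifted to the nearest selected ancestor. The code below is an illustrative sketch only, not the presentation's implementation; `CHILDREN` and `promote` are hypothetical names.

```python
# EmpCust structure from the slide: Emp over Dpnd, Eaddr and Cust; Cust over Invoice and Addr.
CHILDREN = {
    "Emp": ["Dpnd", "Eaddr", "Cust"],
    "Dpnd": [],
    "Eaddr": [],
    "Cust": ["Invoice", "Addr"],
    "Invoice": [],
    "Addr": [],
}


def promote(children, node, selected):
    """Return the output children of `node` as (name, subtree) pairs,
    lifting selected descendants past any unselected node."""
    out = []
    for child in children[node]:
        if child in selected:
            out.append((child, promote(children, child, selected)))
        else:
            out.extend(promote(children, child, selected))
    return out


# SELECT EmpID, DpndID, InvID, AddrID references these four nodes.
RESULT = promote(CHILDREN, "Emp", {"Emp", "Dpnd", "Invoice", "Addr"})
print(RESULT)
```

With Cust unselected, Invoice and Addr end up directly under Emp, matching the result structure in the figure.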
24. Node Promotion Override and XML Format Change
SELECT EmpID, DpndID,
InvID, AddrID
FROM EmpCust
FOR XML ELEMENT KEEP NODE
<emp>
<empid>Emp01</empid>
<dpnd>
<dpndid>Dpnd01</dpndid>
</dpnd>
<cust>
<invoice>
<invid>Inv01</invid>
</invoice>
<invoice>
<invid>Inv02</invid>
</invoice>
<addr>
<addrid>Addr01</addrid>
</addr>
</cust>
</emp>
<emp>
<empid>Emp02</empid>
[Figure: EmpCust is Emp over Dpnd, Eaddr and Cust, with Cust over Invoice and Addr. With No Node Promotion the result is Emp over Dpnd and Cust, with Cust over Invoice and Addr.]
Without node promotion, unselected nodes are output empty. XML output format was changed to Element style.
25. Data Structure Mashup With Node Promotion = Data Virtualization
SELECT EmpID, DpndID,
InvID, AddrID
FROM EmpView
LEFT JOIN CustView
ON EaddrID=AddrID
<root>
<emp empid="Emp01">
<dpnd dpndid="Dpnd01"/>
<invoice invid="Inv01"/>
<invoice invid="Inv02"/>
<addr addrid="Addr01"/>
</emp>
<emp empid="Emp02">
<addr addrid="Addr03"/>
</emp>
</root>
[Figure: Link Below Root: Emp over Dpnd and Eaddr, with Eaddr linked to Cust over Invoice and Addr. Result Output with Node Promotion: Emp over Dpnd, Invoice and Addr. Dashed boxes are unselected, not output.]
Mashups with node promotion give an aggregated result of the data. This has the same effect as data virtualization.
26. Multipath Query Processing and its Required LCA Processing
A multipath query references multiple pathways and uses a WHERE data filtering operation
For example: Selecting data based on data in another path
This requires special processing using the LCA
Lowest Common Ancestor (LCA) Node
The LCA node is the lowest common ancestor node, located between two pathway node points in the structure
Used to keep the hierarchical query result meaningful
Two types of SQL multipath LCA usage
SELECT with WHERE clause referencing two different paths
Compound WHERE clause referencing two different paths
This LCA processing solves the XML Keyword Search problem
These two uses of LCA can be combined causing nesting
Each SELECT data item can have its own LCA
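The LCA definition above can be made concrete with a short sketch over the EmpCust structure used throughout this deck. This is a generic lowest common ancestor lookup on a parent map, offered as an illustration and not as the presentation's internal algorithm; `PARENT`, `ancestors` and `lca` are hypothetical names.

```python
# EmpCust structure: Emp over Dpnd, Eaddr and Cust; Cust over Invoice and Addr.
PARENT = {
    "Emp": None,
    "Dpnd": "Emp",
    "Eaddr": "Emp",
    "Cust": "Emp",
    "Invoice": "Cust",
    "Addr": "Cust",
}


def ancestors(parent, node):
    """Return the chain from `node` up to the root, node first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def lca(parent, a, b):
    """Return the lowest node that lies on both root paths."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            return node
    return None


print(lca(PARENT, "Dpnd", "Addr"))     # SELECT DpndID ... WHERE AddrID=...
print(lca(PARENT, "Invoice", "Addr"))  # SELECT InvID ... WHERE AddrID=...
```

Selecting Dpnd while filtering on Addr meets at Emp, whereas selecting Invoice with the same filter meets at Cust, which is why each SELECT data item can have its own LCA.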
27. Multipath LCA Query Logic for Single SELECT Data
SELECT DpndID
FROM EmpCust
WHERE AddrID='Addr01'
Lowest Common Ancestor (LCA) processing for the SQL SELECT clause will automatically process multiple LCA nodes when multiple specified output data is located across different pathways.
[Figure: EmpCust structure: Emp over Dpnd, Eaddr and Cust; Cust over Invoice and Addr. Emp is marked as the SELECT LCA.]
<root>
<dpnd dpndid="Dpnd01"/>
</root>
SELECT data types Dpnd and Invoice generate different LCAs.
28. Multipath LCA Query Logic for Multiple SELECT Data
SELECT DpndID, InvID
FROM EmpCust
WHERE AddrID='Addr01'
Lowest Common Ancestor (LCA) processing for the SQL SELECT clause will automatically process multiple LCA nodes when multiple specified output data is located across different pathways.
[Figure: EmpCust structure: Emp over Dpnd, Eaddr and Cust; Cust over Invoice and Addr. The SELECT LCAs are marked.]
<root>
<dpnd dpndid="Dpnd01"/>
<invoice invid="Inv01"/>
<invoice invid="Inv02"/>
</root>
SELECT data types Dpnd and Invoice generate different LCAs.
29. Compound WHERE Clauses also Require Their Own LCA
SELECT DpndID
FROM EmpCust
WHERE InvID='Inv01'
AND AddrID='Addr01'
Lowest Common Ancestor (LCA) node processing for multipath queries is necessary. We found it working in SQL naturally on both the SELECT and WHERE operations. Academic projects are currently researching how to add it to XQuery. LCA is the lowest node between node points on separate paths.
[Figure: EmpCust structure: Emp over Dpnd, Eaddr and Cust; Cust over Invoice and Addr. The SELECT LCA and the WHERE LCA are marked.]
<root>
<dpnd dpndid="Dpnd01"/>
<dpnd dpndid="Dpnd03"/>
</root>
This LCA example is actually a combined LCA and nested LCA example
30. Data Driven Variable Structure Generation
SELECT EmpID, DpndID, InvID,
AddrID, EaddrID,
CustID, EmpStatus
FROM EmpView
LEFT JOIN CustView
ON EmpCustID=CustID
AND EmpStatus='F'
<root>
<emp empid="Emp01" empstatus="F">
<dpnd dpndid="Dpnd01"/>
<cust custid="Cust01">
<invoice invid="Inv01"/>
<invoice invid="Inv02"/>
<addr addrid="Addr01"/>
</cust>
</emp>
<emp empid="Emp02" empstatus="">
</emp>
</root>
[Figure: Join Structure: Emp over Dpnd, Eaddr and Cust; Cust over Invoice and Addr. Variable Structure: Emp over Dpnd and an optional Cust; Cust over Invoice and Addr.]
Current EmpStatus data value controls if this CustView data occurrence expands or not.
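The effect of putting a data test in the ON clause can be seen on a plain relational engine. The sketch below uses Python's built-in sqlite3 module with reduced, assumed tables (only the columns needed here); the status value 'P' for Emp02 is an assumption borrowed from the multiple choice slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emp  (EmpID TEXT, EmpCustID TEXT, EmpStatus TEXT);
CREATE TABLE Cust (CustID TEXT);
INSERT INTO Emp  VALUES ('Emp01', 'Cust01', 'F'), ('Emp02', 'Cust03', 'P');
INSERT INTO Cust VALUES ('Cust01'), ('Cust03');
""")

# The extra ON condition is evaluated per Emp occurrence:
# the Cust side is attached only when EmpStatus is 'F'.
ROWS = conn.execute("""
SELECT EmpID, EmpStatus, CustID
FROM Emp
LEFT JOIN Cust
  ON EmpCustID = CustID
 AND EmpStatus = 'F'
ORDER BY EmpID
""").fetchall()
print(ROWS)
```

Emp02 has a matching Cust03 row, yet its Cust side stays NULL because the status test fails. Because the test sits in ON and not in WHERE, Emp02 itself is still returned.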
31. Variable Structure Generation Using Multiple Choice
SELECT EmpID, DpndID, InvID,
AddrID, EaddrID,
CustID, EmpStatus
FROM EmpView
LEFT JOIN CustViewF
ON EmpCustID=CustID
AND EmpStatus='F'
LEFT JOIN CustViewP
ON EmpCustID=CustID
AND EmpStatus='P'
Variable Structure
<root>
<emp empid="Emp01" empstatus="F">
<dpnd dpndid="Dpnd01"/>
Insert Full Time View Here
</emp>
<emp empid="Emp02" empstatus="P">
Insert Part Time View Here
</emp>
</root>
[Figure: Emp over Dpnd, CustViewF and CustViewP]
Current EmpStatus data value controls which view expands.
Control value is up or down path.
32. Data Structure Transformation
Two Basic Types of Data Structure Transformation
Restructuring: Uses data relationships in the data
Reshaping: Uses only the structure semantics in the data
Restructuring
Used to bring out new data relationships
Also removes data and nodes
New structure is secondary, driven by the new relationships desired
Reshaping
Used to reshape data structure to a desired new structure
No data relationships dependency
Any-to-any data structure transformation is possible
Reshaping is driven by the structure's natural semantics
SQL transformation operation assures accuracy
Restructuring and Reshaping operations can be combined
33. Restructure Transformation
SELECT X.EmpID, X.DpndID, Y.InvID, X.AddrID
FROM EmpCust Y
LEFT JOIN EmpCust X ON Y.InvCustID=X.EmpCustID
<root>
<invoice invid="Inv01">
<emp empid="Emp01">
<dpnd dpndid="Dpnd01"/>
<addr addrid="Addr01"/>
</emp>
</invoice> ....</root>
[Figure: EmpCust: X.Emp over X.Dpnd, Eaddr and X.Cust; Cust over Y.Invoice and Addr. Result: Invoice over Emp; Emp over Dpnd and Addr.]
Restructuring SQL transformation:
1) Takes structure apart
2) Disregards unwanted portions
3) Recombines it differently
4) Uses different data relationships
5) This introduces new semantics
34. Semantic Any-to-Any Structure Reshaping Transformation
SELECT X.DpndID, Y.EmpID, Y.EaddrID
FROM EmpView X
LEFT JOIN EmpView Y
ON X.DpndID=Y.DpndID
<root>
<dpnd dpndid="Dpnd01">
<emp empid="Emp01">
<eaddr eaddrid="Addr01"/>
</emp>
</dpnd>
</root>
[Figure: EmpView copies X and Y are each Emp over Dpnd and Eaddr. Result Structure: Dpnd over Emp; Emp over Eaddr.]
Reshaping uses only the semantics of the data structure to transform it into the desired structure. No new data relationships are needed so this method can always be used.
Linking below the root is utilized.
35. Polymorphic Any-to-Any Reshaping
SELECT X.DpndID, Y.EaddrID, Z.EmpID
FROM EmpView X
LEFT JOIN EmpView Y ON X.DpndID=Y.DpndID
LEFT JOIN EmpView Z ON Y.EaddrID=Z.EaddrID
<root>
<dpnd dpndid="Dpnd01">
<eaddr eaddrid="Addr01">
<emp empid="Emp01"/>
</eaddr>
</dpnd>
</root>
[Figure: EmpView copies X, Y and Z are each Emp over Dpnd and Eaddr. Target Structure: Dpnd over Eaddr; Eaddr over Emp.]
This reshaping example moves only one node at a time, which makes its operation polymorphic, allowing any shaped input structure that uses the same data names.
Reshaping compares the structure to itself for positioning since data relationships are not relied on.
36. Reshaping to Multipath Structure
SELECT X.DpndID, Y.EaddrID, Z.EmpID
FROM EmpView X
LEFT JOIN EmpView Y ON X.DpndID=Y.DpndID
LEFT JOIN EmpView Z ON X.DpndID=Z.DpndID
<root>
<dpnd dpndid="Dpnd01">
<eaddr eaddrid="Addr01"/>
<emp empid="Emp01"/>
</dpnd>
</root>
[Figure: EmpView copies X, Y and Z are each Emp over Dpnd and Eaddr. Target Structure: Dpnd over Eaddr and Emp.]
This reshaping example demonstrates reconstructing to a multipath nonlinear structure using SQL's hierarchical data modeling. The SQL hierarchical semantic operation helps preserve correctness.
37. Multipath Querying Vs. Transform
[Figure: Multipath ViewX has levels A; B, C, with data A=10, B=12 and 16, C=18. Transformed ViewX has levels B; A; C, with data B=12 and 16, A=10 and 10, C=18 and 18.]
Both views are queried with:
SELECT A, C
FROM ViewX
WHERE B>10
Multipath output structure: A=10, C=18. Multiple use multipath schema-free query uses & preserves data structure.
Transformed output (data replication): A=10 and 10, C=18 and 18. Specifically transforms structure A/B to B/A changing semantics from 1-to-M to M-to-1.
A simple multipath query can usually avoid transforms and preserve structure with more accurate results.
38. Renaming, Replicating, & Splitting Nodes
SELECT EmpID AS EmpName, DpndID AS DpndName,
Addr1.EaddrID AS AddrName,
Addr2.Eaddrstate AS AddrState
FROM Emp AS Employee
LEFT JOIN Dpnd AS Dependent ON EmpID=DpndEmpID AND DpndCode='D'
LEFT JOIN EAddr AS Addr1 ON EmpCustID=Addr1.EAddrCustID
LEFT JOIN EAddr AS Addr2 ON Addr1.EaddrCustID=Addr2.EAddrCustID
<root>
<employee empname="Emp01">
<dependent dpndname="Dpnd01"/>
<addr1 addrname="Addr01">
<addr2 addrstate="CA"/>
</addr1>
</employee>
<employee empname="Emp02">
<addr1 addrname="Addr03">
<addr2 addrstate="NV"/>
</addr1>
</employee>
</root>
[Figure: input is Emp over Dpnd and Eaddr. Result: Employee over Dependent and Addr1; Addr1 over Addr2.]
The AS keyword renames XML data items and nodes. It can also be used to replicate nodes to help split them.
39. XML Duplicate and Shared Unambiguous Node Processing
SELECT * FROM Dept
LEFT JOIN Cust ON DeptID=CustDeptID
LEFT JOIN Emp ON DeptID=EmpDeptID
LEFT JOIN Addr AS AddrC ON CustID=AddrCustID
LEFT JOIN Addr AS AddrE ON EmpID=AddrEmpID
This same SQL handles both duplicate and shared data.
[Figure: three structures. Ambiguous Shared Structure: Dept over Cust and Emp, which share one Addr. Unambiguous Hierarchical Structure: Dept over Cust and Emp; Cust over AddrC; Emp over AddrE. Ambiguous Duplicate Element Type: Dept over Cust and Emp, each over its own Addr.]
40. Nonlinear Hierarchical ORDER BY
SELECT CustID, InvID, AddrID
FROM CustView
ORDER BY AddrID Desc,
InvID Desc,
CustID Desc
<root>
<cust custid="Cust03">
<addr addrid="Addr03"/>
</cust>
<cust custid="Cust02">
<invoice invid="Inv03"/>
<addr addrid="Addr04"/>
<addr addrid="Addr02"/>
</cust>
<cust custid="Cust01">
[Figure: CustView Ordering: Cust over Invoice and Addr]
Hierarchical ops like ORDER BY require special nonlinear hierarchical processing to make sense for SQL Hierarchical processing. With Order By, each path is ordered independently as in XML above.
Ordering Inv before Cust is Trouble:
[Figure: Inv1, Inv2 and Inv3 in order, under Cust1, Cust2 and Cust1]
Cust becomes out-of-order
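Ordering each path independently, as this slide describes, amounts to sorting every sibling group on its own key instead of sorting flat rows. The following is a minimal sketch over nested Python data; the Cust01 children are taken from earlier slides, and `CUSTS` and `order_desc` are illustrative names only.

```python
# One dict per Cust occurrence, with its Invoice and Addr children kept as separate paths.
CUSTS = [
    {"custid": "Cust01", "invoice": ["Inv01", "Inv02"], "addr": ["Addr01"]},
    {"custid": "Cust02", "invoice": ["Inv03"], "addr": ["Addr02", "Addr04"]},
    {"custid": "Cust03", "invoice": [], "addr": ["Addr03"]},
]


def order_desc(custs):
    """Sort Cust occurrences descending, then sort each child path on its own."""
    ordered = sorted(custs, key=lambda c: c["custid"], reverse=True)
    return [
        {
            "custid": c["custid"],
            "invoice": sorted(c["invoice"], reverse=True),
            "addr": sorted(c["addr"], reverse=True),
        }
        for c in ordered
    ]


ORDERED = order_desc(CUSTS)
for cust in ORDERED:
    print(cust)
```

Sorting the Invoice values never moves a Cust occurrence, so the parent order stays intact, which avoids the out-of-order problem the slide warns about.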
41. Hier & XML Middleware Enabler
[Figure: a query passes through SQL Pre Processing to the user's SQL processor, which operates hierarchically with real-time hier access, and then through structure-aware SQL Post Processing]
Before Query:
Establish Views
Preload XML (ETL)
SQL Pre Processing:
Determines structure
Optimizes structure
Rewrite & submit SQL
Real-time Access:
Retrieve XML (EII)
Return XML as rowset
Retain XML data order
SQL Post Processing:
Remove replicated data
Nonlinear data ordering
Rowset to XML
This Unique Implementation:
• Supports XML, enables hier processing
• Uses in place SQL processor
• Needs no restaging of data
• Not XML centric, no learning curve
• ANSI SQL-92, vendor neutral
• Operates seamlessly & transparently
• Protects current SQL investment
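Two of the post-processing steps named on this slide, removing replicated data and converting the rowset to XML, can be sketched with Python's standard library. This is an illustration under assumptions, not the product's implementation: the `NODES` structure map, the sample `ROWS` (the rowset a LEFT JOIN of EmpView and CustView would plausibly return) and the function name are all hypothetical.

```python
import xml.etree.ElementTree as ET

# (tag, key attribute, rowset column index, parent tag); parents are listed before children.
NODES = [
    ("emp", "empid", 0, None),
    ("dpnd", "dpndid", 1, "emp"),
    ("eaddr", "eaddrid", 2, "emp"),
    ("cust", "custid", 3, "emp"),
    ("invoice", "invid", 4, "cust"),
    ("addr", "addrid", 5, "cust"),
]

# The flat join replicates Emp01, Dpnd01 and Cust01 once per invoice.
ROWS = [
    ("Emp01", "Dpnd01", "Addr01", "Cust01", "Inv01", "Addr01"),
    ("Emp01", "Dpnd01", "Addr01", "Cust01", "Inv02", "Addr01"),
    ("Emp02", None, "Addr03", "Cust03", None, "Addr03"),
]


def rowset_to_xml(rows, nodes):
    """Build nested XML, emitting each node occurrence once per parent occurrence."""
    root = ET.Element("root")
    seen = {}  # key path from the root -> element already created
    for row in rows:
        current = {None: (root, ())}
        for tag, attr, col, parent in nodes:
            key = row[col]
            if key is None or parent not in current:
                continue  # NULL means this path has no occurrence in this row
            parent_elem, parent_path = current[parent]
            path = parent_path + ((tag, key),)
            if path not in seen:
                seen[path] = ET.SubElement(parent_elem, tag, {attr: key})
            current[tag] = (seen[path], path)
    return root


XML_ROOT = rowset_to_xml(ROWS, NODES)
print(ET.tostring(XML_ROOT, encoding="unicode"))
```

Children are appended in first-seen order here, so sibling ordering (the "nonlinear data ordering" step) would still need a separate pass.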
42. Dynamic Data Structure Creation Recap
SELECTed Data Output
Controls processing and output result structure
With automatic node promotion and collection
Combining Data Structures
Joining and mashup of heterogeneous structures
Mashup & node promotion = data virtualization
Generating Variable Data Structures
Multiple choice of view structure generation
Driven by data value further up or down the path
Transforming Data Structures
Restructuring using data relationships
Reshaping using structure semantics
Node splitting and renaming
43. The Power of Hierarchical LEFT Outer Join Syntax Recap
ViewA LEFT JOIN ViewX
ON C.c=Z.c AND A.a=10
Views Expanded:
A LEFT JOIN B ON A.a=B.a
LEFT JOIN C ON A.a=C.a
LEFT JOIN
X LEFT JOIN Y ON X.x=Y.y
LEFT JOIN Z ON X.x=Z.z
ON C.c=Z.c AND A.a=10
[Figure: A over B and C, and X over Y and Z, shown before and after the join]
Outer join models join of views, semantics define result structure.
Shown on this slide:
• Automatically Combines and Unifies Heterogeneous Structures
• Path Filtering Based Up/Down Path
• Directly Executable by SQL Engine
• Flexible Recombinant Properties
• Referencing Below Root is valid
• Access optimization: paths sliced out
• Processing optimization: reorganized
• Naturally Hierarchically Distributable
44. Combining Basic Capabilities
CREATE View ViewR AS
SELECT * FROM A
LEFT JOIN B ON A.a=B.a
LEFT JOIN C ON A.a=C.a
LEFT JOIN D ON C.c=D.c
LEFT JOIN E ON C.c=E.c
SELECT A.a,B.b,D.d,E.e
FROM ViewR
WHERE C='C2'
[Figure: ViewR structure: A over B and C; C over D and E]
Working rowset:
A1 B1 C2 D1 E1
A1    C1 D1
A1 B2 C2    E2
Selected rowset:
A1 B1 D1 E1
A1 B2    E2
Hierarchical XML Result: A1 over B1, B2, D1, E1 and E2
Examples:
Hier View Support
Data Modeling
Hier. Preservation
Var. Length Paths
Multiple Paths
Multi-node Types
Dyn. Data Select
Hier. Data Filtering
Node Promotion
Node Collection
Optimized Access
Global View
Schema-free
45. Combining Advanced Capabilities
CREATE View ViewR AS
SELECT * FROM A
LEFT JOIN B
ON A.a=B.a
LEFT JOIN C
ON A.a=C.a
CREATE XML ViewX AS
X(XID Char(8)),
Y(YID Char(8),
YFK Char(8))
Parent X,
Z(ZID Char(8),
ZFK Char(8))
Parent X
Converted to Outer Join
SELECT A.a, C.c, Y.y, Z.z
FROM ViewR LEFT JOIN ViewX
ON C.c=Z.c AND A='A1'
WHERE Y='Y1' OR Z='Z2'
ORDER BY A.a, Y.y, Z.z
FOR XML KEEP NODE
[Figure: ViewR is A over B and C; ViewX is X over Y and Z. Data Mashup gives the Unified View: A over B and C; C over X; X over Y and Z. The result structure is A over C over X over Y and Z.]
Hierarchical XML Result:
A1 C2 X1 Y1 Z1
A2 C2 X2 Y2 Z2
A3 C3 X3 Y3 Z3
Examples:
XML View
Heterogeneous Data Integration
Mashup
Var. Structures
LCA Processing
Linear Filtering
Hier. Ordering
Preserve Nodes
Unified View
XML Keyword Search
46. Advanced Capabilities 1
Multipath nonlinear processing
Dynamic increase of data value using structure semantics
Processes all queries regardless of internal complexity
Hierarchical data filtering: XML Keyword Search
Multipath hierarchical data structure joins
Performed by simple join of hierarchical structure views
Can be performed interactively and heterogeneously
Dynamically combines hierarchical structure semantics
Linking below root allows structure mashup
Enables capability to mashup structures meaningfully
Includes powerful look-back and look-ahead capability
Mashup + node promotion = data virtualization
Enables SQL Transformations
47. Advanced Capabilities 2
Dynamic automatic variable structure generation
Dynamically builds structures based on values in data
Can utilize cascading and embedded view operation
Operates across multiple paths independently
Dynamic user control of structure
Dynamic SELECT list controls processing structure
Dynamic combining views builds data structure
Transformations Change Structure
Dynamic Structured Output Formatting
Structure-aware processing knows active structure
Uses structure of result to format output data
Active structure is dynamically controlled by user
48. Advanced Capabilities 3
Polymorphic any-to-any structure transform
Uses data relationships or just structure semantics
Utilizes SQL and hierarchical rowset data
Two types of transforms, Restructure and Reshaping
Multipath nonlinear hierarchical Operations
Single Order By orders multiple pathways separately
Aggregation operates on separate pathways
Multipath internal LCA nonlinear processing
LCA processing limits the domain across pathways
Is automatic in ANSI SQL, but not in XQuery
Responsible for keeping multipath result meaningful
Untested natural and automatic operations
Natural hierarchical distributed processing
Automatic parallel processing is possible
Semantic web RDF to SQL, SQLfX driven by RDF
49. Advanced Capabilities Summary
1) ANSI SQL standard and mathematically sound
2) Ease of use (nonprocedural, navigationless, schema-free)
3) Hierarchically correct (principled multipath processing)
4) Greater efficiency (hierarchical processing optimization)
5) Fully interactive (dynamically process native XML)
6) Conceptual hierarchical processing on full structures
7) Queries can operate across the entire hierarchical structure
8) Nonlinear multipath LCA hierarchical processing
9) Full nonlinear hierarchical data structure mashups
A) Variable data generated structure control
B) Any-to-any polymorphic structure transformations
C) All operations are semantically controlled and accurate
D) Data virtualization supported
E) Natural distributed hierarchical processing
F) Automatic parallel processing is possible
All of the capabilities shown in this presentation can be reproduced on our online interactive demo at www.adatinc.com/demo.html