7. Customer Case
Initial State
Insufficient performance
12,000 “Cases” / night (4-hour window)
The 4-hour window is no longer enough
The “XML” part “looks like it takes too long”
Original database system version 8.1.X
Future Wishes
The need to be able to handle 120,000 “Cases” / night
In the near future, hardware/OS moves from OpenVMS to HP-UX
8. An overview
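[Diagram: the original pipeline, handled in memory / DOM. Data arrives via Advanced Queue and is held in Oracle as BLOB, CLOB and XMLType. Stages shown: validation (XML Schema), processing (checks via XMLDOM, Java), shredding of elements, and storage in Oracle (ETL tables, workflow).]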
13. Feeding data to the database
[Diagram: Advanced Queue feeding Oracle, where the data is held as BLOB, CLOB and XMLType and handled in memory / DOM.]
Why BLOB? The payload contains XML data & PDF data
Why CLOB? Conversion needed for XML handling
Why XMLType? Needed to check XML element content
XML validation (well-formedness)
14. Impedance Mismatch
Different data models.
XPath models an XML document as a tree, while most general-purpose programming languages have no native data type for a tree.
Different programming paradigms.
XSLT is a functional language, while Java is object-oriented and Perl is procedural.
Effect/Costs
Unnecessary CPU and Memory Overhead
A lot of expensive type and encoding conversions
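The conversion costs can be made visible with a minimal sketch. Python is used here only for illustration, and the sample payload and its element names are assumptions, not part of the customer case. A single value passes through an encoding conversion, a parse into a tree, and a type conversion before the program can use it.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload as it might arrive from a byte-oriented store (BLOB-like).
raw = '<case id="42"><amount>12.50</amount><name>Zoë</name></case>'.encode("utf-16")

# Conversion 1: bytes -> text (encoding conversion).
text = raw.decode("utf-16")

# Conversion 2: text -> in-memory tree (parsing).
root = ET.fromstring(text)

# Conversion 3: node text -> native types (type conversion).
case_id = int(root.get("id"))
amount = float(root.findtext("amount"))
name = root.findtext("name")

print(case_id, amount, name)
```

Each of the three steps copies or re-interprets the same data, which is the overhead the slide refers to.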
15. The General Rule !
If you deal with XML, handle it via XML (DB)
So if it is relational, do it the relational way…
If it is XML, use XQuery or other XML methods such as XPath…
If you mix worlds, be careful regarding
Information loss (PK/FK → XML)?
Whitespace → NULL → whitespace?
Impedance mismatch
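The whitespace and NULL concern can be illustrated with a small sketch. The mapping rule in `column_value` is a hypothetical one, chosen only to show how distinct XML states can collapse into the same relational value; it is not taken from the original system.

```python
import xml.etree.ElementTree as ET

# <a> is empty, <b> holds a single space, <c> holds data, <d> is absent.
doc = ET.fromstring("<row><a></a><b> </b><c>x</c></row>")

def column_value(row, tag):
    """Map an element to a 'relational' value (hypothetical mapping rule)."""
    node = row.find(tag)
    if node is None:
        return None  # element absent
    text = node.text
    if text is None or text.strip() == "":
        return None  # empty or whitespace-only content collapses to NULL
    return text

values = {tag: column_value(doc, tag) for tag in ("a", "b", "c", "d")}
print(values)
```

Three different XML situations (empty element, whitespace-only element, missing element) end up as the same NULL, so the original document cannot be rebuilt from the relational values alone.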
18. Validation on content and structure
[Diagram: in memory / DOM. The XMLType goes through validation, processing and shredding of elements.]
XML Schema: validation on XML structure
Checks via XMLDOM (Java based): PL/SQL wrapper with Java XML parser
20. XML Parsers
Often DOM or Infoset based
CPU intensive
Memory intensive
Parsing, serializing and tree traversals all happen in memory
Often handle XML tree traversals via only ONE method
Not aware of whether the XML content is structured, semi-structured or unstructured
Not very “smart” or “content aware”: the handling does not adapt to the XML tree or to the XML data content
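A rough sketch of why tree-based parsing is memory intensive: every node of the document exists in memory before the first one is used. The document shape and the `build_document` helper are assumptions for illustration. A streaming pass is included only as a contrast; it is not something the slides propose.

```python
import io
import xml.etree.ElementTree as ET

def build_document(n_cases):
    """Build a hypothetical <cases> document with n_cases <case> children."""
    body = "".join(
        f"<case><id>{i}</id><text>case {i}</text></case>" for i in range(n_cases)
    )
    return f"<cases>{body}</cases>".encode("utf-8")

data = build_document(1000)

# Tree approach: the whole document is materialized before any node is used.
tree_root = ET.fromstring(data)
nodes_in_memory = sum(1 for _ in tree_root.iter())

# Streaming contrast: handle one <case> at a time, then empty it.
streamed = 0
for event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
    if elem.tag == "case":
        streamed += 1
        elem.clear()  # drop the children and text of the finished <case>

print(nodes_in_memory, streamed)
```

The element count of the tree grows with the number of cases, while the streaming pass only ever needs the current case.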
21. XML Schema Registration Advantages
XML Schema will be parsed only once
XML Schema will be cached in memory
No additional parsing
No additional validation
XML document structure is known, therefore:
No parsing is needed when loaded from disk into memory
XML Object (XOB) structures can be applied
Memory footprint is much smaller than that of a DOM structure
The specific nodes that are needed can now be handled efficiently in memory
22. XML Schema based - Query Rewrite
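[Diagram: the bookstore XML tree mapped onto SQL datatypes. Tree levels: bookstore; book, whitepaper; title, author, author, chapter, title, author, id, paragraph; content, content. Type labels: String mapped to CHAR and to VARCHAR2(20), Float mapped to NUMBER(15), content stored as CLOB.]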
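The idea behind query rewrite can be sketched by analogy: once the XML structure is known and stored relationally, a path-style question can be answered by an ordinary SQL statement. The sketch below uses Python with SQLite purely as a stand-in; the table layout, sample data and column sizes are assumptions, and this is not how Oracle implements the rewrite internally.

```python
import sqlite3
import xml.etree.ElementTree as ET

xml_doc = (
    "<bookstore>"
    "<book><title>XML DB Basics</title><author>A. Writer</author></book>"
    "<book><title>SQL Tuning</title><author>B. Author</author></book>"
    "</bookstore>"
)
root = ET.fromstring(xml_doc)

# Path-based access: walk the in-memory tree.
via_tree = [
    b.findtext("title")
    for b in root.findall("book")
    if b.findtext("author") == "B. Author"
]

# Relational access: shred once, then let the SQL engine answer the same question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE book (title VARCHAR(20), author VARCHAR(20))")
con.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [(b.findtext("title"), b.findtext("author")) for b in root.findall("book")],
)
via_sql = [
    row[0]
    for row in con.execute("SELECT title FROM book WHERE author = ?", ("B. Author",))
]

print(via_tree, via_sql)
```

Both routes return the same title; the relational route can additionally use whatever indexes and statistics the database has.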
23. XMLType – Not just a “Datatype”
Checked on XML well-formedness
One root element
Begin & end tags
XML Schema reference, if there is one
XOB methods will be used if an XML Schema is available
DOM methods will be used if XML Schema information is not available
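The two well-formedness rules named above (one root element, matching begin and end tags) can be tried out with a generic XML parser. This is a minimal sketch in Python with made-up sample documents; it shows the rules themselves, not the XMLType implementation.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "one root, matching tags": "<case><id>1</id></case>",
    "two root elements": "<case/><case/>",
    "missing end tag": "<case><id>1</case>",
}
results = {label: is_well_formed(xml) for label, xml in samples.items()}
print(results)
```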
25. Keep XML small !
Do not use / enforce Pretty Print if not needed
Avoid namespace reference “Overkill”
Most used Namespace is Leading
Use short Namespace References
Make XML data as “sparse” as possible
<employee><name>Marco</name></employee>
becomes
<employee name="Marco"/>
XML Data Partitioning
Binary XML if possible
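The size effect of these choices is easy to measure. The sketch below compares the two employee forms from the slide with a pretty-printed variant; the pretty-printed string and its four-space indentation are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

element_form = "<employee><name>Marco</name></employee>"
attribute_form = '<employee name="Marco"/>'

# Pretty-printing adds whitespace-only text that carries no data.
pretty_form = "<employee>\n    <name>Marco</name>\n</employee>\n"

sizes = {
    "pretty": len(pretty_form.encode("utf-8")),
    "element": len(element_form.encode("utf-8")),
    "attribute": len(attribute_form.encode("utf-8")),
}

# Both compact forms still parse and carry the same value.
assert ET.fromstring(element_form).findtext("name") == "Marco"
assert ET.fromstring(attribute_form).get("name") == "Marco"
print(sizes)
```

The saving per document is small, but it is multiplied by every case that is stored, parsed and transferred.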
26. XML Design
Avoid Cyclic References in XML Schemata
For ease of Maintenance: xdb:annotations
Is DOM fidelity or validation needed?
CPU: XML parsing and XML Schema validation “overhead”?
Index maintenance overhead, if implemented on disk
30. Think in “3D” or in “Driving Table” terms
maxOccurs="unbounded"
Give me the <title> and <content> where <content> contains…
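[Diagram: a three-dimensional view with X, Y and Z axes and numbered steps 1 to 6; each repeating (unbounded) element multiplies the result by n rows.]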
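The “driving table” view can be sketched as follows: every repeating element acts like a child table, and the query becomes a join that produces one row per parent and child combination. The sample document and the search term are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<bookstore>"
    "<book><title>T1</title>"
    "<chapter><content>intro to XML</content></chapter>"
    "<chapter><content>about SQL</content></chapter></book>"
    "<book><title>T2</title>"
    "<chapter><content>more XML</content></chapter></book>"
    "</bookstore>"
)

# Driving rows: one row per (book, chapter) pair, like a parent-child join.
rows = [
    (book.findtext("title"), chapter.findtext("content"))
    for book in doc.findall("book")
    for chapter in book.findall("chapter")
]

# "Give me the <title> and <content> where <content> contains 'XML'".
hits = [(title, content) for title, content in rows if "XML" in content]
print(len(rows), hits)
```

With unbounded repetition at several levels, the number of driving rows is the product of the repetitions, which is why it matters which level drives the access.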
34. Increasing volume – XMLType CLOB
Effect of // (descendant search)
In memory
10,000 Cases: ORA-31186, document contains too many nodes
maxOccurs="unbounded"
maxLength, totalDigits, etc.
ORA-31186: Document contains too many nodes
Cause: Unable to load the document because it has exceeded the maximum allocated number of DOM nodes.
Action: Reduce the size of the document.
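A small sketch of how the node count grows with volume and why a descendant search touches all of it. The document shape and the `build_cases` helper are assumptions; the sketch counts element nodes in a generic Python tree and does not reproduce the Oracle limit behind ORA-31186.

```python
import xml.etree.ElementTree as ET

def build_cases(n_cases):
    """Build a hypothetical document with four elements per case."""
    body = "".join(
        f"<case><detail><id>{i}</id><note>n</note></detail></case>"
        for i in range(n_cases)
    )
    return ET.fromstring(f"<cases>{body}</cases>")

def count_nodes(root):
    """Count all element nodes in the tree, including the root."""
    return sum(1 for _ in root.iter())

small, large = build_cases(100), build_cases(10_000)
print(count_nodes(small), count_nodes(large))

# ".//id" has to consider every descendant; the explicit path follows named steps only.
by_descendant = large.findall(".//id")
by_path = large.findall("case/detail/id")
print(len(by_descendant), len(by_path))
```

Both searches return the same elements, but the descendant form gives the engine no structural hint, so the whole tree has to be available and visited.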
36. A Solution based on XMLType O.R.
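[Diagram: the new pipeline, handled on disk / XOB. Data arrives via Advanced Queue and is held in Oracle as BLOB and CLOB. Stages shown: validation against the XML Schema, an XMLType table (O.R.), checks via rewrite, and storage in Oracle (relational ETL tables, workflow).]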
37. Driving Access on CONTENT (…on disk…)
[Diagram: the bookstore tree (bookstore; book, whitepaper; title, author, author, chapter, title, author, id, paragraph; content) annotated with index options: BTree indexes, function-based index (XPath), XMLIndex for structured and unstructured content, BTree secondary index, Oracle Text index.]
38. Cost Based Optimizer Advantages
Can be influenced via
Statistics
Indexes
XML Schema Registration (XOB)
Encoding in Binary XML storage
SQL Re-Write of XPath, XQuery
Partitioning
40. So why can DISK outperform MEMORY?
XML Schema validation based on the registered XML Schema
Query re-write possible
Based on plain “old” SQL/database methods
Optimized CPU handling
Optimized Memory handling (if needed)
Multiple optimized solutions possible via the Optimizer instead of one XML parser method
Specific parts of the XML can be handled / driven via:
specific indexing
or content
Full blown validation can be avoided
42. Be aware of what you are doing !
Avoid unneeded (full) XML Schema validation
During Insert
Generating XML
Avoid Impedance mismatch
Java → XML → Java → XML → Relational → XML → Java
“All In One Go Objective”
Avoid intermediate XML fragments
//
XMLEXISTS
Use Indexes
xdb:maintainDOM="false"
43. XML Data Handling and Design
Handle XML Smart
Keep XML Small
Restrict XML where possible
Be precise !
maxOccurs, maxLength
Provide Oracle with extra, precise information (XSD)
Register XML Schema
If possible…
44. Balanced Design
XML Future Changes
Inserts, Updates & Deletes
In Memory
On Disk
Index Maintenance
Selects
In Memory
Via Indexes
XML Validation
Strict, Lazy
Client Side Possibilities
45. Now you know why DISK can be faster than MEMORY
100,000 “Cases” shredded & validated in 5 minutes
Instead of 1,000 “Cases” in 3 minutes…
Avoiding
ORA-31186: Document contains too many nodes
Scalable
Efficient with Memory and CPU
Checked in production on a 9.2.0.5.0 database version
Extra:
…decreased used PL/SQL code by half…
…but you will have to KNOW what you are doing…
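The figures on the slides can be put side by side with a short calculation. It assumes, as a simplification, that throughput scales linearly with volume and that the quoted timings cover the same shred-and-validate step; neither assumption is confirmed by the slides.

```python
# Figures quoted on the slides.
before_cases, before_minutes = 1_000, 3
after_cases, after_minutes = 100_000, 5
target_cases, window_minutes = 120_000, 4 * 60

before_rate = before_cases / before_minutes  # cases per minute
after_rate = after_cases / after_minutes     # cases per minute
speedup = after_rate / before_rate

# Minutes needed for the target volume at each rate (linear scaling assumed).
minutes_before = target_cases / before_rate
minutes_after = target_cases / after_rate

print(round(before_rate), round(after_rate), round(speedup))
print(round(minutes_before), round(minutes_after), minutes_after <= window_minutes)
```

Under these assumptions the old rate would need about 360 minutes for 120,000 cases, which exceeds the 4-hour window, while the new rate needs about 6 minutes.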
47. References
XMLDB Developers Guide
http://www.oracle.com/pls/db112/homepage
The XMLDB Forum
http://forums.oracle.com/forums/forum.jspa?forumID=34
XML DB FAQ Thread
http://forums.oracle.com/forums/thread.jspa?threadID=410714
Blog
http://www.xmldb.nl