Hotsos 2013 - Creating Structure in Unstructured Data
What is possible today…?



Marco Gralike
“Big Data” = XML?
There are challenges!
Ahem, there are problems!
Wikipedia
• One string of XML data with
  structured and unstructured
  data sections
• Language: English
• Size: 42.15 GB
• Pages: 12,961,997
• Date: 21 Dec 2012
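A quick back-of-the-envelope check on the figures above, assuming "GB" means 10^9 bytes (the slide does not say which unit is meant): the dump is one huge XML string, yet an average page is only a few kilobytes. That is why splitting it into one row per page is the natural first step. The function name is illustrative.

```python
# Figures taken from the slide (English Wikipedia dump, 21 Dec 2012).
DUMP_SIZE_GB = 42.15
PAGES = 12_961_997


def average_page_bytes(size_gb: float, pages: int, binary_units: bool = False) -> float:
    """Average bytes per page; GB = 10**9 bytes unless binary_units selects GiB (2**30)."""
    unit = 2**30 if binary_units else 10**9
    return size_gb * unit / pages


if __name__ == "__main__":
    print(f"{average_page_bytes(DUMP_SIZE_GB, PAGES):,.0f} bytes per page (decimal GB)")
    print(f"{average_page_bytes(DUMP_SIZE_GB, PAGES, binary_units=True):,.0f} bytes per page (GiB)")
```

Either reading of the unit puts the average at roughly 3,250 to 3,500 bytes per page.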
Adventures into
the unknown…?
Setup
• VirtualBox VM
  – OEL 5U8 (64)
  – 8 GB RAM
• LaCie Little Big Disk
  – RAID 0
  – Thunderbolt
• Database
  – SGA    4GB
  – PGA    2GB
My new LaCie LBD is really fast…
Defeat?! Only 1,000,000 pages
Status of Technology used
XML - Where are we…?
[Chart: Gartner, annotated “Achieved…?”]
On the Horizon!
• JSONiq
• Zorba
Building (streaming) Bridges
Oracle XML DB
• No-cost option
• C (native / embedded kernel)
• (XQuery) standards
• Code maintained by Oracle
XQuery
[Diagram: Oracle XML DB XQuery processing layers, top to bottom]
• XMLType Abstraction
• DB XQuery | Procedural XQuery
• XQuery Rewrite | Pushdown | XVM (use “no query rewrite”)
• Relational Access Methods | Streaming XPath Evaluation | DOM Tree Model
• SQL Execution | XMLIndex
• Object-Relational | Binary XML
• Relational Storage | SecureFiles
Source: S317428: Building Really Scalable XML Applications with Oracle XML DB and Oracle Text
So what are we talking about?
Wikipedia
• Structured & unstructured
  bits and pieces
• A lot of “unbounded”
  elements
• Not a lot of restrictions
• The bit with value is in
  element “text”
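A minimal sketch of the first structuring step: stream the dump and emit one (title, text) row per page, without ever holding the whole document in memory. It uses Python's standard `xml.etree.ElementTree.iterparse`. The sample is a simplified, hypothetical fragment in the style of a MediaWiki export, and the namespace URI in it is an assumption; the code strips whatever namespace is present.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix; MediaWiki exports use a default namespace."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source) -> Iterator[Tuple[str, str]]:
    """Stream (title, text) pairs, one per <page>, as each page element closes."""
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's children so memory stays roughly flat


# Hypothetical, heavily simplified sample.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>XML</title><revision><text>Extensible Markup Language</text></revision></page>
  <page><title>XQuery</title><revision><text>A query language for XML</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, page_text in iter_pages(io.BytesIO(SAMPLE)):
        print(page_title, "->", page_text)
```

Each yielded pair is a candidate row for a table with one XMLType (or text) value per page, instead of one 42 GB value.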
How do we get this Structured?
Strings = small & defined (12c?)

   Ename  pointer += 100;
<string1/><string2/><string3/>
Flexible, Humans
No Design Patterns
<small/><verybigggr/><bigger/>
<verybigggr>
       <empno>1</empno><ename>Marco</ename>
       <empno>2</empno>
</verybigggr>
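The fragment above has no wrapper element per employee, so row boundaries have to be inferred. A minimal sketch with Python's standard ElementTree, assuming (hypothetically) that every record starts at an `<empno>`: naive positional pairing silently loses employee 2, while grouping siblings by a start tag keeps it. Both function names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

DOC = """<verybigggr>
  <empno>1</empno><ename>Marco</ename>
  <empno>2</empno>
</verybigggr>"""


def naive_pairs(root: ET.Element) -> List[Tuple[str, str]]:
    """Zip every <empno> with every <ename>; unmatched values are silently dropped."""
    return list(zip([e.text for e in root.iter("empno")],
                    [e.text for e in root.iter("ename")]))


def group_records(root: ET.Element, start_tag: str = "empno") -> List[Dict[str, str]]:
    """Start a new record at each start_tag and attach the following siblings to it."""
    records: List[Dict[str, str]] = []
    for child in root:
        if child.tag == start_tag or not records:
            records.append({})
        records[-1][child.tag] = child.text
    return records


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(naive_pairs(root))
    print(group_records(root))
```

The grouping rule is itself a design decision that the XML does not enforce; a different document could just as well start its records at `<ename>`.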




We need options!
“XMLType” Container
  In Memory (document)       | CLOB (document)
  Object Relational (data)   | Binary XML (data)

XMLType
  In Memory (document)
  XOB, XML Schema

XMLType
  Binary XML SecureFile (document/content)
  Post Parse, LOB Index

XMLType
  Object Relational (content)
  Fully Shredded, Indexes
Something else to realize!
“What is the fastest way to get this
    stuff into the database…?”
“…it depends…”
“So what is the fastest way to get
    XML into the database…?”
“…it depends…”
“So what is the fastest way to get XML
    into the database…
    … and make it useful in my case…?”
Garbage IN – Garbage OUT
Wikipedia
• SQL*Loader
• Parallel or Direct
• SecureFile LOB Column
• 2.5 hours

And no (performant) way
to get the details out…
a.k.a. “completely useless”
Wikipedia
• SQL*Loader
• Parallel or Direct
• SecureFile Binary XML
• …2.5 hours ???
XML Parsing
• SAX - Simple API for XML
• DOM - Document Object Model
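To make the SAX/DOM distinction concrete, here is a small sketch using Python's standard library (`xml.sax` and `xml.dom.minidom`, not Oracle's parsers). Both functions count `<page>` elements. The SAX handler keeps only a counter while events stream past. The DOM version must first build the whole tree, which is what hurts on a single 42 GB document. The sample document is hypothetical.

```python
import io
import xml.sax
from xml.dom import minidom

SAMPLE = b"<mediawiki><page><title>A</title></page><page><title>B</title></page></mediawiki>"


class PageCounter(xml.sax.ContentHandler):
    """SAX: called for each start tag while the parser streams; only a counter is kept."""

    def __init__(self):
        super().__init__()
        self.pages = 0

    def startElement(self, name, attrs):
        if name == "page":
            self.pages += 1


def count_pages_sax(data: bytes) -> int:
    handler = PageCounter()
    xml.sax.parse(io.BytesIO(data), handler)
    return handler.pages


def count_pages_dom(data: bytes) -> int:
    # DOM: the entire document is materialised as a tree before it can be inspected.
    doc = minidom.parseString(data)
    return len(doc.getElementsByTagName("page"))


if __name__ == "__main__":
    print(count_pages_sax(SAMPLE), count_pages_dom(SAMPLE))
```

The trade-off is the usual one: SAX is fast and memory-flat but forward-only, while DOM allows random navigation at the cost of holding everything in memory.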
[Chart: insert performance vs. select performance. From fastest insert to fastest select: CLOB, XMLType CLOB, (domain) indexes, XMLType Binary XML, XMLType Object Relational]
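The insert/select trade-off can be illustrated outside Oracle with a sketch that uses SQLite as a stand-in (an assumption; SQLite has no XMLType). Storing each document as opaque text keeps inserts cheap but forces every query to re-parse every document. Shredding into typed, indexed columns pays the parsing cost once at insert time and turns the query into plain SQL. Table and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample documents.
DOCS = [
    "<emp><empno>1</empno><ename>Marco</ename></emp>",
    "<emp><empno>2</empno><ename>Anna</ename></emp>",
]

conn = sqlite3.connect(":memory:")
# "Document" style: each XML document is one opaque text value (cheap insert).
conn.execute("CREATE TABLE emp_doc (id INTEGER PRIMARY KEY, doc TEXT)")
# "Shredded" style: leaf values become typed, indexable columns (costlier insert).
conn.execute("CREATE TABLE emp_rel (empno INTEGER PRIMARY KEY, ename TEXT)")
conn.execute("CREATE INDEX emp_rel_ename_ix ON emp_rel (ename)")

for doc in DOCS:
    conn.execute("INSERT INTO emp_doc (doc) VALUES (?)", (doc,))
    root = ET.fromstring(doc)  # shredding pays the parse cost once, at insert time
    conn.execute("INSERT INTO emp_rel (empno, ename) VALUES (?, ?)",
                 (int(root.findtext("empno")), root.findtext("ename")))


def find_doc_style(name: str) -> list:
    """Every stored document is fetched and parsed again to answer the query."""
    hits = []
    for (doc,) in conn.execute("SELECT doc FROM emp_doc ORDER BY id"):
        root = ET.fromstring(doc)
        if root.findtext("ename") == name:
            hits.append(int(root.findtext("empno")))
    return hits


def find_shredded(name: str) -> list:
    """Plain SQL against an indexed column; no XML parsing at query time."""
    rows = conn.execute("SELECT empno FROM emp_rel WHERE ename = ? ORDER BY empno", (name,))
    return [empno for (empno,) in rows]


if __name__ == "__main__":
    print(find_doc_style("Anna"), find_shredded("Anna"))
```

Both functions return the same answer; the difference is where the parsing work happens, which is what the chart's two axes capture.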
XML Partitioning
• Object Relational Partitioning
  – Equi-Partitioning since version Oracle 11.1.0.7.0
• Binary XML Partitioning
  – Range, List, Hash
• Local partitioned XMLIndex
  – LOCAL keyword in XMLIndex create syntax
• Partition Key on virtual column (Binary XML)
• Partition Key on column (Object Relational)
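Here is a sketch of the idea behind a partition key on a virtual column: the key is extracted from the XML itself and then mapped to a partition. The range bounds, the `<id>` element, and the modulo hash are illustrative assumptions. In Oracle, the database does this mapping internally, with its own hash function, once the partition key is defined.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical range bounds, read as "VALUES LESS THAN"; a key >= the last bound
# falls into an extra MAXVALUE partition.
RANGE_BOUNDS = [1_000_000, 5_000_000, 10_000_000]


def virtual_key(doc: str) -> int:
    """Emulate a virtual column: the partition key is extracted from the XML document."""
    return int(ET.fromstring(doc).findtext("id"))


def range_partition(key: int) -> int:
    """Index of the range partition holding key (len(RANGE_BOUNDS) = MAXVALUE partition)."""
    return bisect.bisect_right(RANGE_BOUNDS, key)


def hash_partition(key: int, partitions: int = 4) -> int:
    """Illustrative modulo hash; not Oracle's internal hash function."""
    return key % partitions


if __name__ == "__main__":
    doc = "<page><id>1234567</id><title>Example</title></page>"
    key = virtual_key(doc)
    print(key, range_partition(key), hash_partition(key))
```

`bisect_right` gives "less than" semantics: a key equal to a bound lands in the next partition.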
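Driving access on CONTENT
[Diagram: a “bookstore” document tree with “book” and “whitepaper” branches (leaf nodes: title, author, author, chapter, title, author, id, paragraph), parts labelled Unstructured and Structured, annotated with the access paths: B-tree index, function-based index (XPath), Unstructured XMLIndex (content), Structured XMLIndex (structured content) with a B-tree index, Oracle XML Text Index]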
Structured Data
Structured XMLIndex (SXI)
• CONTENT TABLE(s)
• Based on XMLTABLE syntax
• XMLTable construct can be nested:
  – VIRTUAL column alias
• Can be maintained manually
• Secondary indexes possible
[Diagram: Structured XMLIndex f(x) → Content Tables]
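This is a rough emulation of what a Structured XMLIndex maintains, using SQLite and ElementTree as stand-ins (assumptions, not Oracle's implementation). The outer XMLTABLE becomes one content-table row per `<book>`. A nested XMLTABLE over the repeating `<author>` element becomes a detail table, and a secondary B-tree index is added on a content-table column. The bookstore document and all table names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<bookstore>
  <book id="1"><title>XML DB</title><author>Marco</author><author>Ben</author></book>
  <book id="2"><title>XQuery</title><author>Anna</author></book>
</bookstore>"""

conn = sqlite3.connect(":memory:")
# Master content table: one row per /bookstore/book (the outer XMLTABLE).
conn.execute("CREATE TABLE book_ct (book_id INTEGER PRIMARY KEY, title TEXT)")
# Detail content table: one row per book/author (a nested XMLTABLE over a repeating element).
conn.execute("CREATE TABLE book_author_ct (book_id INTEGER REFERENCES book_ct, author TEXT)")
# Secondary B-tree index on a content-table column.
conn.execute("CREATE INDEX book_author_ix ON book_author_ct (author)")


def shred(doc: str) -> None:
    """Populate both content tables from one document."""
    for book in ET.fromstring(doc).findall("book"):
        book_id = int(book.get("id"))
        conn.execute("INSERT INTO book_ct VALUES (?, ?)", (book_id, book.findtext("title")))
        conn.executemany("INSERT INTO book_author_ct VALUES (?, ?)",
                         [(book_id, a.text) for a in book.findall("author")])


def titles_by_author(name: str) -> list:
    """Relational query over the content tables; the XML is not re-parsed."""
    rows = conn.execute(
        "SELECT b.title FROM book_ct b JOIN book_author_ct a ON a.book_id = b.book_id "
        "WHERE a.author = ? ORDER BY b.title", (name,))
    return [title for (title,) in rows]


shred(DOC)

if __name__ == "__main__":
    print(titles_by_author("Ben"))
```

The nesting mirrors why the slide mentions nested XMLTABLE constructs: a repeating element cannot live in the parent row without losing values.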
Describe CONTENT TABLE




• A “regular” heap table with columns…
• Ideal for secondary indexes, if needed.
CONTENT TABLE(s)
[Diagram: Structured XMLIndex f(x) → Content Tables]
Semi-Structured Data
Unstructured XMLIndex (UXI)
• PATH TABLE
• Use Path Subsetting
   – A full-blown XMLIndex can be BIG
• Token Tables (XDB.X$......)
   – Query rewrite on tokens
   – Fuzzy searches, //
   – Optimizer statistics
• Can be maintained manually
   – Recorded in Pending Table
• Secondary indexes possible
[Diagram: Unstructured XMLIndex f(x) → Path Table]
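A toy model of the path-table idea, written from scratch in Python. It is an assumption for illustration, not Oracle's actual path table layout. Each element becomes a row of (path token, document order, value). Path strings are replaced by integer tokens, in the spirit of the token tables. Path subsetting indexes only one subtree, and a `//name` lookup is answered by matching tokens whose last step is `name`. The class name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


class PathTable:
    """Toy path table: one row per element (path token, document order, text value)."""

    def __init__(self, include_prefix: Optional[str] = None):
        self.include_prefix = include_prefix        # path subsetting: index only this subtree
        self.tokens: Dict[str, int] = {}            # path string -> integer token
        self.rows: List[Tuple[int, int, str]] = []  # (token, order, value)

    def _token(self, path: str) -> int:
        return self.tokens.setdefault(path, len(self.tokens) + 1)

    def _included(self, path: str) -> bool:
        p = self.include_prefix
        return p is None or path == p or path.startswith(p + "/")

    def load(self, doc: str) -> None:
        order = 0

        def walk(elem: ET.Element, parent_path: str) -> None:
            nonlocal order
            path = f"{parent_path}/{elem.tag}"
            order += 1
            if self._included(path):
                self.rows.append((self._token(path), order, (elem.text or "").strip()))
            for child in elem:
                walk(child, path)

        walk(ET.fromstring(doc), "")

    def descendant(self, name: str) -> List[str]:
        """Answer //name: values of rows whose path token ends in the step 'name'."""
        wanted = {tok for path, tok in self.tokens.items() if path.rsplit("/", 1)[-1] == name}
        return [value for tok, _order, value in self.rows if tok in wanted]


DOC = ("<bookstore><book><title>A</title><author>X</author></book>"
       "<whitepaper><title>B</title><id>7</id></whitepaper></bookstore>")

if __name__ == "__main__":
    full = PathTable()
    full.load(DOC)
    subset = PathTable(include_prefix="/bookstore/whitepaper")
    subset.load(DOC)
    print(len(full.rows), full.descendant("title"))
    print(len(subset.rows), subset.descendant("title"))
```

The subset version holds fewer rows but can only answer queries inside the subtree it covers, which is the trade-off behind "a full-blown XMLIndex can be BIG".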
Describe PATH TABLE
What’s hidden…
PATH TABLE
[Diagram: Unstructured XMLIndex f(x) → Path Table]
Binary XML – No Index
Binary XML + XMLIndex (SXI)
Binary XML + XMLIndex + Sec.Ind.
Binary XML + XMLIndex + Sec.Ind.
Unstructured Data
XML Full-Text Index
• Based on Oracle Text Index, XQuery Full Text
• XML namespace aware
• XML semantics-aware full-text search
  – Full-Text Selection expression – contains text
  – Logical full-text operators – ftor, ftand, ftMildNot
  – Context-aware full-text search
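A toy inverted index keyed by (element name, word) shows what "context aware" buys: a word only matches inside the named element. `ftand` and `ftor` map naturally onto intersection and union of the matching document sets. Position-based operators such as mild not are left out. This is a standard-library sketch, not Oracle Text or an XQuery Full Text implementation, and the sample documents are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Set, Tuple

WORD = re.compile(r"\w+")


def build_index(docs: Dict[int, str]) -> Dict[Tuple[str, str], Set[int]]:
    """Map (element name, lower-cased word) -> ids of documents containing it there.

    Only each element's own leading text is indexed; tail text is ignored in this sketch.
    """
    index: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for doc_id, doc in docs.items():
        for elem in ET.fromstring(doc).iter():
            for word in WORD.findall(elem.text or ""):
                index[(elem.tag, word.lower())].add(doc_id)
    return index


def contains(index, element: str, word: str) -> Set[int]:
    """Context-aware match: the word must occur inside the named element."""
    return set(index.get((element, word.lower()), set()))


def ftand(a: Set[int], b: Set[int]) -> Set[int]:
    return a & b


def ftor(a: Set[int], b: Set[int]) -> Set[int]:
    return a | b


DOCS = {
    1: "<page><title>Oracle XML DB</title><text>Binary XML storage</text></page>",
    2: "<page><title>JSON</title><text>Oracle JSON and XML</text></page>",
}

if __name__ == "__main__":
    idx = build_index(DOCS)
    print(contains(idx, "title", "xml"), contains(idx, "text", "xml"))
```

Searching "xml" in `title` finds only document 1, while the same word in `text` finds both; a plain, context-blind text index could not tell these apart.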
Balanced Design
• Inserts, Updates & Deletes
  – XML Future Changes
  – Index Maintenance
• Selects
  – In Memory
  – Via Indexes
• XML Validation
  – Strict, Lazy
  – Client-Side Possibilities
[Diagram: In Memory vs. On Disk]
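One client-side possibility is to reject malformed input before it reaches the database (garbage in, garbage out). Here is a minimal sketch using Python's ElementTree. It checks only well-formedness, not validity against an XML Schema, which the standard library does not provide. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple


def split_well_formed(docs: Iterable[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Separate parseable documents from broken ones before anything is sent to the database.

    Returns (good documents, [(position, parser error message), ...]).
    """
    good: List[str] = []
    bad: List[Tuple[int, str]] = []
    for i, doc in enumerate(docs):
        try:
            ET.fromstring(doc)
            good.append(doc)
        except ET.ParseError as err:
            bad.append((i, str(err)))
    return good, bad


if __name__ == "__main__":
    ok, rejected = split_well_formed(["<a/>", "<a><b></a>", "<c>ok</c>"])
    print(ok)
    print(rejected)
```

Rejected documents keep their position and the parser message, so they can be logged or repaired instead of silently loaded.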
Reward
• Optimal performance
• Outperforming XML
• A proper design will give a
  performance increase over
  XML handling…


…proper design is still key…
References
Oracle XML DB
  – http://www.oracle.com/pls/db112/homepage
XML DB FAQ Thread
  – http://forums.oracle.com/forums/thread.jspa?threadID=410714
Personal Blog
  – http://www.xmldb.nl
  – http://technology.amis.nl
References
Daniela Florescu, Oracle Corporation
  Advances in XML and XQuery
Sam Idicula, Oracle XML DB Development Team
  Binary XML Storage and Query Processing in Oracle
Jinyu Wang, Scott Brewton
  Making XML Technology Easier to Use
Joel Spolsky - Joel on Software
  Back to Basics
References
Oracle XML DB Main page material
• Oracle XML DB : Best Practices to Get Optimal
  Performance out of XML Queries (PDF)
• Oracle XML DB : Choosing the Best XMLType
  Storage Option for Your Use Case (PDF)
• A Request for Comments for the Oracle Binary
  XML Format

A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 

Último (20)

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 

Hotsos 2013 - Creating Structure in Unstructured Data

  • 1. Creating Structure in Unstructured Data What is possible, today…? Marco Gralike
  • 7. Wikipedia • One string of XML data with structured and unstructured data sections • Language: English • Size: 42.15 GB • Pages: 12,961,997 • Date: 21 Dec 2012
  • 9. Setup • VirtualBox VM – OEL 5U8 (64) – 8 GB RAM • LaCie Little Big Disk – RAID 0 – Thunderbolt • Database – SGA 4GB – PGA 2GB
  • 10. My new LaCie LBD is really fast - 
  • 11. Defeat?! - 1,000,000 pages only
  • 13. XML - Where are we…? Gartner
  • 15. On the Horizon! • JSONiq • Zorba
  • 17. Oracle XML DB • No-cost option • C (native / embedded kernel) • (XQuery) standards • Code maintained by Oracle
  • 18. [Diagram: Oracle XML DB XQuery processing. From top to bottom: XQuery; XMLType abstraction; DB XQuery (XQuery rewrite, pushdown) and procedural XQuery (XVM; use “no query rewrite”); relational access methods with SQL execution, streaming XPath evaluation, XMLIndex and the DOM tree model; object-relational and binary XML storage; relational storage and Secure Files.] Source: S317428: Building Really Scalable XML Applications with Oracle XML DB and Oracle Text
  • 19. So what are we talking about?
  • 27. Wikipedia • Structured & unstructured bits and pieces • A lot of “unbounded” elements • Not a lot of restrictions • The valuable bit is in the element “text”
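Since the valuable part of each page sits in its `text` element, one way to start adding structure is a single streaming pass that yields one (title, text) pair per page. This is a hypothetical Python sketch, not the approach used in the talk: it relies on `xml.etree.ElementTree.iterparse`, a tiny made-up sample shaped like a MediaWiki export (the namespace URI is only illustrative), and clears finished pages so memory use should stay roughly flat as the input grows.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample shaped like a MediaWiki export.
# The namespace URI is illustrative; local_name() ignores it either way.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Oracle Database</title>
    <id>1</id>
    <revision><id>10</id><text>Oracle Database is ...</text></revision>
  </page>
  <page>
    <title>XQuery</title>
    <id>2</id>
    <revision><id>20</id><text>XQuery is a query language ...</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page>, streaming instead of building one big tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue
        title = text = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text
        yield title, text
        root.clear()  # detach finished pages from the root so they can be freed


if __name__ == "__main__":
    for title, text in iter_pages(io.BytesIO(SAMPLE)):
        print(title, "->", text)
```

Each yielded pair could then become a relational row. The element names `page`, `title` and `text` follow the export format; the sample content and function names are assumptions.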
  • 28. How do we get this Structured?
  • 31. Strings = small & defined (12c?) Ename pointer += 100;
  • 35. <verybigggr> <empno>1</empno><ename>Marco</ename> <empno>2</empno> </verybigggr> <small/><verybigggr/><bigger/>
  • 41. “XMLType” container [Diagram: in memory (document), CLOB (document), object-relational (data), binary XML (data)]
  • 42. XMLType In Memory (document) XOB XML Schema
  • 43. XMLType Binary XML Securefile (document/content) Post Parse LOB Index
  • 44. XMLType Object Relational (content) Fully Shredded Indexes
  • 45. Something else to realize!
  • 46. “What is the fastest way to get this stuff in the database…?”
  • 48. “So what is the fastest way to get XML in the database…?”
  • 50. “So what is the fastest way to get XML in the database… … and useful in my case…?”
  • 51. Garbage IN – Garbage OUT
  • 52. Wikipedia • SQL*Loader • Parallel or direct • Securefile LOB column • 2.5 hours • And no (performant) way to get the details out… a.k.a. “completely useless”
  • 53. Wikipedia • SQL*Loader • Parallel or direct • Securefile binary XML • …2.5 hours ???
  • 54. XML Parsing • SAX - Simple API for XML • DOM - Document Object Model
  • 55. [Diagram: storage options ordered from fast insert performance to fast select performance: CLOB, XMLType CLOB, (domain) indexes, XMLType binary XML, XMLType object-relational]
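To make the SAX/DOM distinction concrete, here is a minimal Python sketch (sample data and function names are hypothetical) that collects the same titles both ways: `minidom` builds the complete node tree before anything can be read, while a SAX `ContentHandler` receives start, character and end events and buffers only the title it is currently reading.

```python
import xml.sax
from xml.dom import minidom

SAMPLE = b"<pages><page><title>A</title></page><page><title>B</title></page></pages>"


def titles_with_dom(data):
    """DOM: parse the whole document into an in-memory node tree, then navigate it."""
    doc = minidom.parseString(data)
    return [node.firstChild.data for node in doc.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX: react to parser events; only the title currently being read is buffered."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not inside a <title>"

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # May be called several times for one text node, so accumulate.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_with_sax(data):
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles
```

For a multi-gigabyte input the DOM variant needs the whole tree in memory, which is usually what rules it out; the event-driven variant trades that for more bookkeeping in the handler.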
  • 57. XML Partitioning • Object-relational partitioning – Equi-partitioning since Oracle version 11.1.0.7.0 • Binary XML partitioning – Range, list, hash • Locally partitioned XMLIndex – LOCAL keyword in the XMLIndex CREATE syntax • Partition key on a virtual column (binary XML) • Partition key on a column (object-relational)
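A partition key on a virtual column can be pictured as a value computed from the document itself, which then drives ordinary range, list or hash routing. The Python sketch below only illustrates that idea under stated assumptions: using the `<ns>` element as the key, the partition names and the bounds are all hypothetical, and `zlib.crc32` merely stands in for whatever hash function the database actually uses.

```python
import xml.etree.ElementTree as ET
import zlib


def partition_key(xml_text):
    """Hypothetical 'virtual column': derive the key from the document itself
    (here the page's <ns> number)."""
    return int(ET.fromstring(xml_text).findtext("ns"))


def list_partition(key, partitions, default="p_default"):
    """List partitioning: the first partition whose value list contains the key."""
    for name, values in partitions.items():
        if key in values:
            return name
    return default


def range_partition(key, bounds):
    """Range partitioning: bounds is [(name, exclusive_upper_bound), ...], ascending."""
    for name, upper in bounds:
        if key < upper:
            return name
    raise ValueError(f"no partition for key {key!r}")


def hash_partition(key, count):
    """Hash partitioning; crc32 is only a stand-in for the database's own hash function."""
    return zlib.crc32(str(key).encode("utf-8")) % count
```

The point of the virtual column is that the key is never stored separately: it is derived from the XML, so every routing scheme above works from the same extracted value.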
  • 58. XMLType Binary XML Securefile (document/content) Post Parse LOB Index
  • 59. Driving access on CONTENT [Diagram: a sample bookstore tree (book and whitepaper; title, author, chapter, id, paragraph) mapped to access methods ranging from structured to unstructured content: B-tree index, function-based index (XPath), structured XMLIndex, unstructured XMLIndex, Oracle Text index]
  • 61. Structured XMLIndex (SXI) • CONTENT TABLE(s) • Based on XMLTABLE syntax • XMLTable constructs can be nested – VIRTUAL column alias • Can be maintained manually • Secondary indexes possible [Diagram: Structured XMLIndex, f(x), content tables]
  • 62. Describe CONTENT TABLE • A “regular” heap table with columns… • Ideal for secondary indexes, if needed.
  • 63. CONTENT TABLE(s) [Diagram: Structured XMLIndex, f(x), content tables]
  • 65. Unstructured XMLIndex (UXI) • PATH TABLE • Use path subsetting – A full-blown XMLIndex can be BIG • Token tables (XDB.X$......) – Query rewrite on tokens – Fuzzy searches, // – Optimizer statistics • Can be maintained manually – Recorded in pending table • Secondary indexes possible [Diagram: Unstructured XMLIndex, f(x), path table]
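A rough way to picture what a structured XMLIndex maintains is an XMLTABLE-style projection: a row pattern picks the repeating element, column paths pick values, and the results land in an ordinary heap table that can carry secondary indexes. The sketch below imitates that with Python's `ElementTree` and an in-memory SQLite table; the document, table and column names are made up, and this is an analogy, not how Oracle builds or maintains its content tables.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<bookstore>
  <book id="b1"><title>XML DB</title><author>Smith</author></book>
  <book id="b2"><title>XQuery</title><author>Jones</author></book>
</bookstore>"""

# Row pattern and column paths, loosely mirroring (datatypes omitted)
# XMLTABLE('/bookstore/book' COLUMNS id ... PATH '@id', title ... PATH 'title', ...)
ROW_PATH = "book"  # relative to the <bookstore> root element
COLUMNS = {"id": "@id", "title": "title", "author": "author"}


def project(doc_id, xml_text):
    """Yield one relational row per matching element, one value per column path."""
    root = ET.fromstring(xml_text)
    for row in root.findall(ROW_PATH):
        values = [
            row.get(path[1:]) if path.startswith("@") else row.findtext(path)
            for path in COLUMNS.values()
        ]
        yield (doc_id, *values)


conn = sqlite3.connect(":memory:")
# The "content table": an ordinary heap table, one column per projected path.
conn.execute("CREATE TABLE book_content (doc_id INTEGER, id TEXT, title TEXT, author TEXT)")
# A secondary index on the content table.
conn.execute("CREATE INDEX book_content_author ON book_content (author)")
conn.executemany("INSERT INTO book_content VALUES (?, ?, ?, ?)", project(1, DOC))
conn.commit()
```

A query such as `SELECT id, title FROM book_content WHERE author = 'Jones'` then runs against plain columns and an ordinary index instead of re-parsing the XML.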
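The path table behind an unstructured XMLIndex can be thought of as one row per node, recording its path, its position in document order and its value. This hypothetical Python sketch builds such rows, uses a `keep` predicate to play the role of path subsetting, and answers a `//title` step by suffix matching on paths; the row layout is a simplified assumption, not the actual column layout of Oracle's path table.

```python
import xml.etree.ElementTree as ET

DOC = ("<bookstore><book><title>XML DB</title>"
       "<chapter><title>Intro</title></chapter></book></bookstore>")


def path_table(xml_text, keep=lambda path: True):
    """Return (path, order_key, value) rows in document order.

    order_key is a tuple of 1-based child positions, so it sorts in document
    order; `keep` acts as path subsetting: only matching paths are recorded."""
    rows = []

    def walk(elem, parent_path, order_key):
        path = f"{parent_path}/{elem.tag}"
        if keep(path):
            value = (elem.text or "").strip() or None
            rows.append((path, order_key, value))
        for position, child in enumerate(elem, start=1):
            walk(child, path, order_key + (position,))

    walk(ET.fromstring(xml_text), "", (1,))
    return rows


def descendant_values(rows, name):
    """Answer a '//name' step by matching every recorded path ending in '/name'."""
    return [value for path, _, value in rows if path.endswith("/" + name)]
```

Subsetting matters because a full path table gets one row for every node in every document; restricting `keep` to the paths that are actually queried is what keeps it from growing BIG.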
  • 69. Binary XML – No Index
  • 70. Binary XML + XMLIndex (SXI)
  • 71. Binary XML + XMLIndex + Sec.Ind.
  • 72. Binary XML + XMLIndex + Sec.Ind.
  • 74. XML Full-Text Index • Based on Oracle Text index, XQuery Full Text • XML namespace aware • XML-semantics-aware full-text search – Full-text selection expression: contains text – Logical full-text operators: ftor, ftand, FTMildNot (“not in”) – Context-aware full-text search
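Context-aware full-text search means a term only counts when it occurs inside the element being searched. The toy Python sketch below shows that scoping together with `ftand`/`ftor`-style combination; it is an assumption-laden illustration with a naive tokenizer and no index at all, not how Oracle Text or XQuery Full Text are implemented.

```python
import re
import xml.etree.ElementTree as ET

DOC = "<book><title>Oracle XML DB</title><paragraph>Binary XML storage</paragraph></book>"


def words(text):
    """Naive tokenizer: lowercase word characters only, no stemming or stopwords."""
    return set(re.findall(r"\w+", text.lower()))


def contains_text(xml_text, element, terms, mode="ftand"):
    """True if some <element> contains all terms (ftand) or any term (ftor)."""
    if mode not in ("ftand", "ftor"):
        raise ValueError("mode must be 'ftand' or 'ftor'")
    wanted = {term.lower() for term in terms}
    for elem in ET.fromstring(xml_text).iter(element):
        # Only text inside this element (and its descendants) is considered.
        found = words("".join(elem.itertext()))
        if (wanted <= found) if mode == "ftand" else bool(wanted & found):
            return True
    return False
```

With `DOC` above, searching `title` for "binary" fails even though the word occurs in the document, because it only appears inside `paragraph`.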
  • 85. Balanced Design • Inserts, updates & deletes – Future XML changes – Index maintenance in memory / on disk • Selects – In memory – Via indexes • XML validation – Strict, lazy – Client-side possibilities
  • 86. Reward • Optimal performance • Outperforming XML • Proper design will give a performance increase over XML handling… …proper design is still key…
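One cheap client-side possibility is to reject malformed documents before they ever reach the database. The minimal Python sketch below (the function name is hypothetical) checks well-formedness only and reports where parsing failed; validating against an XML Schema, strict or lazy, is a separate step that this example does not attempt.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, (line, column)) of the error.

    Only well-formedness is checked here; validating against an XML Schema
    needs a schema-aware tool or the database itself."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, err.position
    return True, None
```

Rejecting a broken document on the client saves a round trip and keeps the load path from failing halfway through a batch.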
  • 88. References • Oracle XML DB – http://www.oracle.com/pls/db112/homepage • XML DB FAQ Thread – http://forums.oracle.com/forums/thread.jspa?threadID=410714 • Personal Blog – http://www.xmldb.nl – http://technology.amis.nl
  • 89. References • Daniela Florescu, Oracle Corporation – Advances in XML and XQuery • Sam Idicula, Oracle XML DB Development Team – Binary XML Storage and Query Processing in Oracle • Jinyu Wang, Scott Brewton – Making XML Technology Easier to Use • Joel Spolsky, Joel on Software – Back to Basics
  • 90. References Oracle XML DB Main page material • Oracle XML DB : Best Practices to Get Optimal Performance out of XML Queries (PDF) • Oracle XML DB : Choosing the Best XMLType Storage Option for Your Use Case (PDF) • A Request for Comments for the Oracle Binary XML Format

Editor's Notes

  1. See also OOW 2010, S317428: Building Really Scalable XML Applications with Oracle XML DB and Oracle Text – Nipun Agarwal, Oracle