SlideShare uma empresa Scribd logo
1 de 31
Keyword Proximity Search
in XML Trees


                    Andrada Astefanoaie
                     XML and Database Systems
                                      SS 2010
Outline
                           I.     Introduction
                           II.    Framework
                           III.   Algorithms:Indexed XML Data
Keyword Proximity Search
                           IV. Processing Unindexed XML DATA
in XML Trees
                           V.     Experimental Evaluation
                           VI. Overview
Introduction - Framework - Algorithms:Indexed XML Data – Processing Unindexed XML Data - Experimental Evaluation - Overview



Keyword Search                                                                           Keyword Proximity Search
                                                                                                    in XML Trees


Keyword search

user-friendly information discovery technique

extensively studied for text documents.



Keyword proximity search

well-suited to XML documents
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview




Notation                                                                                 Keyword Proximity Search
                                                                                                    in XML Trees


                     XML DOCUMENT                                        directed tree with labeles

                          -   labled with λ(v), a tag

                          -   4-tuple: id(v)
                                       start and end correspond to the first and the final times the node is
       v                               visited in a depth-first traversal of the XML tree,
                                       depth is the depth of the node from the root of the tree.

                          -   if v is a leaf, it has a string value val(v) that contains a list of keywords


                                        set of keywords k1,. . . , km.
keyword query
                                        returns a compact representation of the set of trees that connect
                                       the nodes that contain the keywords
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview




Notation                                                                                   Keyword Proximity Search
                                                                                                      in XML Trees


                                                                                            r




                                                                                                c1




                                                                            s1              s2                       s3




                                                                                 p2                                   p5   p6
                                                                  p1                             p3             p4




                                                           a1          a2   a3        a4    a5        a6   a7   a8    a9        a10
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview




Notation                                                                                      Keyword Proximity Search
                                                                                                         in XML Trees

Definition

minimum connecting tree (MCT) of nodes v1,. . . ,vm of a tree → the minimum size subtree that
connects v1, . . . ,vm.

root of the tree → the lowest common ancestor (LCA) of the nodes v1, . . . ,vm.
Examples:
                                    r                                                                       r
MCTs for the query                                                 MCTs for the query
“Tom, Harry”                        c1                             “Tom, Dick, Harry”                       c1




                s1                  s2                  s3                              s1                  s2                  s3




      p1             p2                            p4   p5   p6               p1             p2                 p3         p4   p5    p6
                                     p3




 a1        a2   a3        a4   a5        a6   a7   a8   a9   a10         a1        a2   a3                            a7   a8
                                                                                                  a4   a5        a6              a9   a10
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview




Notation                                                                                 Keyword Proximity Search
                                                                                                    in XML Trees


DMCT

v1, . . . , vm Є T.
Distance MCT (DMCT) TD=d(TM) of the MCT TM of nodes v1, . . . , vm → the minimum node-labeled
and edge-labeled tree such that:
                                                                                                           TD contains v1, . . . , vm



                                                                                                     TD contains the LCAs u1, . . . , uk
                                                                                                     of any pair of nodes (vi, vj)
                                                                                                     where vi , vj Є [v1, . . . , vm], i≠ j



                                                                                                      edge labeled with l between
                                                                                                      any two distinct nodes n, n’ Є
                                                                                                      {v1,...,vm, u1, . . . ,uk} if there is a
                                                                                                      path of length l from n’ to n in
                                                                                                      TM and the path does not
                                                                                                      contain any node n’’ Є { u1, . . . ,
                                                                                                      um} other than n and n’.
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview




Notation                                                                                 Keyword Proximity Search
                                                                                                    in XML Trees


GDMCT

A Grouped DMCT of a tree T is a labeled tree where edges are labeled with numbers and nodes
are labeled with lists of node ids from T.

DMCT D Є GDMCT G if D and G are isomorphic. Assuming that f is the mapping of the nodes of D
to the nodes of G, which induces a corresponding mapping, also called f, of the edges of D to
the edges of G, the following must hold:
                                                                                                      nD is a node of D, nG is a node of
                                                                                                       G and f(nD)=nG, then the label
                                                                                                        of nG contains the id of nD.




                                                                                               eD is an edge of D, eG is an edge
                                                                                               of G and f(eD) = eG, then the
                                                                                               label of eD and the label of eG
                                                                                               are the same number.
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview




Problems                                                                                 Keyword Proximity Search
                                                                                                    in XML Trees

Problem 1 : All GDMCTs Problem

Query                                  K                                        Result

“Tom, Harry”                           5




                                       3
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview




Problems                                                                                 Keyword Proximity Search
                                                                                                    in XML Trees


Problem 2 : Lowest GDMCTs Problem


Query                                  K                                        Result

“Tom, Harry”                           5




                                       3
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                             Keyword Proximity Search
                                                                                                   in XML Trees
Nested Loop Algorithm
The nested loops algorithm (NL) for the case of indexed XML                             Examples of some entries in the
data operates over separate lists of nodes, L(k), one for each                          master index for our tree:
query keyword, k, to identify the GDMCTs whose sizes are no
more than the user-provided threshold, K.




Master index             inverted index             a hash table

                                                        list L(k)

                                each node n has path-id
      (the list of node ids along the path from the root of T to n)
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                             Keyword Proximity Search
                                                                                                   in XML Trees
Nested Loop Algorithm
                                                                              checks all combinations of
                                                                             nodes from the keyword lists.




                                                                            for each combination computes
                                                                            an MCT (minimum connecting
                                                                            tree)




                                                                           merges the resulting MCT into
                                                                           the list of result GDMCTs, if its
                                                                           size is within the user-specified
                                                                           threshold.
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                             Keyword Proximity Search
                                                                                                   in XML Trees
Nested Loop Algorithm
For example:
Query: “Tom, Harry” and K=3,




NL examine the 12 node-pairs                  12 MCTs
           determine 2 of them meet the
threshold(K)             return 2 GDMCTs:
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                             Keyword Proximity Search
                                                                                                   in XML Trees
Nested Loop Algorithm
Inefficienty:

 NL checks all the combinations of nodes
    from the keyword lists



 The grouping of the results into GDMCTs is
    not lightly integrated with the algorithm
    and a lookup to the array R is required for
    each relevant MCT found.
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                             Keyword Proximity Search
                                                                                                   in XML Trees
Stack-Based Algorithm
Index Structure and Algorithm.

The stack-based algorithm for computing GDMCTs on indexed XML data operates over lists of
nodes, two for each query keyword.


Indexing by keyword                   master index                contains 2 lists



                  o L(k) of the nodes of T that contain k in T and
                  o Ld(k) of the ancestors of nodes in L(k).
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                             Keyword Proximity Search
                                                                                                   in XML Trees
Stack-Based Algorithm
Index Structure and Algorithm.

For example the entries for Tom, Dick and Harry are:
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                             Keyword Proximity Search
                                                                                                   in XML Trees
Stack-Based Algorithm
Index Structure and Algorithm.

This is the high-level description of the SA.




                                                                                        It describes how the selected
                                                                                        list of nodes is traversed in a
                                                                                        depth-first manner and the
                                                                                        nodes are pushed and popped
                                                                                        from the stack.
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                             Keyword Proximity Search
                                                                                                   in XML Trees
Stack-Based Algorithm
Index Structure and Algorithm.




                                                                                                   novel part of the SA algorithm




                                                                                                  processing and bookkeeping
                                                                                                  performed at each stack
                                                                                                  operation
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                             Keyword Proximity Search
                                                                                                   in XML Trees
Stack-Based Algorithm
Index Structure and Algorithm.



                                                                                          Functions that are called from
                                                                                          POP(S)
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                             Keyword Proximity Search
                                                                                                   in XML Trees
Stack-Based Algorithm
Illustrative Example

Query: “Tom, Harry”
K=3

Master index lists:




The intersection of the lists:
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                               Keyword Proximity Search
                                                                                                     in XML Trees
Stack-Based Algorithm
Illustrative Example
                                        Master index lists:                          Intersection of the La

Query: “Tom, Harry”
K=3


Some of the initial stack states of the execution of the Stack Algorithm:
1.                                         2.                                              3.
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                               Keyword Proximity Search
                                                                                                     in XML Trees
Stack-Based Algorithm
Illustrative Example
                                        Master index lists:                          Intersection of the La

Query: “Tom, Harry”
K=3


Some of the initial stack states of the execution of the Stack Algorithm:
4.                                         5.                                              6.
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


All GDMCTs:                                                                               Keyword Proximity Search
                                                                                                     in XML Trees
Stack-Based Algorithm
Illustrative Example
                                        Master index lists:                          Intersection of the La

Query: “Tom, Harry”
K=3


Some of the initial stack states of the execution of the Stack Algorithm:
7.                                         8.                                              9. Entries from the lists
                                                                                           continue being examined,
                                                                                           new GDMCTs are created and
                                                                                           pruned until all the answers
                                                                                           are output.



                                                                                                         ...
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


Lowest GDMCTs:                                                                          Keyword Proximity Search
                                                                                                   in XML Trees
Stack- Based Algorithm




The key observation is that once we output the GDMCTs of a node u, none of the ancestors of u
in the stack can be LCAs of returned GDMCTs; hence, we can remove all of them from the stack!

Specifically, we can add the following lines after line 5:
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


LCAs:                                                                                   Keyword Proximity Search
                                                                                                   in XML Trees
Stack- Based Algorithms
The Stack Algorithm can also be easily modified to solve the All LCAs Problem and the Lowest
LCAs Problem, where the user is not interested in the GDMCTs, but only in the LCA nodes.



    o First, Merge(.) could be simplified, no merging of GDMCTs would need to be done, and

    line 33 could be replaced by:




    o Second, we can output an LCA early when the first GDMCT (with all keywords) is
    computed for that node (in Procedure CreateNewGDMCTs(.)), instead of waiting until the
    node is popped from the stack.
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


Complexity                                                                              Keyword Proximity Search
                                                                                                   in XML Trees
Analysis
Total number of GDMCTs

Worst case: the number of DMCTs and of GDMCTs = exponential on the number of keywords.

Under reasonable assumptions, the worst-case number of GDMCTs is smaller than that of
DMCTs

Complexity of Finding Isomorphic GDMCTs

Given this canonical representation prezented in this chapter, one can linearize the GDMCTs in
an XML-like nested representation with start and end tags, obtained from the node
annotations.


Theorem 1. The time complexity of SA is

                                     O( L  K  (i 1 L(ki ) ) 2 )
                                                              m
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


Processing                                                                              Keyword Proximity Search
                                                                                                   in XML Trees
Unindexed XML Data
Both the NL Algorithm and the SA have adaptations to work without index lists by doing a single
pass over the data tree.


The streaming version of the Stack Algorithm                                 following changes to the Stack
Algorithm SA(k1,..km, K):
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


Experimental                                                                             Keyword Proximity Search
                                                                                                    in XML Trees
Evaluation


Parameters affecting the performance of the presented algorithms:


            1) the value of K denoting the threshold,
            2) the number m of keywords,
            3) the size of the data set.


Tests show that usually the algorithms based on the Stack Algorithm have better results than the
Nested Loops Algorithms both in the Indexed and Unindexed data.
Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview


Overview                                                                                 Keyword Proximity Search
                                                                                                    in XML Trees

There were presented two main problems:
    1) identifying and presenting in a compact manner all MCTs which explain how the keywords are
       connected
    2) identifying only MCTs whose root is not an ancestor of the root of another MCT.



There are presented solutions:

    1) when the XML data has been preprocessed and relevant indices have been constructed
                   - Nested Loop Algorithm
                   - Stack Algorithm
    2) when the XML data has not been preprocessed, i.e., the XML data can only beprocessed
    sequentially.



Benefits of the algorithms are shown by the Experimental Evaluation
Resource




Name
              Keyword Proximity Search in XML Trees

                                      Vangelis Hristidis, Nick Koudas,
                      Yannis Papakonstantinou and Diverish Srivastava
Authors

                IEEE Transactions on Knoledge and Data Engineering
Publication
                                           Vol 18, No 4, APRIL 2006
Keyword Proximity Search
in XML Trees




Thank you!

Mais conteúdo relacionado

Destaque

Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)weiw_oz
 
Interactive Query and Search for your Big Data
Interactive Query and Search for your Big DataInteractive Query and Search for your Big Data
Interactive Query and Search for your Big DataDataWorks Summit
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and RetrievalOptum
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithmRupali Bhatnagar
 
Naive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsNaive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsDKALab
 
E-Learning Baseline, UCL
E-Learning Baseline, UCLE-Learning Baseline, UCL
E-Learning Baseline, UCLJisc
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysisM. Atif Qureshi
 
Fundraising Tips - SCMM
Fundraising Tips - SCMMFundraising Tips - SCMM
Fundraising Tips - SCMMSwati_UWM
 
5ภาษาอังกฤษ
5ภาษาอังกฤษ5ภาษาอังกฤษ
5ภาษาอังกฤษNontt' Panich
 
23 Per Olav Vandvik - Kunnskapsbasert praksis i praksis: Hva slags verktøy tr...
23 Per Olav Vandvik - Kunnskapsbasert praksis i praksis: Hva slags verktøy tr...23 Per Olav Vandvik - Kunnskapsbasert praksis i praksis: Hva slags verktøy tr...
23 Per Olav Vandvik - Kunnskapsbasert praksis i praksis: Hva slags verktøy tr...Helsebiblioteket.no
 
Issue 116 obesity in adults PKU
Issue 116 obesity in adults PKUIssue 116 obesity in adults PKU
Issue 116 obesity in adults PKULouise Robertson
 
Matematika Matriks (Created by AkangCyber)
Matematika Matriks (Created by AkangCyber)Matematika Matriks (Created by AkangCyber)
Matematika Matriks (Created by AkangCyber)Amyzyx Rayevent
 
Struktur dan fungsi organ tumbuhan ii
Struktur dan fungsi organ tumbuhan iiStruktur dan fungsi organ tumbuhan ii
Struktur dan fungsi organ tumbuhan iiNispi Hariyani
 
Ruby on Railsではじめるrspecテスト
Ruby on RailsではじめるrspecテストRuby on Railsではじめるrspecテスト
Ruby on RailsではじめるrspecテストKanako Kobayashi
 
HIV TO SELF-DESTRUCT. M.I.T. RESEARCH BY TIFFANY AMARIUTA
HIV TO SELF-DESTRUCT.      M.I.T. RESEARCH BY TIFFANY AMARIUTAHIV TO SELF-DESTRUCT.      M.I.T. RESEARCH BY TIFFANY AMARIUTA
HIV TO SELF-DESTRUCT. M.I.T. RESEARCH BY TIFFANY AMARIUTAElenusz
 
Francuska deklaracija o pravima čoveka i gradjanina iz 1789
Francuska deklaracija o pravima čoveka i gradjanina iz 1789Francuska deklaracija o pravima čoveka i gradjanina iz 1789
Francuska deklaracija o pravima čoveka i gradjanina iz 1789gordana comic
 

Destaque (19)

Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)
 
Interactive Query and Search for your Big Data
Interactive Query and Search for your Big DataInteractive Query and Search for your Big Data
Interactive Query and Search for your Big Data
 
Presentation
PresentationPresentation
Presentation
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and Retrieval
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithm
 
Naive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsNaive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event Models
 
E-Learning Baseline, UCL
E-Learning Baseline, UCLE-Learning Baseline, UCL
E-Learning Baseline, UCL
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysis
 
Fundraising Tips - SCMM
Fundraising Tips - SCMMFundraising Tips - SCMM
Fundraising Tips - SCMM
 
5ภาษาอังกฤษ
5ภาษาอังกฤษ5ภาษาอังกฤษ
5ภาษาอังกฤษ
 
23 Per Olav Vandvik - Kunnskapsbasert praksis i praksis: Hva slags verktøy tr...
23 Per Olav Vandvik - Kunnskapsbasert praksis i praksis: Hva slags verktøy tr...23 Per Olav Vandvik - Kunnskapsbasert praksis i praksis: Hva slags verktøy tr...
23 Per Olav Vandvik - Kunnskapsbasert praksis i praksis: Hva slags verktøy tr...
 
tut0000021-hevery
tut0000021-heverytut0000021-hevery
tut0000021-hevery
 
Issue 116 obesity in adults PKU
Issue 116 obesity in adults PKUIssue 116 obesity in adults PKU
Issue 116 obesity in adults PKU
 
Matematika Matriks (Created by AkangCyber)
Matematika Matriks (Created by AkangCyber)Matematika Matriks (Created by AkangCyber)
Matematika Matriks (Created by AkangCyber)
 
Proyecto de regulacion aduanera
Proyecto de regulacion aduaneraProyecto de regulacion aduanera
Proyecto de regulacion aduanera
 
Struktur dan fungsi organ tumbuhan ii
Struktur dan fungsi organ tumbuhan iiStruktur dan fungsi organ tumbuhan ii
Struktur dan fungsi organ tumbuhan ii
 
Ruby on Railsではじめるrspecテスト
Ruby on RailsではじめるrspecテストRuby on Railsではじめるrspecテスト
Ruby on Railsではじめるrspecテスト
 
HIV TO SELF-DESTRUCT. M.I.T. RESEARCH BY TIFFANY AMARIUTA
HIV TO SELF-DESTRUCT.      M.I.T. RESEARCH BY TIFFANY AMARIUTAHIV TO SELF-DESTRUCT.      M.I.T. RESEARCH BY TIFFANY AMARIUTA
HIV TO SELF-DESTRUCT. M.I.T. RESEARCH BY TIFFANY AMARIUTA
 
Francuska deklaracija o pravima čoveka i gradjanina iz 1789
Francuska deklaracija o pravima čoveka i gradjanina iz 1789Francuska deklaracija o pravima čoveka i gradjanina iz 1789
Francuska deklaracija o pravima čoveka i gradjanina iz 1789
 

Semelhante a Keyword proximity search in xml trees andrada astefanoaie - presentation

Xml processing-by-asfak
Xml processing-by-asfakXml processing-by-asfak
Xml processing-by-asfakAsfak Mahamud
 
BGOUG 2012 - Design concepts for xml applications that will perform
BGOUG 2012 - Design concepts for xml applications that will performBGOUG 2012 - Design concepts for xml applications that will perform
BGOUG 2012 - Design concepts for xml applications that will performMarco Gralike
 
Collaborative Similarity Measure for Intra-Graph Clustering
Collaborative Similarity Measure for Intra-Graph ClusteringCollaborative Similarity Measure for Intra-Graph Clustering
Collaborative Similarity Measure for Intra-Graph ClusteringWaqas Nawaz
 
02_Xpath.pdf
02_Xpath.pdf02_Xpath.pdf
02_Xpath.pdfPrerak10
 
ECO_TEXT_CLUSTERING
ECO_TEXT_CLUSTERINGECO_TEXT_CLUSTERING
ECO_TEXT_CLUSTERINGGeorge Simov
 
Compiler design selective dissemination of information syntax direct translat...
Compiler design selective dissemination of information syntax direct translat...Compiler design selective dissemination of information syntax direct translat...
Compiler design selective dissemination of information syntax direct translat...ganeshjaggineni1927
 
Content-sensitive User Interfaces for Annotated Web Pages
Content-sensitive User Interfaces for Annotated Web PagesContent-sensitive User Interfaces for Annotated Web Pages
Content-sensitive User Interfaces for Annotated Web Pagesflxn13
 
BGOUG 2012 - XML Index Strategies
BGOUG 2012 - XML Index StrategiesBGOUG 2012 - XML Index Strategies
BGOUG 2012 - XML Index StrategiesMarco Gralike
 
Data assimilation with OpenDA
Data assimilation with OpenDAData assimilation with OpenDA
Data assimilation with OpenDAnilsvanvelzen
 
Financial Networks IV. Analyzing and Visualizing Exposures
Financial Networks IV. Analyzing and Visualizing ExposuresFinancial Networks IV. Analyzing and Visualizing Exposures
Financial Networks IV. Analyzing and Visualizing ExposuresKimmo Soramaki
 
Hotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured DataHotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured DataMarco Gralike
 
Extraction of topic evolutions from references in scientific articles and its...
Extraction of topic evolutions from references in scientific articles and its...Extraction of topic evolutions from references in scientific articles and its...
Extraction of topic evolutions from references in scientific articles and its...Tomonari Masada
 
Syntax Reuse: XSLT as a Metalanguage for Knowledge Representation Languages
Syntax Reuse: XSLT as a Metalanguage for Knowledge Representation LanguagesSyntax Reuse: XSLT as a Metalanguage for Knowledge Representation Languages
Syntax Reuse: XSLT as a Metalanguage for Knowledge Representation LanguagesTara Athan
 
Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011Ed Dodds
 

Semelhante a Keyword proximity search in xml trees andrada astefanoaie - presentation (20)

Xml processing-by-asfak
Xml processing-by-asfakXml processing-by-asfak
Xml processing-by-asfak
 
BGOUG 2012 - Design concepts for xml applications that will perform
BGOUG 2012 - Design concepts for xml applications that will performBGOUG 2012 - Design concepts for xml applications that will perform
BGOUG 2012 - Design concepts for xml applications that will perform
 
Collaborative Similarity Measure for Intra-Graph Clustering
Collaborative Similarity Measure for Intra-Graph ClusteringCollaborative Similarity Measure for Intra-Graph Clustering
Collaborative Similarity Measure for Intra-Graph Clustering
 
02_Xpath.pdf
02_Xpath.pdf02_Xpath.pdf
02_Xpath.pdf
 
ECO_TEXT_CLUSTERING
ECO_TEXT_CLUSTERINGECO_TEXT_CLUSTERING
ECO_TEXT_CLUSTERING
 
Linked Data Integration
Linked Data IntegrationLinked Data Integration
Linked Data Integration
 
Compiler design selective dissemination of information syntax direct translat...
Compiler design selective dissemination of information syntax direct translat...Compiler design selective dissemination of information syntax direct translat...
Compiler design selective dissemination of information syntax direct translat...
 
Content-sensitive User Interfaces for Annotated Web Pages
Content-sensitive User Interfaces for Annotated Web PagesContent-sensitive User Interfaces for Annotated Web Pages
Content-sensitive User Interfaces for Annotated Web Pages
 
Arrays in database systems, the next frontier?
Arrays in database systems, the next frontier?Arrays in database systems, the next frontier?
Arrays in database systems, the next frontier?
 
BGOUG 2012 - XML Index Strategies
BGOUG 2012 - XML Index StrategiesBGOUG 2012 - XML Index Strategies
BGOUG 2012 - XML Index Strategies
 
Data assimilation with OpenDA
Data assimilation with OpenDAData assimilation with OpenDA
Data assimilation with OpenDA
 
GraphREL: A Relational Graph Query Processor
GraphREL: A Relational Graph Query ProcessorGraphREL: A Relational Graph Query Processor
GraphREL: A Relational Graph Query Processor
 
Financial Networks IV. Analyzing and Visualizing Exposures
Financial Networks IV. Analyzing and Visualizing ExposuresFinancial Networks IV. Analyzing and Visualizing Exposures
Financial Networks IV. Analyzing and Visualizing Exposures
 
Hotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured DataHotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured Data
 
Scala google
Scala google Scala google
Scala google
 
Extraction of topic evolutions from references in scientific articles and its...
Extraction of topic evolutions from references in scientific articles and its...Extraction of topic evolutions from references in scientific articles and its...
Extraction of topic evolutions from references in scientific articles and its...
 
Syntax Reuse: XSLT as a Metalanguage for Knowledge Representation Languages
Syntax Reuse: XSLT as a Metalanguage for Knowledge Representation LanguagesSyntax Reuse: XSLT as a Metalanguage for Knowledge Representation Languages
Syntax Reuse: XSLT as a Metalanguage for Knowledge Representation Languages
 
Xpath.ppt
Xpath.pptXpath.ppt
Xpath.ppt
 
XML data binding
XML data bindingXML data binding
XML data binding
 
Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011
 

Último

1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxNikitaBankoti2
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 

Último (20)

1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 

Keyword proximity search in xml trees andrada astefanoaie - presentation

  • 1. Keyword Proximity Search in XML Trees Andrada Astefanoaie XML and Database Systems SS 2010
  • 2. Outline I. Introduction II. Framework III. Algorithms:Indexed XML Data Keyword Proximity Search IV. Processing Unindexed XML DATA in XML Trees V. Experimental Evaluation VI. Overview
  • 3. Introduction - Framework - Algorithms:Indexed XML Data – Processing Unindexed XML Data - Experimental Evaluation - Overview Keyword Search Keyword Proximity Search in XML Trees Keyword search user-friendly information discovery technique extensively studied for text documents. Keyword proximity search well-suited to XML documents
  • 4. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview Notation Keyword Proximity Search in XML Trees XML DOCUMENT directed tree with labeles - labled with λ(v), a tag - 4-tuple: id(v) start and end correspond to the first and the final times the node is v visited in a depth-first traversal of the XML tree, depth is the depth of the node from the root of the tree. - if v is a leaf, it has a string value val(v) that contains a list of keywords set of keywords k1,. . . , km. keyword query returns a compact representation of the set of trees that connect the nodes that contain the keywords
  • 5. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview Notation Keyword Proximity Search in XML Trees r c1 s1 s2 s3 p2 p5 p6 p1 p3 p4 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10
  • 6. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview Notation Keyword Proximity Search in XML Trees Definition minimum connecting tree (MCT) of nodes v1,. . . ,vm of a tree → the minimum size subtree that connects v1, . . . ,vm. root of the tree → the lowest common ancestor (LCA) of the nodes v1, . . . ,vm. Examples: r r MCTs for the query MCTs for the query “Tom, Harry” c1 “Tom, Dick, Harry” c1 s1 s2 s3 s1 s2 s3 p1 p2 p4 p5 p6 p1 p2 p3 p4 p5 p6 p3 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a1 a2 a3 a7 a8 a4 a5 a6 a9 a10
  • 7. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview Notation Keyword Proximity Search in XML Trees DMCT v1, . . . , vm Є T. Distance MCT (DMCT) TD=d(TM) of the MCT TM of nodes v1, . . . , vm → the minimum node-labeled and edge-labeled tree such that: TD contains v1, . . . , vm TD contains the LCAs u1, . . . , uk of any pair of nodes (vi, vj) where vi , vj Є [v1, . . . , vm], i≠ j edge labeled with l between any two distinct nodes n, n’ Є {v1,...,vm, u1, . . . ,uk} if there is a path of length l from n’ to n in TM and the path does not contain any node n’’ Є { u1, . . . , um} other than n and n’.
  • 8. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview Notation Keyword Proximity Search in XML Trees GDMCT A Grouped DMCT of a tree T is a labeled tree where edges are labeled with numbers and nodes are labeled with lists of node ids from T. DMCT D Є GDMCT G if D and G are isomorphic. Assuming that f is the mapping of the nodes of D to the nodes of G, which induces a corresponding mapping, also called f, of the edges of D to the edges of G, the following must hold: nD is a node of D, nG is a node of G and f(nD)=nG, then the label of nG contains the id of nD. eD is an edge of D, eG is an edge of G and f(eD) = eG, then the label of eD and the label of eG are the same number.
  • 9. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview Problems Keyword Proximity Search in XML Trees Problem 1 : All GDMCTs Problem Query K Result “Tom, Harry” 5 3
  • 10. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview Problems Keyword Proximity Search in XML Trees Problem 2 : Lowest GDMCTs Problem Query K Result “Tom, Harry” 5 3
  • 11. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Nested Loop Algorithm The nested loops algorithm (NL) for the case of indexed XML Examples of some entries in the data operates over separate lists of nodes, L(k), one for each master index for our tree: query keyword, k, to identify the GDMCTs whose sizes are no more than the user-provided threshold, K. Master index inverted index a hash table list L(k) each node n has path-id (the list of node ids along the path from the root of T to n)
  • 12. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Nested Loop Algorithm checks all combinations of nodes from the keyword lists. for each combination computes an MCT (minimum connecting tree) merges the resulting MCT into the list of result GDMCTs, if its size is within the user-specified threshold.
  • 13. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Nested Loop Algorithm For example: Query: “Tom, Harry” and K=3, NL examine the 12 node-pairs 12 MCTs determine 2 of them meet the threshold(K) return 2 GDMCTs:
  • 14. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Nested Loop Algorithm Inefficienty:  NL checks all the combinations of nodes from the keyword lists  The grouping of the results into GDMCTs is not lightly integrated with the algorithm and a lookup to the array R is required for each relevant MCT found.
  • 15. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Stack-Based Algorithm Index Structure and Algorithm. The stack-based algorithm for computing GDMCTs on indexed XML data operates over lists of nodes, two for each query keyword. Indexing by keyword master index contains 2 lists o L(k) of the nodes of T that contain k in T and o Ld(k) of the ancestors of nodes in L(k).
  • 16. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Stack-Based Algorithm Index Structure and Algorithm. For example the entries for Tom, Dick and Harry are:
  • 17. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Stack-Based Algorithm Index Structure and Algorithm. This is the high-level description of the SA. It describes how the selected list of nodes is traversed in a depth-first manner and the nodes are pushed and popped from the stack.
  • 18. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Stack-Based Algorithm Index Structure and Algorithm. novel part of the SA algorithm processing and bookkeeping performed at each stack operation
  • 19. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Stack-Based Algorithm Index Structure and Algorithm. Functions that are called from POP(S)
  • 20. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Stack-Based Algorithm Illustrative Example Query: “Tom, Harry” K=3 Master index lists: The intersection of the lists:
  • 21. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Stack-Based Algorithm Illustrative Example Master index lists: Intersection of the La Query: “Tom, Harry” K=3 Some of the initial stack states of the execution of the Stack Algorithm: 1. 2. 3.
  • 22. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Stack-Based Algorithm Illustrative Example Master index lists: Intersection of the La Query: “Tom, Harry” K=3 Some of the initial stack states of the execution of the Stack Algorithm: 4. 5. 6.
  • 23. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview All GDMCTs: Keyword Proximity Search in XML Trees Stack-Based Algorithm Illustrative Example Master index lists: Intersection of the La Query: “Tom, Harry” K=3 Some of the initial stack states of the execution of the Stack Algorithm: 7. 8. 9. Entries from the lists continue being examined, new GDMCTs are created and pruned until all the answers are output. ...
  • 24. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview Lowest GDMCTs: Keyword Proximity Search in XML Trees Stack- Based Algorithm The key observation is that once we output the GDMCTs of a node u, none of the ancestors of u in the stack can be LCAs of returned GDMCTs; hence, we can remove all of them from the stack! Specifically, we can add the following lines after line 5:
  • 25. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview LCAs: Keyword Proximity Search in XML Trees Stack- Based Algorithms The Stack Algorithm can also be easily modified to solve the All LCAs Problem and the Lowest LCAs Problem, where the user is not interested in the GDMCTs, but only in the LCA nodes. o First, Merge(.) could be simplified, no merging of GDMCTs would need to be done, and line 33 could be replaced by: o Second, we can output an LCA early when the first GDMCT (with all keywords) is computed for that node (in Procedure CreateNewGDMCTs(.)), instead of waiting until the node is popped from the stack.
  • 26. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview Complexity Keyword Proximity Search in XML Trees Analysis Total number of GDMCTs Worst case: the number of DMCTs and of GDMCTs = exponential on the number of keywords. Under reasonable assumptions, the worst-case number of GDMCTs is smaller than that of DMCTs Complexity of Finding Isomorphic GDMCTs Given this canonical representation prezented in this chapter, one can linearize the GDMCTs in an XML-like nested representation with start and end tags, obtained from the node annotations. Theorem 1. The time complexity of SA is O( L  K  (i 1 L(ki ) ) 2 ) m
  • 27. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview Processing Keyword Proximity Search in XML Trees Unindexed XML Data Both the NL Algorithm and the SA have adaptations to work without index lists by doing a single pass over the data tree. The streaming version of the Stack Algorithm following changes to the Stack Algorithm SA(k1,..km, K):
  • 28. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview Experimental Keyword Proximity Search in XML Trees Evaluation Parameters affecting the performance of the presented algorithms: 1) the value of K denoting the threshold, 2) the number m of keywords, 3) the size of the data set. Tests show that usually the algorithms based on the Stack Algorithm have better results than the Nested Loops Algorithms both in the Indexed and Unindexed data.
  • 29. Introduction - Framework - Algorithms:Indexed XML Data - Processing Unindexed XML Data - Experimental Evaluation - Overview Overview Keyword Proximity Search in XML Trees There were presented two main problems: 1) identifying and presenting in a compact manner all MCTs which explain how the keywords are connected 2) identifying only MCTs whose root is not an ancestor of the root of another MCT. There are presented solutions: 1) when the XML data has been preprocessed and relevant indices have been constructed - Nested Loop Algorithm - Stack Algorithm 2) when the XML data has not been preprocessed, i.e., the XML data can only beprocessed sequentially. Benefits of the algorithms are shown by the Experimental Evaluation
  • 30. Resource Name Keyword Proximity Search in XML Trees Vangelis Hristidis, Nick Koudas, Yannis Papakonstantinou and Diverish Srivastava Authors IEEE Transactions on Knoledge and Data Engineering Publication Vol 18, No 4, APRIL 2006
  • 31. Keyword Proximity Search in XML Trees Thank you!