Flexible querying of graph data

          Graph processing room
          FOSDEM, 2 Feb 2013


                Petra Selmer
           petra.selmer.uk@gmail.com
       http://www.dcs.bbk.ac.uk/~lselm01/
Introduction

       I shall be presenting my PhD topic which involves
       a declarative query language allowing for the
       flexible querying of graph-structured data with
       complex paths.




2
Agenda

     • Who (am I)?
     • Why (the motivation)?
     • Some background info
     • What (is the query language and what
       can it do)?
     • Illustrative examples
     • How (is it done)?

3
Who?

     • Petra Selmer
     • Part-time PhD student:
        - Birkbeck College, University of London
        - Prof. Alexandra Poulovassilis
        - Dr. Peter T. Wood
     • Software Architect:
        - University College London’s Institute of Neurology
          (Wellcome Trust Centre for Neuroimaging)




4
Why?

     • The amount of graph-structured data is
       growing fast
     • The structure of this data is
       becoming more complex, especially
       when multiple, heterogeneous data
       sources are integrated
     • The structure of the data is also
       always subject to change...

5
Why?
     • Users of such systems may not be familiar with the underlying data
       structure (available paths, etc.)
     • The user may not be able to obtain meaningful answers (or indeed,
       any answers) from the data if the querying system is limited to exact
       matching of users’ queries
     • The user may also wish to explore the data by starting from a set of
       initial answers and proceeding from there
     • The user may additionally wish to derive some intelligence from the
       connections...

     [Diagram: the data, the query, the user]




6
Background: Ontologies

 • Ontologies are currently part of the Semantic Web stack
   (Tim Berners-Lee, RDF, triple stores)
 • An ontology models a domain of interest and supports
   inference and reasoning
 • It can be thought of as a “schema” for graph data
 • The following inference rules are included (among
   others):
     - Subclass: ‘History’ and ‘Languages’ are subclasses of
       ‘Humanities’
     - Subproperty, Domain, Range...




7
What?
 • Data model: G = (V, E)
    - Very general model
    - V : vertices (or nodes), each labelled with some
      constant
    - E : directed, labelled edges; labels drawn from the
      alphabet Σ ∪ {type}
 • The query language is called Flex-It (it is
   declarative)
 • It is based on conjunctive regular path
   queries
 • Two operators, APPROX and RELAX, may be applied to the
   conjuncts of the original query
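
A minimal sketch of this data model in Python, assuming an adjacency-list representation. The class name `LabelledGraph`, its methods, and the sample vertices and class name `Node` are illustrative assumptions, not part of Flex-It.

```python
class LabelledGraph:
    """Directed graph G = (V, E) whose edges carry labels."""

    def __init__(self):
        self.vertices = set()
        # source vertex -> list of (edge label, target vertex)
        self.out_edges = {}

    def add_edge(self, source, label, target):
        self.vertices.update((source, target))
        self.out_edges.setdefault(source, []).append((label, target))

    def neighbours(self, source, label):
        """Vertices reachable from `source` over one edge carrying `label`."""
        return [t for (l, t) in self.out_edges.get(source, []) if l == label]


# Hypothetical sample data
g = LabelledGraph()
g.add_edge("N1", "n", "N2")
g.add_edge("N2", "n", "N3")
g.add_edge("N3", "p", "N4")
g.add_edge("N1", "type", "Node")  # a 'type' edge attaches a vertex to a class

print(g.neighbours("N1", "n"))
```

Storing the label with each outgoing edge keeps a single traversal step (follow all edges with a given label) cheap, which is the basic operation a path query needs.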

8
What?
 • Conjunctive regular path queries:
    - The paths to be traversed in the graph are expressed with a
      regular expression
 • A single regular path query conjunct: (X, R, Y)
    - X, Y: either constants or variables
    - R: the regular expression
 • “Conjunctive”: joining multiple conjuncts; e.g. (X, R1, Y), (Y,
   R2, Z), (Z, R3, A)
    - Occurrences of the same variable are joined: the Y’s are matched, the Z’s are matched, etc.

 Example graph: N1 -n-> N2 -n-> N3 -p-> N4
   1) (N1, n+, ?Y):
        • Y = N2, N3
   2) (N1, n*p, ?Y):
        • Y = N4
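
The two example conjuncts can be checked with a small Python sketch. It assumes the example graph is the chain N1 -n-> N2 -n-> N3 -p-> N4 and uses hand-built automata for n+ and n*p; the function name `evaluate` and the tuple encoding of the automata are illustrative choices, not the Flex-It implementation.

```python
from collections import deque

# Edges of the example graph: N1 -n-> N2 -n-> N3 -p-> N4
EDGES = [("N1", "n", "N2"), ("N2", "n", "N3"), ("N3", "p", "N4")]

# Hand-built automata: (start state, accepting states, {(state, label): next states})
NFA_N_PLUS = (0, {1}, {(0, "n"): {1}, (1, "n"): {1}})    # n+
NFA_N_STAR_P = (0, {1}, {(0, "n"): {0}, (0, "p"): {1}})  # n*p


def evaluate(start_node, nfa, edges=EDGES):
    """Return every Y reachable from start_node by a path whose labels the NFA accepts."""
    start_state, accepting, delta = nfa
    seen = {(start_node, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for source, label, target in edges:
            if source != node:
                continue
            for next_state in delta.get((state, label), ()):
                pair = (target, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers


print(sorted(evaluate("N1", NFA_N_PLUS)))    # ['N2', 'N3']
print(sorted(evaluate("N1", NFA_N_STAR_P)))  # ['N4']
```

The search walks pairs of (graph node, automaton state), so a node is an answer only when it is reached in an accepting state.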
9
What?
 • Approximation allows for the approximate matching
   of labels in the path
 • Edit operations may be applied to the edge labels in
   the path denoted by the regular expression:
      - Edit operations: insertions, deletions, inversions,
        substitutions and transpositions of labels
      - Each operation has a ‘cost’, usually 1
 • Example:
      - Query conjunct: (X, a*.b, Y)
      - R = a*.b [answers returned at cost 0]
      - R’ = p.a*.b (insertion of ‘p’) [answers returned at cost 1]
      - R’’ = p.a*.b- (insertion of ‘p’, then inversion of ‘b’) [answers returned at cost 2]
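
The cost bookkeeping in this example can be sketched in Python as follows. The class `ApproxExpression` and its token representation are assumptions made for illustration; only insertion and inversion are modelled, each at cost 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApproxExpression:
    tokens: tuple   # regular expression as concatenated tokens, e.g. ("a*", "b")
    cost: int = 0   # accumulated edit cost

    def insert(self, position, label, cost=1):
        """Insert a new label before the token at `position`."""
        tokens = self.tokens[:position] + (label,) + self.tokens[position:]
        return ApproxExpression(tokens, self.cost + cost)

    def invert(self, position, cost=1):
        """Reverse the direction of the label at `position`."""
        label = self.tokens[position]
        # A trailing '-' marks a label traversed against the edge direction
        flipped = label[:-1] if label.endswith("-") else label + "-"
        tokens = self.tokens[:position] + (flipped,) + self.tokens[position + 1:]
        return ApproxExpression(tokens, self.cost + cost)

    def __str__(self):
        return ".".join(self.tokens)


r0 = ApproxExpression(("a*", "b"))  # R
r1 = r0.insert(0, "p")              # R'
r2 = r1.invert(2)                   # R''

for r in (r0, r1, r2):
    print(r, "cost", r.cost)
```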


10
What?
 • Relaxation is applied by using inference
   rules from an ontology (if one exists)
    - Achieved by logically relaxing the query
      conditions using the data’s ontology
    - Relaxation operations: subclass, subproperty, domain
      and range
    - Each operation has a ‘cost’, usually 1
 • Example:
    - We have an ontology:
        - Humanities (superclass)
        - Languages and History (subclasses of Humanities)
    - Assume our query states that Languages may be relaxed
    - Languages is relaxed to Humanities:
        - Instances of Languages will be returned at cost 0
        - Instances of History will be returned at cost 1
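
A minimal Python sketch of subclass relaxation for this example. The class hierarchy comes from the slide; the instance names (`French101`, `Medieval201`) and all function names are hypothetical, and only the subclass rule is modelled.

```python
# Ontology fragment from the example: class -> direct superclass
SUPERCLASS = {"Languages": "Humanities", "History": "Humanities"}

# Hypothetical instance data: instance -> class
INSTANCE_OF = {"French101": "Languages", "Medieval201": "History"}


def relax(cls, step_cost=1):
    """Yield (class, cost) for `cls` and each of its ancestors, nearest first."""
    cost = 0
    while cls is not None:
        yield cls, cost
        cls = SUPERCLASS.get(cls)
        cost += step_cost


def subclasses_or_self(cls):
    """Return `cls` together with every class below it in the hierarchy."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUPERCLASS.items():
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result


def relaxed_instances(cls):
    """Map each matching instance to the lowest cost at which it is returned."""
    best = {}
    for relaxed_cls, cost in relax(cls):
        members = subclasses_or_self(relaxed_cls)
        for instance, instance_cls in INSTANCE_OF.items():
            if instance_cls in members and instance not in best:
                best[instance] = cost
    return best


print(relaxed_instances("Languages"))
```

Instances of Languages are found before any relaxation (cost 0); instances of History appear only once Languages has been relaxed to Humanities (cost 1).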

11
What?

 • Answers are ranked according to how
   closely they match the original query;
   higher-cost answers have a lower ranking
 • All answers at a given distance d are
   ranked the same and returned before
   answers at a greater distance
 • We allow for incremental execution: exact
   answers are returned first, then answers at
   distance 1, and so on
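
The ranking rule can be illustrated with a short Python sketch. It assumes the answers and their distances have already been computed, so it shows only the ordering and batching; a truly incremental evaluator would compute each distance level on demand. The function name and the sample answers are hypothetical.

```python
from itertools import groupby


def ranked_batches(answers_with_distance):
    """Yield (distance, answers) batches, closest matches first."""
    ordered = sorted(answers_with_distance, key=lambda pair: pair[1])
    for distance, group in groupby(ordered, key=lambda pair: pair[1]):
        yield distance, sorted(answer for answer, _ in group)


# Hypothetical answers paired with their distance from the original query
found = [("c", 2), ("a", 0), ("d", 1), ("b", 2)]
for distance, answers in ranked_batches(found):
    print(distance, answers)
```

Because the function is a generator, a caller can stop after the exact answers (distance 0) and request the next batch only if those are not enough.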
12
Example: ‘Lifelong learner metadata’

 [Figure: only the labels “sc” and “History” survive from the image]




13
 [Figure: only the labels “sc” and “History” survive from the image]




14
 • Query: “What work positions can I reach, having a degree in English?”
        - Y = the episode; Z = the job
     (?Y, ?Z) ←
        (?X, type, University),
        (?X, qualif.type, EnglishStudies),
        (?X, prereq+, ?Y),
        (?Y, type, Work),
        (?Y, job.type, ?Z)
15
 • Query: “What work positions can I reach, having a degree in English?”
        - Y = the episode; Z = the job
     (?Y, ?Z) ←
        (?X, type, University),
        (?X, qualif.type, EnglishStudies),
        (?X, prereq+, ?Y),
        (?Y, type, Work),
        (?Y, job.type, ?Z)
     • No results from User 2 will be returned, even though they are relevant!
16
 • Allowing query approximation can yield some answers:
      - With APPROX applied to the conjunct, the edge label prereq may be replaced by next
        at an edit cost of 1:
        (?Y, ?Z) ←
           (?X, type, University),
           (?X, qualif.type, EnglishStudies),
           APPROX(?X, prereq+, ?Y),
           (?Y, type, Work),
           (?Y, job.type, ?Z)
 • prereq+ can be approximated by next.prereq* at edit distance 1:
      - Result: Y = ep22, Z = AirTravelAssistant
17
 • Allowing query approximation can yield some answers:
      - With APPROX applied to the conjunct, further prereq labels may be replaced by next,
        each at an edit cost of 1:
       (?Y, ?Z) ←
          (?X, type, University),
          (?X, qualif.type, EnglishStudies),
          APPROX(?X, prereq+, ?Y),
          (?Y, type, Work),
          (?Y, job.type, ?Z)
 • next.prereq* can be approximated by next.next.prereq*, now at edit distance 2:
      - Results:
          Y = ep23, Z = Journalist
          Y = ep24, Z = AssistantEditor
18
 [Figure: only the labels “sc” and “History” survive from the image]




19
 • Query: “What jobs are open to me if I study English, or something similar, at University?”
     (?Y, ?Z) ←
         (?X, type, University), (?X, qualif, ?D),
         RELAX (?D, type, EnglishStudies),
         APPROX (?X, prereq+, ?Y),
         (?Y, type, Work), (?Y, job.type, ?Z)
     • In addition to the answers (from User 2) obtained by the previous query, we now also have
       answers from the timeline of User 3
     • prereq+ can be approximated by next.prereq* (distance 1) and EnglishStudies can be relaxed,
       via Languages, to Humanities (distance 2), which encompasses History
        - Result: Y = ep32, Z = PersonalAssistant (distance 3 from the original query)
20
 • Query: “What jobs are open to me if I study English, or something similar, at
       University?”
     (?Y, ?Z) ←
         (?X, type, University), (?X, qualif, ?D),
         RELAX (?D, type, EnglishStudies),
         APPROX (?X, prereq+, ?Y),
         (?Y, type, Work), (?Y, job.type, ?Z)
     • next.prereq* can be approximated by next.next.prereq* (distance 2), with
       EnglishStudies again relaxed to Humanities (distance 2)
        - Results (both at distance 4 from the original query):
            Y = ep33, Z = Author
            Y = ep34, Z = AssociateEditor
21
How?
 • Theory
    - Construction of a weighted non-deterministic finite
      automaton (NFA) to represent the regular expression
        - We add new states and transitions to the NFA to represent the
          approximation and relaxation operations
    - Formation of a product automaton: NFA with data
      graph G
    - We perform a lowest-cost path traversal of the product
      automaton, then construct the query tree, perform joins, etc.
    - Polynomial time complexity
    - Correctness of the algorithms has been proven
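
The lowest-cost traversal of the product automaton can be sketched in Python with Dijkstra's algorithm. This is an illustration under stated assumptions, not the prototype's code: the data graph and its node names are hypothetical, the NFA encodes prereq+ by hand, and only label substitution (cost 1) is modelled.

```python
import heapq

# Hypothetical data graph as (source, label, target) triples
EDGES = [("uni", "next", "ep1"), ("ep1", "prereq", "ep2"), ("ep1", "next", "ep3")]

# NFA for prereq+: {(state, label): next state}; state 1 is accepting
DELTA = {(0, "prereq"): 1, (1, "prereq"): 1}
START, ACCEPTING = 0, {1}
SUBSTITUTION_COST = 1


def weighted_moves(state, label):
    """Exact transitions cost 0; substituting a different label costs 1."""
    for (source_state, expected), target_state in DELTA.items():
        if source_state == state:
            yield target_state, 0 if expected == label else SUBSTITUTION_COST


def lowest_cost_answers(start_node):
    """Dijkstra over pairs (graph node, NFA state); answers come out in cost order."""
    settled = set()
    reported = set()
    answers = []
    heap = [(0, start_node, START)]
    while heap:
        cost, node, state = heapq.heappop(heap)
        if (node, state) in settled:
            continue
        settled.add((node, state))
        if state in ACCEPTING and node not in reported:
            reported.add(node)
            answers.append((node, cost))
        for source, label, target in EDGES:
            if source == node:
                for next_state, step_cost in weighted_moves(state, label):
                    if (target, next_state) not in settled:
                        heapq.heappush(heap, (cost + step_cost, target, next_state))
    return answers


print(lowest_cost_answers("uni"))
```

Because the priority queue always expands the cheapest pair first, each answer is reported at its minimum distance, and answers at distance d appear before those at distance d + 1.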



22
How?

 • Implementation of prototype
    - Graph database: DEX (http://www.sparsity-technologies.com/dex)
    - Programming language: C#
 • Further work
    - New flexible operation combining APPROX and
      RELAX → FLEX
    - Optimisation!




23
Any questions?

     Thank you for your attention!

                      petra.selmer.uk@gmail.com
24
