O slideshow foi denunciado.
Seu SlideShare está sendo baixado. ×

Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio

Confira estes a seguir

1 de 22 Anúncio

Mais Conteúdo rRelacionado

Diapositivos para si (20)

Quem viu também gostou (20)

Anúncio

Semelhante a Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples (20)

Mais de semanticsconference (20)

Anúncio

Mais recentes (20)

Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

  1. 1. © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Stephen Buxton, Senior Director, Product Management, MarkLogic When to Use Documents vs Triples
  2. 2. SLIDE: 2 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. NoSQL KEY- VALUE COLUMN DOCUMENT GRAPH PROPERTY GRAPHS TRIPLE STORES NoSQL
  3. 3. SLIDE: 3 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. NoSQL KEY- VALUE COLUMN DOCUMENT GRAPH PROPERTY GRAPHS TRIPLE STORES NoSQL A Database That Integrates Data Better, Faster, with Less Cost
  4. 4. SLIDE: 4 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Leading Organizations Using MarkLogic Semantics  Intelligent Search  Semantic Metadata Hub  Dynamic Semantic Publishing  Recommendation Engines  Compliance Entertainment Company Pharmaceutical Company
  5. 5. SLIDE: 5 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Relational Databases Table PROs  Natural way to model strictly-tabular data  Mature technology with rich eco-system Ph_ID Cus_ID Type Number 4001 2001 Home 555-6789 4002 2001 Cell 555-7238 4003 2002 Home 137-2859 4004 2003 Home 189-2212 4005 2003 Cell 199-2312 4006 2003 Office 444-1898 4007 2003 Main 199-2312 CONs  Real-world entities require complex modeling up-front  Brittle: changes require adding columns and tables  No inherent semantics
  6. 6. SLIDE: 6 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Document Databases Document PROs  Natural way to model entities  Schema is flexible within/across documents  Self-describing  Query and Search immediately  Handles hierarchical data  Handles repeating elements  Handles sparse data  Joins can be denormalized away { “ID” : 1001 , “Fname” : “Paul” , “Lname” : “Jackson” , “Phone” : “415-555-1212” , “SSN” : “123-45-6789” , “Addr” : “123 Avenue Road” , “City” : “San Francisco” , “State” : “CA” , “Zip” : 94111 }
  7. 7. SLIDE: 7 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Graph Databases – Triple Stores Graph PROs  A triple defines a relationship  Entity->Entity  Entity->Concept  Concept->Value  Triples come together to form Graphs  Graphs can be easily shared, combined  Graphs can be traversed  Can infer new triples using definitions (rules)
  8. 8. SLIDE: 8 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Hybrid Documents + Triple Store Hybrid PROs  PROs of a Document Store  PROs of a Triple Store  Combination: Documents with Semantic context  Define the semantics of your data  Richer search through context and facts  Combination: Triples with Document context  Arbitrary annotation of Triples  Metadata, provenance, temporal, etc.  Rich queries over rich data  Fast, iterative development  Query through a SQL lens where appropriate { “ID” : 1001 , “Fname” : “Paul” , “Lname” : “Jackson” , “Phone” : “415-555-1212” , “SSN” : “123-45-6789” , “Addr” : “123 Avenue Road” , “City” : “San Francisco” , “State” : “CA” , “Zip” : 94111 }
  9. 9. SLIDE: 9 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Sidetrack – Documents and Data Title Date Body Section Section Section Article Abstract Paragraph Paragraph Paragraph Type Date Parties Seller Buyer Channel Trade Amount PaidBy Affiliation Name
  10. 10. TRIPLES AND DOCUMENTS
  11. 11. SLIDE: 11 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Triples Alongside Documents User1 rank Senior Manager Geneva basedIn Compliance Officer role High risk personApp1 runsOn Cluster1 TopSecret requires Database1 accesses runs
  12. 12. SLIDE: 12 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Show me documents that mention App1 (or its dependencies)  … and "trades" or "markets"  … that were valid yesterday afternoon  … that were produced near HQ  see Intelligent Search, Infobox  Show me instructions to access App1  App1 user guide  How to get TopSecret access  Scope of Database1  see Dynamic Semantic Publishing Triples Alongside Documents
  13. 13. SLIDE: 13 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Documents as Part of the Graph User1 rank Senior Manager Geneva basedIn Compliance Officer role Hig h risk pers on App1 runsOn Cluster1 TopSecret requires Database1 accesses runs deep dive license user guide tutorialMovie order
  14. 14. SLIDE: 14 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.  Document as opaque object  Show me all the instructional documents related to App1  Search inside the document  Show me all the applications that managers use that expire in the next 6 months Documents as Part of the Graph
  15. 15. SLIDE: 15 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Triples About Documents – Extended Metadata User1 rank Senior Manager Geneva basedIn Compliance Officer role Hig h risk pers on App1 runsOn Cluster1 TopSecret requires Database1 accesses runs order format JSON English Delaware 2016-12-31 jurisdiction expires Ts and Cs language
  16. 16. SLIDE: 16 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.  Triples are a natural way to represent metadata about documents  Extended because that metadata is part of the graph  Example: show me all orders for a TopSecret app that will expire soon Triples About Documents – Extended Metadata
  17. 17. SLIDE: 19 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.  Data Integration: Dirty data  Show me license documents from vendor Acme  Data Integration: Overlapping data  Show me all assets from vendor Acme Triples About Documents
  18. 18. SLIDE: 23 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Triples as part of a document  Embed triples in a document  Triples and document have the same security, transactions, backup, temporality, …  Annotate triples in an entirely generic way (XML or JSON)  Provenance  Confidence  Bitemporal  Query across triples and documents in the same query  SPARQL, restrict result to some source, confidence range, bitemporal range  Search, restrict result to documents that contain some facts or metadata
  19. 19. SUMMARY
  20. 20. SLIDE: 25 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Use Triples when you want to … Data Documents Triples  Store and query hundreds of billions of facts and relationships  Explore a graph  Visualize a graph  Leverage standards: data + query  Infer new information  better insights  simpler data modeling  Semantics of data  integration
  21. 21. SLIDE: 26 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Use Documents when you want to … Data Documents Triples  Easily store heterogeneous data (transactional data, records, free-text)  Schema-agnostic  modeling freedom  integrate without ETL*  Search flexibility and specificity  Fast app development
  22. 22. SLIDE: 27 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Document Store and Triple Store Combined Data Documents Triples All the benefits of each, plus:  Docs can contain triples, Triples can annotate docs, Graphs can contain docs – Faster data integration using semantics as the glue – Ideal model for reference data, metadata, provenance – Ability to run really powerful queries  Massive speed and scale  Simplicity of a single unified platform  Enterprise features (security, HA/DR, ACID transactions,…)

Notas do Editor

  • ?How do semantic partners differ from ML?
    ^^^add analytic animation here
    ML triple stores enable & complement many diff analytics
    Mach learn + NLP > entity extract > store in ML
    Triples & inferencing > create knowledge graphs & support data mining & analytics

    Let’s talk just a bit more about terminology as the semantics of semantics can get messy…

    From our perspective, MarkLogic is a document oriented database. It also has a built-in triple store. It also serves as a precursor to other capabilities.
    A triple store can serve as a knowledge graph with facts and relationships about billions of things.
    You can also mine data in MarkLogic to do predictive analytics.
    MarkLogic also works with partners such as SmartLogic to do entity extraction. When you take free text and extract facts about people, places, and things, the unstructured data becomes machine readable.
    This is the basis for natural language processing, or NLP.
    NLP, machine learning, and pattern recognition fall into the realm of cognitive computing, and all of this is encroaching into the territory of AI.
  • You guys recognize these logos?
    OF COURSE! Everybody knows “ENTERTAINMENT COMPANY”
    Actually, the entertainment companies that are our customers, and those we hope will honor us by becoming customers,
    …are doing or looking to do similar things with semantics, albeit in their own way…so we talk about them as a group in this presentation.

    As David Gorbet said in his keynote, we’re helping enterprises
    See their entire product and customers through intelligence search
    Understand their production and distribution processes through a Semantic Metadata Hub
    Deliver customized, targeted content and user experience through Dynamic Semantic Publishing
    Maintain a valuable engagement with customers through semantically driven Recommendations
    AND assess business risks by leveraging semantics for Compliance


  • The goal of this first session is to de-mystify "Semantics".
    Some people are put off by the notion that semantics is somehow magical, or at least enormously complicated.
    Semantic technologies can be very powerful, but there's no magic here – just science.
    By a show of fingers – where 0 is "I don’t know what a triple is" and 10 is "I have a PhD in Ontologies" – how much do you know about Semantic technologies?
    I see some 7s and 8s – OK, I'll go through this section quite quickly.
    For the 1s and 2s, I'll define some basic terms and give you a general idea of what we mean by Semantics in this context.

  • BBC – DSP
    BSI – Semantic Search
    InfoBox
  • There’s a lot you can do with JUST triples, but the real magic of semantics comes when you use triples alongside documents.
    Here, you can see a document as Marklogic sees it. A document is stored as XML or JSON, which is a hierarchical tree format.
    Documents are schema-agnostic, human-readable, and don’t carry all of the entity integrity constraints that you had with a relational model.
    You can do a lot with documents, but even documents can fall short when it comes to optimizing for facts and relationships, which are best stored in a graph model as triples.
    Here, you can see a graph formed by triples about a particular video title.
    In this graph we know that a title not only has this metadata, but the title has characters, etc.
    With a single query, you can bring back the document, parts of the graph, or both and have it all materialize at runtime.
    This “multi-model” view of data provides more flexibility and agility than any other model.


    If you want to go into more detail about how semantics helps with classification
  • Mix'n'Match documents and triples
  • There’s a lot you can do with JUST triples, but the real magic of semantics comes when you use triples alongside documents.
    Here, you can see a document as Marklogic sees it. A document is stored as XML or JSON, which is a hierarchical tree format.
    Documents are schema-agnostic, human-readable, and don’t carry all of the entity integrity constraints that you had with a relational model.
    You can do a lot with documents, but even documents can fall short when it comes to optimizing for facts and relationships, which are best stored in a graph model as triples.
    Here, you can see a graph formed by triples about a particular video title.
    In this graph we know that a title not only has this metadata, but the title has characters, etc.
    With a single query, you can bring back the document, parts of the graph, or both and have it all materialize at runtime.
    This “multi-model” view of data provides more flexibility and agility than any other model.


    If you want to go into more detail about how semantics helps with classification
  • There’s a lot you can do with JUST triples, but the real magic of semantics comes when you use triples alongside documents.
    Here, you can see a document as Marklogic sees it. A document is stored as XML or JSON, which is a hierarchical tree format.
    Documents are schema-agnostic, human-readable, and don’t carry all of the entity integrity constraints that you had with a relational model.
    You can do a lot with documents, but even documents can fall short when it comes to optimizing for facts and relationships, which are best stored in a graph model as triples.
    Here, you can see a graph formed by triples about a particular video title.
    In this graph we know that a title not only has this metadata, but the title has characters, etc.
    With a single query, you can bring back the document, parts of the graph, or both and have it all materialize at runtime.
    This “multi-model” view of data provides more flexibility and agility than any other model.


    If you want to go into more detail about how semantics helps with classification
  • There’s a lot you can do with JUST triples, but the real magic of semantics comes when you use triples alongside documents.
    Here, you can see a document as Marklogic sees it. A document is stored as XML or JSON, which is a hierarchical tree format.
    Documents are schema-agnostic, human-readable, and don’t carry all of the entity integrity constraints that you had with a relational model.
    You can do a lot with documents, but even documents can fall short when it comes to optimizing for facts and relationships, which are best stored in a graph model as triples.
    Here, you can see a graph formed by triples about a particular video title.
    In this graph we know that a title not only has this metadata, but the title has characters, etc.
    With a single query, you can bring back the document, parts of the graph, or both and have it all materialize at runtime.
    This “multi-model” view of data provides more flexibility and agility than any other model.


    If you want to go into more detail about how semantics helps with classification
  • There’s a lot you can do with JUST triples, but the real magic of semantics comes when you use triples alongside documents.
    Here, you can see a document as Marklogic sees it. A document is stored as XML or JSON, which is a hierarchical tree format.
    Documents are schema-agnostic, human-readable, and don’t carry all of the entity integrity constraints that you had with a relational model.
    You can do a lot with documents, but even documents can fall short when it comes to optimizing for facts and relationships, which are best stored in a graph model as triples.
    Here, you can see a graph formed by triples about a particular video title.
    In this graph we know that a title not only has this metadata, but the title has characters, etc.
    With a single query, you can bring back the document, parts of the graph, or both and have it all materialize at runtime.
    This “multi-model” view of data provides more flexibility and agility than any other model.


    If you want to go into more detail about how semantics helps with classification
  • We can think of "Order", "Order associated with App1" as metadata; "extended" because we can extend the link from App1 to "application that requires TopSecret".
  • International Classification of Diseases, a set of codes used by physicians, hospitals, and allied health workers to indicate diagnosis for all patient encounters.
  • International Classification of Diseases, a set of codes used by physicians, hospitals, and allied health workers to indicate diagnosis for all patient encounters.
  • The goal of this first session is to de-mystify "Semantics".
    Some people are put off by the notion that semantics is somehow magical, or at least enormously complicated.
    Semantic technologies can be very powerful, but there's no magic here – just science.
    By a show of fingers – where 0 is "I don’t know what a triple is" and 10 is "I have a PhD in Ontologies" – how much do you know about Semantic technologies?
    I see some 7s and 8s – OK, I'll go through this section quite quickly.
    For the 1s and 2s, I'll define some basic terms and give you a general idea of what we mean by Semantics in this context.

×