
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge Graphs


Tutorial on "Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge Graphs" presented at the 4th Joint International Conference on Semantic Technologies (JIST2014)


  1. JIST2014 Tutorial on Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge Graphs. Presenter: Jeff Z. Pan (University of Aberdeen). Contributors: Honghan Wu (University of Aberdeen), Yuan Ren (University of Aberdeen), Panos Alexopoulos (iSOCO)
  2. Agenda, PART I: LINKED DATA & KNOWLEDGE GRAPHS • 1:00pm – 1:20pm Overview & Applications • 1:20pm – 1:35pm Example Linked Data Knowledge Repositories • 1:35pm – 1:45pm The Current Status of Linked Data: the Good, the Bad and the Ugly • 1:45pm – 2:00pm Research Challenges
  3. Agenda, PART II: METHODS & TECHNIQUES • 2:00pm – 3:05pm Constructing Knowledge Graphs • 2:30pm – 2:45pm Coffee Break • 3:05pm – 3:40pm Understanding Knowledge Graphs • 3:40pm – 3:45pm Outlook
  4. PART I: LINKED DATA & KNOWLEDGE GRAPHS • Overview • Applications • Linked Data Knowledge Repositories • Knowledge Graph on Linked Data • Research Challenges
  5. Knowledge • What is knowledge? • Something that is known • Structured information • About certain aspects of the (real) world
  6. Semantic Networks: A semantic network is a graph structure for representing knowledge in patterns of interconnected nodes and arcs • with nodes representing objects, concepts, or situations, and • arcs representing relationships
  7. RDF: Standard for Directed Labelled Graph KBs for the Web • RDF is • a modern version of the semantic network, with formal syntax and semantics • a standard model for data interchange on the Web • RDF statements are subject-property-value triples: [my-chair colour tan .] [my-chair rdf:type chair .] [chair rdfs:subClassOf furniture .]
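The three triples on the RDF slide can be held in a plain set of tuples. The following is a minimal sketch, not an RDF library: the function name `types_of` is an assumption made for illustration, and only the `rdfs:subClassOf` inference needed for this example is implemented.

```python
# Toy triple store: each statement is a (subject, property, value) tuple,
# copied from the slide. This is an illustration, not a real RDF API.
triples = {
    ("my-chair", "colour", "tan"),
    ("my-chair", "rdf:type", "chair"),
    ("chair", "rdfs:subClassOf", "furniture"),
}


def types_of(entity, graph):
    """Return the asserted types of `entity` plus those implied by rdfs:subClassOf."""
    found = {o for s, p, o in graph if s == entity and p == "rdf:type"}
    frontier = list(found)
    while frontier:
        cls = frontier.pop()
        for s, p, o in graph:
            if s == cls and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


print(sorted(types_of("my-chair", triples)))
```

Under these assumptions, my-chair is found to be both a chair (asserted) and furniture (implied by the subclass statement).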
  8. Linked Data and Knowledge Graphs • Linked Data refers to (RDF) data published on the web • with its meaning explicitly defined with ontological (OWL) vocabulary • can be inter-linked with external datasets • A knowledge graph is a set of interconnected typed entities and their attributes
  9. Knowledge Graph (KG) Services and Related Research Problems • KG construction: how to construct high quality knowledge graphs? • Knowledge acquisition • Knowledge evaluation • KG understanding: how to make it easier to access and reuse knowledge? • for end users • for data engineers • KG reasoning: how to bridge the gap between the vocabulary used in the graphs and that used in queries? • Scalability • Efficiency
  10. APPLICATIONS OF KNOWLEDGE GRAPHS: Summary of Entities, Faceted Fact, From Best to List, Entity Associations, Structured Queries, and Question Answering
  11. ENTITY UNDERSTANDING: THINGS, NOT STRINGS
  12. What is it? (Entity Understanding)
  13. FACETED FACT: GETTING THE VALUE OF SOME ATTRIBUTE
  14. What is the time there? (Faceted Fact)
  15. FROM BEST TO LIST: NOT ONLY THE BEST
  16. Give a List instead of Best
  17. ENTITY ASSOCIATION: SHOW THE CONNECTIONS
  18. How are they connected? (Entity Association) Gong Cheng, Yanan Zhang, and Yuzhong Qu. Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets. In Proc. of ISWC 2014, pp. 422–437. http://ws.nju.edu.cn/explass/
  19. STRUCTURED QUERIES: EVEN WHEN THE INPUTS ARE KEYWORDS
  20. From keywords to structural queries. Example: the keywords “Capin SVG” become "find specifications about “SVG” whose author’s name is “Capin”". Wang, Haofen, Kang Zhang, Qiaoling Liu, Thanh Tran, and Yong Yu. Q2semantic: A lightweight keyword interface to semantic search. In Proc. of ESWC 2008, pp. 584-598.
  21. QUESTION ANSWERING: COMPUTE ANSWERS WITH THE KG
  22. Question Answering. Example question: “films starring Brad Pitt”. Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, and Philipp Cimiano. "Template-based question answering over RDF data." In Proceedings of the 21st international conference on World Wide Web, pp. 639-648. ACM, 2012.
  23. SAMPLE LINKED DATA KNOWLEDGE REPOSITORIES: DBpedia, Wikidata, GoodRelations
  24. DBpedia • A crowd-sourced community effort to extract structured information from Wikipedia • allows users to ask structured queries against Wikipedia • and to link the different data sets on the Web to Wikipedia data.
  25. DBpedia – the content. Entities and their attributes from Wikipedia infobox templates, categorisation information, images, geo-coordinates, etc. Classification Schemas • Wikipedia Categories are represented using the SKOS vocabulary and DCMI terms. • YAGO Classification is derived from the Wikipedia category system using WordNet. • WordNet Synset Links were generated by manually relating Wikipedia infobox templates and WordNet synsets. The DBpedia 2014 release consists of 3 billion RDF triples.
  26. DBpedia – services • SPARQL endpoint: http://dbpedia.org/sparql • Query Builders (e.g. Leipzig query builder at http://querybuilder.dbpedia.org) • Public Faceted Web Service Interface • Dump Downloads • DBpedia dumps in 125 languages at the DBpedia download server. • DBpedia Ontology
  27. DBpedia – use cases • Nucleus for the Web of Data • Revolutionise access to Wikipedia information, e.g. “Give me all cities in New Jersey with more than 10,000 inhabitants”
  28. Wikidata • A collaboratively edited knowledge base operated by the Wikimedia Foundation. • Can be read and edited by both humans and machines. • Acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, and others
  29. Wikidata – the content. Wikidata is document-oriented and focused around topics. • Information is added to items by creating statements (key-value pairs)
  30. Wikidata - to Linked Data Web (1). Exporting Statements as Triples • Faithful representations: with additional qualifiers and references • Simplified representations: without additional qualifiers and references. Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. Introducing Wikidata to the Linked Data Web. In Proc. of ISWC 2014, pp. 50-65.
  31. Wikidata - to Linked Data Web (2). Extracting Schema Information from Wikidata • instance of (P31) → rdf:type and subclass of (P279) → rdfs:subClassOf • constraints for the use of properties → OWL Axioms. Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. Introducing Wikidata to the Linked Data Web. In Proc. of ISWC 2014, pp. 50-65.
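The property mapping on this slide (P31 to rdf:type, P279 to rdfs:subClassOf) can be sketched as a lookup applied while exporting simplified statements. This is only an illustration: the function name `export_statement` and the `wd:` / `wdt:` prefixes are placeholders chosen here, not the URI scheme of the cited export, and the sketch assumes every value is an item rather than a literal.

```python
# Schema-level properties get standard RDF/RDFS predicates (as on the slide);
# every other property keeps a Wikidata-specific predicate.
SCHEMA_MAP = {
    "P31": "rdf:type",          # instance of
    "P279": "rdfs:subClassOf",  # subclass of
}


def export_statement(item, prop, value):
    """Map an (item, property, value) statement to a simplified triple.

    Assumption: `value` is itself an item; qualifiers and references are
    dropped, as in the simplified representation.
    """
    predicate = SCHEMA_MAP.get(prop, f"wdt:{prop}")
    return (f"wd:{item}", predicate, f"wd:{value}")


# Q42 (Douglas Adams) is an instance of Q5 (human).
print(export_statement("Q42", "P31", "Q5"))
```

A faithful export would additionally need auxiliary nodes to carry qualifiers and references, which this sketch deliberately leaves out.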
  32. Wikidata – use cases & data access. Use Cases • Information about the sources helps support the notion of verifiability • Collecting structured data: allows easy reuse of that data • Support for Wikimedia projects: reducing the workload in Wikipedia and increasing its quality • Support well beyond that: everyone can use Wikidata. Accessing the data • MediaWiki Lua Scribunto interface • Wikibase/API • RDF dump: http://tools.wmflabs.org/wikidata-exports/rdf/exports/20141013/
  33. GoodRelations: GoodRelations is a lightweight ontology for annotating offerings and other aspects of e-commerce on the Web. [Slide credit: Martin Hepp]
  34. GoodRelations – use cases [Slide credit: Martin Hepp]
  35. GoodRelations – use cases (2) • Google, Bing, Yahoo, and Yandex will improve the rendering of your page directly in the search results • Rich Snippets: search engines use your markup to augment the preview of your site • Targeted Searching: profile and preferences of the person behind the query
  36. GoodRelations – who is using it • Search engines and 10,000+ small and large shops • Publishers • Software • OpenLink (Virtuoso)
  37. CURRENT STATUS OF ONLINE LINKED DATA: The good, the bad and the ugly
  38. The Good: Flexible linked data eco-system • RDF / OWL: flexible schema setting (schemaless -> simple schema -> rich schema) • Data linkage: universal unique ID for data entities (URI); shared vocabularies • Ontology mapping: schema mapping; instance mapping • Querying and reasoning techniques: SPARQL entailment regimes; distributed SPARQL endpoints
  39. The Good • Flexible linked data eco-system • Facilities for sharing and linking knowledge in an open environment • Knowledge representation: various levels of expressive power • Services, tools, and approaches for knowledge generation, understanding, and consumption • Interlinked knowledge repositories across various domains
  40. The Bad • Knowledge quality (errors, provenance, quantifier, freshness…) • Data protection (license, access control) • Data business model
  41. The Ugly • Linked Data excels at knowledge representation • But a large number of datasets are missing schema information • RDF is a triple-based model • But it is hard and time-consuming (even for SW geeks) to understand an RDF knowledge repository
  42. RESEARCH CHALLENGES
  43. Research Challenges • KG Construction • Ontology / Schema Construction • Data Lifting • Quality Evaluation • Understanding KG • User Understanding • Data Understanding • Dynamic Knowledge in KG • Stream Data / Prediction • Belief Revision • Intelligent Services for KG • Ontology Reasoning (see my tutorial at ISWC2014) • Problem Solving / Workflow
  44. Challenges in Automatic Construction • Incompleteness of data: is the constructed schema generic enough to accommodate new data? • Inconsistency of data: what if data items conflict with each other? E.g. birthdates of people: some people may not have a birthdate asserted in the dataset; should the schema specify that each person has a birthdate? Some people may have different birthdates asserted in different datasets; should the schema specify that the birthdate is unique?
  45. Challenges in Manual Construction • Expertise of ontology engineers: do the engineers have sufficient understanding and experience of ontology technologies (RDF(S), OWL, SPARQL, RIF, etc.)? • Workload of ontology engineers: how much time does it take to manually construct a large ontology? E.g. SNOMED CT has about 400,000 concepts • Collaboration: when multiple ontology engineers work together, how to make sure they have a consistent understanding of the ontology?
  46. Challenges for both Automatic and Manual Construction • Requirement and evaluation: how to specify the requirements of ontology construction and test whether they have been fulfilled? • Expressiveness vs. efficiency: which knowledge representation should we use? Is it sufficient to describe the domain? Are efficient reasoning and query answering mechanisms and systems available? • Ontology reuse: do we have to construct everything from scratch? Is there an available ontology that partially covers the domain?
  47. Challenge in Data Lifting. Data Lifting enriches unstructured data with structural annotations, thereby extracting the entities and their relations and properties for the knowledge graph. Key challenges: • Entity identification: certain entities can be hard to identify, e.g. movie titles • AVP (attribute-value pair) identification: an entity, its attribute and the attribute's value may be scattered across the text or dataset, making it hard to establish the relation
  48. Challenge in Entity Identification • There are different ways to identify entities: e.g. “The President of the U.S.” and “Barack Obama” • The same name can refer to different entities • People may use acronyms or abbreviations for entities: e.g. “K-Drive” is the acronym for the “Knowledge-driven Data Exploitation” project rather than the drive labelled K in my computer. • Natural language text may have typos, and values may use different notations
  49. Challenge in Data Understanding • Users are unfamiliar with the content of knowledge graphs: • What is the vocabulary? • What is described by the knowledge graph? • How is the content organised? • How is it connected to the other datasets I have? • Users do not know how to exploit the knowledge graph: • Which queries can I ask this knowledge graph? • Which queries can be answered with this knowledge graph?
  50. Challenge in Knowledge Dynamics • Validity of knowledge: is a piece of information permanent or temporary? • Representation: how to represent the temporal dependency of knowledge, e.g. “George W. Bush was the president of the U.S. until Barack Obama became the president.” • Updating of the knowledge graph: when and how do we retract a previously unknown mistake from the knowledge graph? Which knowledge should become obsolete after the current update? • Querying: how to query w.r.t. the temporal properties of knowledge, e.g. “Who was the last president of the U.S.?” • Predicting the dynamics: which change is likely to occur given the history of the knowledge graph?
  51. Challenge in Intelligent Services. The large amount of information in a knowledge graph, and its inter-connection, can be used to provide intelligent services; e.g. reasoning can be used to discover hidden relations in a knowledge graph. Key challenges • Efficiency of the services: knowledge graphs are usually accessed by multiple users in real time, so efficiency is crucial to the quality of service. • Scalability of the services: knowledge graphs are usually large, while even basic reasoning services, e.g. transitive closure, can consume a large amount of time and resources.
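To make the scalability point concrete, here is a deliberately naive transitive closure over a set of edges, for example rdfs:subClassOf links. It is a sketch for illustration only: the function name and the sample class names are assumptions, and a production reasoner would use a far more efficient algorithm.

```python
def transitive_closure(edges):
    """Return the transitive closure of a set of (source, target) pairs.

    Naive fixpoint: each pass compares every pair of known edges, which is
    quadratic per pass and illustrates why this gets costly on large graphs.
    """
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical subclass chain: chair < furniture < artefact.
sub_class_of = {("chair", "furniture"), ("furniture", "artefact")}
inferred = transitive_closure(sub_class_of) - sub_class_of
print(inferred)
```

Here the only newly derived ("hidden") relation is that chair is a subclass of artefact; on a graph with millions of edges the same computation becomes a real resource concern.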
  52. Agenda, PART II: METHODS & TECHNIQUES • 2:00pm – 3:05pm Constructing Knowledge Graphs • 2:30pm – 2:45pm Coffee Break • 3:05pm – 3:40pm Understanding Knowledge Graphs • 3:40pm – 3:45pm Outlook
  53. CONSTRUCTING KNOWLEDGE GRAPHS • Test Driven Ontology Construction • Methodology • A Protégé plug-in • Handling Entity Disambiguation • Approach • Some evaluation results • Bridging Requirements and Authoring Tests • Competency Questions as Informal Requirement Specification • Some evaluation results
  54. Uschold & King’s (1995) Methodology on Ontology Construction • Key steps: capturing, coding, integrating and evaluating/testing • Ontology evaluation/testing: • to make a technical judgment of the ontologies • w.r.t. a frame of reference • A frame of reference can be: • requirement specifications • competency questions • or, the real world
  55. Ontology and Tests • Uschold & King’s methodology • Test the ontology after axioms are written • Test-driven ontology authoring • Write authoring tests before writing axioms • Writing authoring tests before axioms does not take any more effort than writing them after axioms • Forces authors to think about requirements before writing axioms • Writing authoring tests first helps authors to detect and remove errors sooner • Helps to understand how good an existing/reused ontology is
  56. Gruninger & Fox’s (1995) Methodology. Key steps: 1. Motivating scenarios 2. Informal competency questions 3. FOL terminology (classes, properties, objects) 4. Formal competency questions (2 -> 4?) 5. FOL axioms 6. Completeness theorem (defining the conditions under which the solutions to the questions are complete)
  57. The METHONTOLOGY (2003) Methodology • Key steps: 1. specification of requirements 2. terminology with tabular and/or graph notations 3. formalisation with a logic-based ontology language 4. maintenance (including evaluation/testing) • Ontology evaluation/testing: • checking consistency, completeness, redundancy
  58. The DKAP (2007) Methodology • Key steps: 1. determine the domain and scope 2. check availability of existing ontologies 3. collect and analyse data for knowledge extraction 4. develop initial ontology 5. refine and validate ontology • Ontology validation/testing: • consistency and accuracy checking
  59. Limitations of Existing Methodologies • Methodology level: • Lack of details about the transitions • from requirements to tests • from requirements to terminology • from terminology to axioms • Tool level: • lack of tools to guide the above transitions
  60. An approach to Test-Driven Ontology Authoring (presented in an invited talk at BMIR, Stanford University, June 2013) • An ontology contains not only OWL files, but also a test suite • A test suite contains a set of tests as SPARQL 1.1 queries • not all requirements can be represented in SPARQL 1.1 though • Ontology reuse • check the associated test suite before reusing an ontology, to better understand the original intention • Collaborative ontology authoring • all authors agree upon a common test suite • each author can also keep an extra test suite locally
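  61. Authoring Tests (diagram): a Test Suite contains tests (Test 1, Test 2, …); each test consists of a SPARQL 1.1 Query and Expected results; a reasoner evaluates the query over the Ontology to produce Actual results; comparing expected and actual results gives Pass/fail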
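The pass/fail loop in the authoring-test diagram can be sketched as follows. This is a hypothetical outline, not the Protégé plug-in: `AuthoringTest`, `run_suite`, the sample queries and the canned answers are all invented for illustration, and the `answer` callable stands in for a reasoner plus SPARQL 1.1 engine running over the ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthoringTest:
    name: str
    query: str            # a SPARQL 1.1 query string in the real setting
    expected: frozenset   # expected results for that query


def run_suite(tests, answer):
    """Run every test; `answer` maps a query string to its actual results."""
    report = {}
    for test in tests:
        actual = frozenset(answer(test.query))
        report[test.name] = "pass" if actual == test.expected else "fail"
    return report


# Hypothetical stand-in for "reasoner + SPARQL engine over the ontology".
canned_answers = {
    "SELECT ?p WHERE { ?p a :VegetarianPizza }": {":Margherita"},
    "SELECT ?t WHERE { ?t a :CheeseTopping }": set(),
}

suite = [
    AuthoringTest("vegetarian pizzas",
                  "SELECT ?p WHERE { ?p a :VegetarianPizza }",
                  frozenset({":Margherita"})),
    AuthoringTest("cheese toppings",
                  "SELECT ?t WHERE { ?t a :CheeseTopping }",
                  frozenset({":Mozzarella"})),
]

report = run_suite(suite, lambda q: canned_answers.get(q, set()))
print(report)
```

Keeping the expected results next to each query means the suite can travel with the ontology, which is what makes it usable for reuse and collaborative authoring.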
  62. A Protégé Plug-in for Authoring Tests (based on the TrOWL reasoner)
  63. Loading the Manifest File • A manifest file specifies queries and expected results • Running the reasoner to get the results for each test • Clicking on a test to show the expected and actual results
  64. Compute Justifications for Errors Related to Failed Tests • with the justification plug-in (and reasoners, such as TrOWL)
  65. Modify the Ontology • so that CheeseTopping is no longer disjoint with VegetableTopping
  66. Key Issue (to be revisited after the Entity Disambiguation part) • Understanding the intention of ontology authors • How to generate authoring tests? • How to judge the quality of the authoring tests?
  67. Entity Recognition and Disambiguation • Challenge revisited: • There are different ways to identify entities: e.g. “The President of the U.S.” and “Barack Obama” • The same name can refer to different entities • Contextual hypothesis used in many existing approaches • terms with similar meanings are often used in similar contexts • The role of these contexts is typically played by already annotated documents (e.g. Wikipedia articles) which are used to train term classifiers
  68. Alternative Context: Evidence Model • Idea: semantic entities that may serve as disambiguation evidence for the scenario’s target entities
  69. Evidence Model Construction (Manual) • The identification of target concepts whose instances we wish to disambiguate (e.g. locations) • The determination of related concepts whose instances may serve as contextual disambiguation evidence. • For example, in texts that describe historical events, some concepts whose instances may act as location evidence are related locations, historical events, and historical groups and persons. • The identification, for each pair of evidence and target concept, of the relation paths that link them.
  70. Evidence-Target Paths
  71. Term Extraction (Automatic): Extraction is performed with Knowledge Tagger (from iSOCO) based on GATE.
  72. Evaluation Results: Football Match Scenario • 50 texts describing football matches. • E.g. “It's the 70th minute of the game and after a magnificent pass by Pedro, Messi managed to beat Claudio Bravo. Barcelona now leads 1-0 against Real."
  73. Evaluation Results: Military Conflict Scenario • 50 historical texts describing military conflicts. • E.g. “The Siege of Augusta was a significant battle of the American Revolution. Fought for control of Fort Cornwallis, a British fort near Augusta, the battle was a major victory for the Patriot forces of Lighthorse Harry Lee and a stunning reverse to the British and Loyalist forces in the South”.
  74. Future Work • Fully automated construction of the disambiguation evidence model. • The challenge here is how to automatically identify the text’s domain/topic. • Combination with statistical methods for cases where the available domain semantic information is incomplete. • The challenge here is how to select the optimal ratio of ontological evidence vs. statistical evidence. • Development of a tool that enables users to dynamically build such models out of existing semantic data and use them for disambiguation purposes
  75. Issues in Test-Driven Ontology Authoring 1. How to generate tests 2. How to judge the quality of tests • why they are relevant • how to provide the correct expected answers
  76. Requirement Driven? • How about starting from requirements instead of tests? Ontology Authoring Requirements -> Ontology Authoring Tests -> Test Results
  77. Requirement-Driven Ontology Authoring [Ren et al., 2014] • Key questions • RQ1: what forms of requirements should we consider? • RQ2: how to generate authoring tests from requirements?
  78. Competency Question • Questions that people expect the constructed ontologies to answer • in natural languages • about domain knowledge • Useful for novice users • requires little understanding of ontology technologies • A typical CQ: Which pizza has some cheese topping?
  79. RQ1: what forms of requirements should we consider? Competency Questions (CQs) can be regarded as a functional requirement of the ontology. RQ1’: How are CQs formulated?
  80. Key Idea 1: Identification of CQ Patterns • A typical CQ: Which pizza has some cheese topping? • Hypothesis: CQs usually have clear syntactic patterns • Features and elements can be extracted • Feature: type of question • Element: class expression CE1 • Element: object property expression OPE • Feature: binary predicate • Element: class expression CE2 • Resulting pattern: CE1 OPE CE2
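The hypothesis that CQs follow clear syntactic patterns can be illustrated with a single regular expression for one selection-style pattern, "Which CE1 OPE [some] CE2?". This is a toy sketch under stated assumptions: the pattern, the two hard-coded verbs and the name `parse_cq` are invented here, and the approach in the slides covers many more archetypes than this one.

```python
import re

# One illustrative pattern: "Which <CE1> <OPE> [some] <CE2>?"
# Only two verbs are recognised as object property expressions in this sketch.
CQ_PATTERN = re.compile(
    r"^Which (?P<CE1>.+?) (?P<OPE>has|contains) (?:some )?(?P<CE2>.+?)\?$",
    re.IGNORECASE,
)


def parse_cq(question):
    """Return the CE1 / OPE / CE2 elements of a matching CQ, else None."""
    match = CQ_PATTERN.match(question.strip())
    return match.groupdict() if match else None


print(parse_cq("Which pizza has some cheese topping?"))
print(parse_cq("Can pizza contain pork?"))
```

The first question yields the three elements named on the slide; the second does not fit this particular pattern, so it would need its own (Boolean-type) pattern.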
  81. Result 1: A Feature-based Framework for CQ Formulation • Based on CQs collected from the Software Ontology Project (75 CQs) and Manchester OWL Workshops (70 CQs) • Primary features -> CQ Archetypes • Secondary features -> CQ Subtypes • Features (divided in the diagram into primary and secondary features) and their values: Question Type (Selection, Boolean, Counting); Element Visibility (Explicit, Implicit); Predicate Arity (Unary, Binary, N-ary); Relation Type (Object, Datatype); Modifier (Quantity, Numeric); Domain Independent Element (Spatial, Temporal); Question Polarity (Positive, Negative)
  82. Result 2: Archetypes of CQ Patterns
  83. Answerability of CQs • A typical CQ: Which pizza has some cheese topping? • Existing work focused on answering CQs directly • But is the answer meaningful? • What if the answer is an empty set? • Possible scenarios • Pizza does not exist • Cheese topping does not exist • Pizzas are not allowed to have cheese topping • The ontology has not been populated with any cheesy pizza yet • … • The ability to answer CQs meaningfully can be regarded as a functional requirement of the ontology
  84. RQ2: how to generate authoring tests from requirements? RQ2’: How can we automatically test whether a CQ can be meaningfully answered?
  85. Key Idea 2: Presuppositions of CQ • A typical CQ: Which pizza has some cheese topping? • A CQ comes with certain presuppositions • Some conditions the speakers assume to be met • A CQ can be meaningfully answered only when its presuppositions are satisfied • Classes Pizza, CheeseTopping should occur in the ontology • Property has(Topping) should occur in the ontology • The ontology should allow Pizza to have CheeseTopping • The ontology should also allow Pizza to not have CheeseTopping
  86. CQs and Authoring Tests • A typical CQ: Which pizza has some cheese topping? (pattern: CE1 OPE CE2) • Satisfiability of CQ presuppositions can be verified by authoring tests generated based on its features and elements • Classes Pizza, CheeseTopping should occur in the ontology • [CE1], [CE2] should both occur in the class vocabulary • Property has(Topping) should occur in the ontology • [OPE] should occur in the property vocabulary • The ontology should allow Pizza to have CheeseTopping • [class expression not preserved in this transcript] should be satisfiable • The ontology should also allow Pizza to not have CheeseTopping • [class expression not preserved in this transcript] should be satisfiable
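The occurrence presuppositions on this slide are straightforward to check mechanically. The sketch below covers only those vocabulary checks; the two satisfiability tests need an OWL reasoner and are deliberately left out. The function name, the dictionary keys and the sample vocabularies are assumptions made for illustration.

```python
def occurrence_tests(elements, class_vocabulary, property_vocabulary):
    """Check the occurrence presuppositions of a CQ of the form CE1 OPE CE2.

    Only vocabulary membership is tested here; satisfiability of the
    corresponding class expressions would require a reasoner.
    """
    return {
        "CE1 in class vocabulary": elements["CE1"] in class_vocabulary,
        "CE2 in class vocabulary": elements["CE2"] in class_vocabulary,
        "OPE in property vocabulary": elements["OPE"] in property_vocabulary,
    }


# Elements of "Which pizza has some cheese topping?" mapped to ontology names.
elements = {"CE1": "Pizza", "OPE": "hasTopping", "CE2": "CheeseTopping"}

# Hypothetical ontology that has not yet introduced CheeseTopping.
classes = {"Pizza", "VegetableTopping"}
properties = {"hasTopping"}

results = occurrence_tests(elements, classes, properties)
print(results)
```

In this example the CE2 test fails, which tells the author that the CQ cannot yet be answered meaningfully because CheeseTopping is missing from the ontology.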
  87. Result 3: Associate Presuppositions with Features • Features in a CQ are associated with the presuppositions of the CQ. • An example on the question type feature: • Selection: Which pizza contains pork? • Boolean: Can pizza contain pork? • Counting: How many pizzas contain pork? • Presuppositions shown: occurrence of “Pizza”, “Pork”, “contains”; some pizza can contain pork; some pizza can contain no pork
  88. Result 4: Formal Authoring Tests • All tests can be automated
  89. WhatIf Gadget (interface components): Class Hierarchy • Verbaliser • Competency Questions • User/System Dialogue History • User Input
  90. Input (Manchester Syntax) 1. User selects a speech act by clicking or selecting a shortcut. 2. We need to evaluate their usefulness. 3. Examples: ● Class: Pizza SubClassOf: Food ● Class: Fruit DisjointWith: Pizza
  91. Input (OWL Simplified English) 1. A set of restricted natural language patterns. 2. System recognises the speech act. 3. Capable of accepting Competency Questions. 4. Examples: ● Which pizza has topping a tomato topping? ● An apple is a fruit.
  92. Modelling User Goals (1) 1. Users can import or write their own CQs in OWL Simplified English 2. Based on the inserted CQ, a list of Authoring Tests (ATs) will be generated. 3. A tree structure displays these CQs and ATs. 4. The system constantly monitors these CQs and ATs. Any change in satisfiability of ATs: a. Will be reported by changing the icon of ATs in the tree. Red/Green respectively represent fail/pass of each AT. b. Will be reported in the “history” pane.
  93. Modelling User Goals (2)
      • CQ + AT hierarchical representation.
      • Icons represent the satisfiability state.
      • Written feedback is presented to the user in the "history" pane.
  94. Further Challenges
      • Maintaining a continuous and meaningful interaction with the user
      • Generating a coherent and comprehensive set of entailments in response to What-If questions
        • Content selection
        • Grouping and aggregation
        • Ordering
  95. UNDERSTANDING KNOWLEDGE GRAPHS
      • Data understanding
      • Data summarisation
      • Query generation
  96. Data Understanding: A Core Activity in Data Exploitation
      • Traditional focus in semantic web research: data understanding for machines and programs.
      • More importantly: data understanding for humans
        • Humans are the ultimate owners and consumers of data
        • Systems such as knowledge graphs, Watson, Siri, etc.
        • To help human users understand the contents, implications and applications of data
      • More than HCI, we want interesting and insightful data!
  97. Semantic Datasets Are HARD to Understand
      • Non-expert users might not be familiar with RDF, OWL and SPARQL
        • RDF(S) has 6 core documents
        • OWL 2 has 6 core documents
        • SPARQL 1.1 has 11 core documents
      • Users are unfamiliar with datasets
        • That are too large to explore
        • That are external to their organisation
        • …
      • It is HARD for novice users to construct queries
  98. Challenges of Data Understanding
      • Challenges
        • Expressing needs (keywords/SPARQL)
        • Describing datasets
        • Only retrieve the relevant parts
      • 9.96% SPARQL / 8.19% DUMP
  99. Solution: Summary-based profiling for LD
      • Key idea: building-block-based information space modelling
      • Decomposing & Constructing
  100. The philosophy of interpreting information
      • Task: explain the data to human users
      • Entity-centric
  101. Entity-centric View of RDF Data
      • Entity Description Block
  102. Concrete to abstract
      • Entity Description Pattern
  103. Data Summarisation: EDP Graph
      • Reveal the schema-level information
        • What concepts are there (nodes) and how are they related to each other (edges)?
      • Disclose the individual-level distribution
        • Statistics attached to nodes and edges
      [Figure: Jamendo dataset]
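A minimal Python sketch of this kind of summary follows. It assumes that an entity's pattern is simply the set of its classes plus the set of properties it uses as a subject; the slides do not give the precise definition of an Entity Description Pattern, so the authors' definition may differ. The triples and all identifiers are made up for illustration.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

# Toy triples in the spirit of a music dataset; all identifiers are made up.
triples = [
    ("artist1", RDF_TYPE, "Artist"), ("artist1", "name", '"A"'), ("artist1", "made", "record1"),
    ("artist2", RDF_TYPE, "Artist"), ("artist2", "name", '"B"'), ("artist2", "made", "record2"),
    ("record1", RDF_TYPE, "Record"), ("record1", "title", '"R1"'),
    ("record2", RDF_TYPE, "Record"), ("record2", "title", '"R2"'),
]


def entity_patterns(triples):
    """Map each subject entity to its pattern: (classes, outgoing properties)."""
    classes, props = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            classes[s].add(o)
        else:
            props[s].add(p)
    entities = set(classes) | set(props)
    return {e: (frozenset(classes[e]), frozenset(props[e])) for e in entities}


def edp_graph(triples):
    """Summarise the data as pattern nodes and pattern-to-pattern edges, with counts."""
    pattern_of = entity_patterns(triples)
    node_counts = Counter(pattern_of.values())  # how many entities share each pattern
    edge_counts = Counter()
    for s, p, o in triples:
        if p != RDF_TYPE and o in pattern_of:  # the object is itself a described entity
            edge_counts[(pattern_of[s], p, pattern_of[o])] += 1
    return node_counts, edge_counts


nodes, edges = edp_graph(triples)
artist = (frozenset({"Artist"}), frozenset({"name", "made"}))
record = (frozenset({"Record"}), frozenset({"title"}))
```

Here the ten triples collapse into two pattern nodes and one edge, each carrying a count, which is the sense in which the summary exposes both schema-level structure and individual-level statistics.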
  104. Understanding Data Redundancy
      [Wu et al., 2014]
  105. Related Paper at JIST2014
      • Graph Pattern based RDF Data Compression. Jeff Z. Pan, Jose Manuel Gomez-Perez, Yuan Ren, Honghan Wu, Haofen Wang and Man Zhu
      • (Monday afternoon)
  106. Understanding How Data Can be Used
      • Given a knowledge graph, generate candidate insightful queries
        • Manual generation/automatic generation
        • Generation based on schema/actual data
        • With/without user intervention
      • Our aim: automatic generation based on data without user intervention
        • Most friendly to new, novice users
        • Complementary to inference (heavily based on schema)
  107. Candidate Insightful Queries [Pan et al., 2013]
      • Graph patterns are summarisations that represent many subsets of the RDF graph
      • Pattern structure
        • Structured knowledge, which is difficult to express with schema
        • Such as star, chain, tree, loop
      • Correspondences between multiple graph patterns
        • Strongly corresponding patterns (large overlap)
        • Weakly corresponding patterns (little overlap)
        • Exceptions
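The idea of strongly and weakly corresponding patterns can be illustrated with a small Python sketch. The slide does not say how overlap is measured; Jaccard similarity over the sets of entities matched by each pattern is used here as an assumption, and the thresholds and function names are purely illustrative.

```python
def overlap(matches_a, matches_b):
    """Jaccard overlap between the entity sets matched by two graph patterns."""
    union = matches_a | matches_b
    return len(matches_a & matches_b) / len(union) if union else 0.0


def correspondence(matches_a, matches_b, strong=0.8, weak=0.2):
    """Classify two patterns by overlap; both thresholds are illustrative assumptions."""
    score = overlap(matches_a, matches_b)
    if score >= strong:
        return "strong"
    if 0.0 < score <= weak:
        return "weak"
    return "none" if score == 0.0 else "intermediate"


def exceptions(matches_a, matches_b):
    """Entities matched by exactly one of the two patterns."""
    return matches_a ^ matches_b


# Hypothetical match sets for three patterns.
pattern_a = {f"e{i}" for i in range(1, 11)}           # e1..e10
pattern_b = {f"e{i}" for i in range(1, 10)}           # e1..e9
pattern_c = {"e1"} | {f"x{i}" for i in range(1, 10)}  # e1, x1..x9

kind_ab = correspondence(pattern_a, pattern_b)
kind_ac = correspondence(pattern_a, pattern_c)
odd_ones = exceptions(pattern_a, pattern_b)
```

With strongly corresponding patterns, the few entities in `exceptions` are natural targets for a candidate query, since they are the cases where an otherwise regular correspondence breaks.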
  108. Query Generation Framework
      1. Data summarisation
         • Significantly decreases the search space in rule mining
      2. Data analytics
         • First-order inductive learning
         • Association rule mining
      3. Query generation
         • Exploiting the relations between queries and rules
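The second and third steps of the framework can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it mines only single-property rules of the form "has body implies has head" by brute force, and it assumes that a rule which almost always holds suggests a query for its exceptions. The data, the `http://example.org/` prefix and the function names are hypothetical.

```python
from itertools import permutations

# Toy data: which properties each entity uses (hypothetical identifiers).
entity_props = {
    "s1": {"takesCourse", "memberOf"},
    "s2": {"takesCourse", "memberOf"},
    "s3": {"takesCourse", "memberOf", "advisor"},
    "s4": {"takesCourse"},
}


def mine_rules(entity_props, min_confidence=0.7):
    """Return (body, head, confidence) for rules 'has body => has head'."""
    all_props = sorted(set().union(*entity_props.values()))
    rules = []
    for body, head in permutations(all_props, 2):
        with_body = [e for e, ps in entity_props.items() if body in ps]
        if not with_body:
            continue
        both = [e for e in with_body if head in entity_props[e]]
        confidence = len(both) / len(with_body)
        if confidence >= min_confidence:
            rules.append((body, head, confidence))
    return rules


def exception_query(body, head):
    """SPARQL 1.1 query for entities that break the rule 'body => head'."""
    return (
        "PREFIX : <http://example.org/> "  # hypothetical namespace
        "SELECT DISTINCT ?x WHERE { "
        f"?x :{body} ?a . "
        f"FILTER NOT EXISTS {{ ?x :{head} ?b }} }}"
    )


rules = mine_rules(entity_props)
# Rules that hold for most but not all entities have a non-empty set of exceptions.
candidates = [exception_query(b, h) for b, h, c in rules if c < 1.0]
```

Running the mining step over a summary rather than over the raw triples is what the first step of the framework is for: the number of candidate bodies and heads shrinks, so the search space does too.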
  109. Evaluation
  110. Another Example
      • Given the university data set in LUBM, the following two queries have the same results (when no reasoning is applied)
  111. Summary and Future Work
      • Take-home message
        • Data summarisation and data analytics technologies not only help people to find answers, but also help people to ask questions!
      • Future work
        • Integrate with application scenario background knowledge
        • Integrate with reasoning
        • Integrate with user preferences
  112. OUTLOOK
      Outlook for knowledge graphs from an application point of view
  113. Outlook
      • What knowledge graphs are good at: maintaining factual knowledge in a structured manner and answering queries about it
      • What knowledge graphs still need:
        • "How to…" knowledge in addition to "What is…" knowledge
        • Operations associated with the entities
  114. JIST2014 Tutorial on Constructing and Understanding Knowledge Graphs
      Thank you! Questions?