Principles for knowledge engineering on the Web

Keynote IC3K conference, Paris, 2011

  1. 1. Principles for knowledge engineering on the Web Guus Schreiber VU University Amsterdam Computer Science, Web & Media
  2. 2. Overview of this talk • Semantic Web: the digital heritage case • Knowledge-engineering principles • Challenges for Web KE
  3. 3. My journey knowledge engineering • design patterns for problem solving • methodology for knowledge systems • models of domain knowledge • ontology engineering
  4. 4. My journey access to digital heritage
  5. 5. My journey Web standards • Web metadata: RDF • OWL Web Ontology Language • SKOS model for publishing vocabularies on the Web
  7. 7. The Web: resources and links URL URL Web link
  8. 8. The Semantic Web: typed resources and links URL URL Web link ULAN Henri Matisse Dublin Core creator Painting “Woman with hat” SFMOMA
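
The typed link on this slide can be written down as subject, predicate, object triples. The following is a minimal sketch using plain Python tuples rather than an RDF library. All `example.org` URIs are placeholder assumptions, not the real SFMOMA or ULAN identifiers; only the Dublin Core `creator` and `rdf:type` URIs are real.

```python
# Placeholder URIs (assumptions for illustration, not real identifiers).
PAINTING = "http://example.org/sfmoma/woman-with-hat"
MATISSE = "http://example.org/ulan/henri-matisse"
PAINTING_TYPE = "http://example.org/types/Painting"

# Real property URIs from Dublin Core and RDF.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A tiny graph: every statement is a (subject, predicate, object) triple.
triples = {
    (PAINTING, RDF_TYPE, PAINTING_TYPE),  # the resource is typed
    (PAINTING, DC_CREATOR, MATISSE),      # the link is typed
}


def objects(subject, predicate):
    """Return all objects linked from `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


creators = objects(PAINTING, DC_CREATOR)
```

Because the link carries a type, a client can ask specifically for the creator of the painting instead of following an untyped hyperlink.
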
  9. 9. Vocabulary interoperability: SKOS
  10. 10. Vocabulary representations • SKOS has been a major success • Easy to understand and create • LCSH publication set an important example
  11. 11. The myth of a unified vocabulary • In large virtual collections there are always multiple vocabularies – In multiple languages • Every vocabulary has its own perspective – You can’t just merge them • But you can use vocabularies jointly by defining a limited set of links – “Vocabulary alignment” • It is surprising what you can do with just a few links
  12. 12. Example use of vocabulary alignment “Tokugawa” SVCN period Edo SVCN is local in-house ethnology thesaurus AAT style/period Edo (Japanese period) Tokugawa AAT is Getty’s Art & Architecture Thesaurus
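
A rough sketch of how the single alignment link on this slide could make a search for "Tokugawa" reach objects that are annotated only with the in-house SVCN concept. The concept identifiers, the `annotations` data and the function names are hypothetical; the slides do not show the actual implementation.

```python
# Labels per concept, following the slide. Identifiers are hypothetical.
labels = {
    "svcn:Edo": {"Edo"},
    "aat:Edo": {"Edo (Japanese period)", "Tokugawa"},
}

# One alignment link between the two vocabularies.
alignments = {("svcn:Edo", "aat:Edo")}

# A collection object annotated with the local SVCN concept only (hypothetical).
annotations = {"object-1": {"svcn:Edo"}}


def concepts_for(term):
    """Concepts whose label matches `term`, plus concepts one alignment hop away."""
    direct = {c for c, names in labels.items() if term in names}
    expanded = set(direct)
    for a, b in alignments:
        if a in direct:
            expanded.add(b)
        if b in direct:
            expanded.add(a)
    return expanded


def search(term):
    """Return the objects annotated with any concept reachable from `term`."""
    wanted = concepts_for(term)
    return sorted(obj for obj, concepts in annotations.items() if concepts & wanted)


tokugawa_hits = search("Tokugawa")
```

Without the alignment link, "Tokugawa" would match only the AAT concept and `object-1` would not be found.
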
  13. 13. Enriching metadata with concepts
  14. 14. Learning vocabulary alignments • Example: learning relations between art styles and artists through NLP of art historic texts – “Who are Impressionist painters?”
  15. 15. Semantic search: result clustering based on retrieval path
  16. 16. Research issues • Information retrieval as graph search – more semantics => more paths – finding optimal graph patterns • Vocabulary alignment • Information extraction – recognizing people, locations, … – identity resolution • Multi-lingual resources
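
To make "information retrieval as graph search" and "result clustering based on retrieval path" concrete, here is a minimal sketch under assumptions: a toy graph with invented edges, a breadth-first search from the matched concept back to artworks, and clusters keyed by the sequence of predicates traversed. It is not the search algorithm used in the projects mentioned in this talk.

```python
from collections import defaultdict, deque

# Hypothetical toy graph of (subject, predicate, object) edges.
edges = [
    ("painting-1", "creator", "Matisse"),
    ("painting-2", "creator", "Derain"),
    ("Matisse", "style", "Fauvism"),
    ("Derain", "style", "Fauvism"),
    ("painting-3", "subject", "Fauvism"),
]
artworks = {"painting-1", "painting-2", "painting-3"}

# Index edges by object so we can walk from a concept back towards artworks.
incoming = defaultdict(list)
for s, p, o in edges:
    incoming[o].append((p, s))


def search(start, max_depth=3):
    """Breadth-first search from `start`; group artworks by predicate path."""
    clusters = defaultdict(list)
    queue = deque([(start, ())])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node in artworks:
            clusters[path].append(node)  # the path explains why it matched
            continue
        if len(path) == max_depth:
            continue
        for predicate, source in incoming[node]:
            if source not in seen:
                seen.add(source)
                queue.append((source, path + (predicate,)))
    return dict(clusters)


clusters = search("Fauvism")
```

Here the query "Fauvism" yields two clusters: works whose subject is Fauvism, and works whose creator painted in that style. Adding more typed links to the graph adds more such paths, which is the "more semantics => more paths" issue on the slide.
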
  17. 17. Personalized Rijksmuseum • Interactive user modeling • Recommendations of artworks and art topics
  18. 18. Mobile museum tour
  20. 20. Principle 1: Be modest! • Ontology engineers should refrain from developing their own idiosyncratic ontologies • Instead, they should make the existing rich vocabularies, thesauri and databases available in an interoperable (Web) format • Initially, only add the originally intended semantics
  21. 21. Principle 2: Think large! "Once you have a truly massive amount of information integrated as knowledge, then the human-software system will be superhuman, in the same sense that mankind with writing is superhuman compared to mankind before writing." Doug Lenat
  22. 22. Principle 3: Develop and use patterns! • Don’t try to be (too) creative • Ontology engineering should not be an art but a discipline • Patterns play a key role in methodology for ontology engineering • See for example patterns developed by the W3C Semantic Web Best Practices group http://www.w3.org/2001/sw/BestPractices/
  23. 23. Principle 4: Don’t recreate, but enrich and align • Techniques: – Learning ontology relations/mappings – Semantic analysis, e.g. OntoClean – Processing of scope notes in thesauri
  24. 24. Principle 5: Beware of ontological over-commitment!
  25. 25. Principle 6: Writing in an ontology language doesn’t make it an ontology! • An ontology is a vehicle for sharing • Papers about your own idiosyncratic “university ontology” should be rejected at conferences • The quality of an ontology does not depend on the number of, for example, OWL constructs used
  26. 26. Principle 7: Required level of formal semantics depends on the domain! • In our semantic search we use three OWL constructs: – owl:sameAs, owl:TransitiveProperty, owl:SymmetricProperty • But cultural heritage is very different from medicine and bioinformatics – Don’t over-generalize on requirements for e.g. OWL
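
The slide names the three constructs but not how they are applied. As an illustration only, the sketch below computes the extra statements each construct licenses over a set of pairs; it is not an OWL reasoner, and all identifiers and property names are hypothetical.

```python
def symmetric_closure(pairs):
    """owl:SymmetricProperty: if (a, b) holds, then (b, a) holds."""
    return set(pairs) | {(b, a) for a, b in pairs}


def transitive_closure(pairs):
    """owl:TransitiveProperty: if (a, b) and (b, c) hold, then (a, c) holds."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


# owl:sameAs is both symmetric and transitive, so identity links chain up.
same_as = transitive_closure(symmetric_closure({
    ("ulan:Matisse", "rkd:Matisse"),           # hypothetical identifiers
    ("rkd:Matisse", "dbpedia:Henri_Matisse"),
}))

# A transitive property, e.g. a hypothetical "partOf" between places.
part_of = transitive_closure({("Montmartre", "Paris"), ("Paris", "France")})

# A symmetric property, e.g. a hypothetical "relatedTo" between concepts.
related_to = symmetric_closure({("Fauvism", "Expressionism")})
```

Even this small amount of inference is enough to merge identities across collections and to widen a query from "France" to works located in Montmartre.
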
  28. 28. Challenge: Linked Open Data
  29. 29. Availability of government data: http://data.gov.uk
  30. 30. The fight for “standard” semantics Schema.org
  31. 31. Challenge: vocabulary alignment methodology • Multitude of alignment techniques available – Direct syntactic match – Lexical manipulation – Structured, … • Precision & recall vary • Large evaluation initiative – OAEI http://oaei.ontologymatching.org/
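
A small sketch of the first two techniques on this slide and of how precision and recall can differ between them. The vocabularies, the gold-standard `reference` alignment and the normalization rules are invented for illustration; real alignment tools, such as those evaluated in OAEI, are considerably more elaborate.

```python
import re


def normalize(label):
    """Lexical manipulation: lowercase, drop parenthesised qualifiers and punctuation."""
    label = re.sub(r"\(.*?\)", " ", label.lower())
    label = re.sub(r"[^a-z0-9 ]", " ", label)
    return " ".join(label.split())


def align(source, target, key=lambda label: label):
    """Pair concepts from two {id: label} vocabularies whose keyed labels are equal."""
    index = {}
    for tid, label in target.items():
        index.setdefault(key(label), []).append(tid)
    return {(sid, tid)
            for sid, label in source.items()
            for tid in index.get(key(label), [])}


def precision_recall(found, reference):
    """Precision and recall of a found alignment against a reference alignment."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy vocabularies and gold standard.
source = {"s1": "Edo", "s2": "Art Nouveau", "s3": "Mercury (planet)", "s4": "Cubism"}
target = {"t1": "Edo (Japanese period)", "t2": "art nouveau",
          "t3": "Art Deco", "t4": "Mercury (element)", "t5": "Cubism"}
reference = {("s1", "t1"), ("s2", "t2"), ("s4", "t5")}

exact = align(source, target)                   # direct syntactic match
lexical = align(source, target, key=normalize)  # match after lexical manipulation

exact_scores = precision_recall(exact, reference)
lexical_scores = precision_recall(lexical, reference)
```

In this toy data the exact match is precise but misses two of the three reference links, while normalization recovers all of them at the cost of one false match between the two "Mercury" concepts.
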
  32. 32. Limitations of categorical thinking • The set theory on which ontology languages are built is inadequate for modelling how people think about categories (Lakoff) – Category boundaries are not hard: cf. art styles – People think of prototypes; some examples are very prototypical, others less • We also need to make meta-distinctions explicit – organizing class: “furniture” – base-level class: “chair” – domain-specific: “Windsor chair”
  33. 33. Challenge: new types of search exploiting semantics
  34. 34. Relation search: Picasso, Matisse & Braque
  35. 35. Challenge: combining professional annotations with public “tags”
  36. 36. Challenge: data trust issues • How can a museum trust annotations of outsiders? • Need to adapt techniques from closed world to open world • Ongoing case studies examine reputation assessment, use of probability theories, …
  37. 37. Challenge: event-centred approach => people like narratives
  38. 38. Extracting piracy events from piracy reports & Web sources
  39. 39. Visualising piracy events
  40. 40. Large-scale experimentation!
  42. 42. We need to study the Web as a phenomenon • Web dynamics • Collective intelligence • Privacy, trust and security • Linked open data • Universal access
  43. 43. Web for Social Development
  44. 44. Acknowledgements • Long list of people • Projects: MIA, MultimediaN E-Culture, CHOICE, MunCH, CHIP, Agora, PrestoPrime, NoTube, EuropeanaConnect, Poseidon