Want to get an update on Nuxeo's involvement in semantic search and knowledge extraction? Watch this slideshow to hear all the latest news on this topic and learn how it may impact the future of Enterprise Content Management!
If you want to go further, watch the video of the webinar based on this slideshow: http://www.youtube.com/watch?v=YLgJKx1y6Fk
12. Invented the web in 1989
(yeah!)
Invented the semantic
web in 1994 (duh?)
Wednesday, May 25, 2011
13. Historical perspective
• From web 1.0: web of sites and pages,
aka the World Wide Web
• To web 2.0: web of people and of
participation, aka the Social Web (Blogs,
RSS, tags, Facebook, Wikipedia, etc.)
• To web 3.0: web of data, of meaning and
connected knowledge, aka the Semantic
Web
19. Some examples
• FOAF: relationships between people (social
network)
• SIOC: relationships between websites,
articles, blogs, comments
• Rich Snippets: syndicate RDFa content for
SEO by Google, Yahoo
• GoodRelations: e-commerce (eBay...)
• rNews: metadata for news agencies (AFP,
Reuters...)
20. How is it related to
the Web?
21. The traditional Web
• A principle: hypertext
• A protocol: HTTP
• An identification scheme: URNs/URIs
• A language: HTML
22. “To a computer, then, the web is a flat,
boring world devoid of meaning”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
23. “This is a pity, as in fact documents on the
web describe real objects and imaginary
concepts, and give particular relationships
between them”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
24. “Adding semantics to the web involves two things:
allowing documents which have information in
machine-readable forms, and allowing links to be
created with relationship values.”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
25. “The Semantic Web is not a separate Web but an
extension of the current one, in which information
is given well-defined meaning, better enabling
computers and people to work in cooperation.”
Tim Berners-Lee, James Hendler and Ora Lassila, “The Semantic Web”, Scientific American, May 2001
26. The traditional Web
• A principle: hypertext
• A protocol: HTTP
• An identification scheme: URNs/URIs
• A language: HTML
27. The semantic Web
• A principle: hypertext
• A protocol: HTTP
• An identification scheme: URNs/URIs
• A language: RDF (instead of HTML)
29. The W3C “Layer Cake”
[Diagram: the W3C Semantic Web layer cake, annotated to show which layers are already standardized]
30. URIs and the
Web of Things
• URIs (Uniform Resource Identifiers) are
used to identify things (also called
entities) in the real world
• For instance: people, places, events,
companies, products, movies, etc.
31. The RDF model
RDF is used to describe relationships
between objects, identified by their URIs
[Diagram: Subject → Predicate → Object]
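As a minimal sketch of the triple model, assuming plain Python tuples stand in for an RDF store: the example.org URIs and the `objects_of` helper are made up for illustration, while the FOAF namespace is the real one.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI naming the relationship
    obj: str        # URI of another thing, or a literal value


FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical URIs under example.org, used only for illustration.
graph = [
    Triple("http://example.org/people/alice", FOAF + "name", "Alice"),
    Triple("http://example.org/people/alice", FOAF + "knows",
           "http://example.org/people/bob"),
    Triple("http://example.org/people/bob", FOAF + "name", "Bob"),
]


def objects_of(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


print(objects_of(graph, "http://example.org/people/alice", FOAF + "knows"))
```

Because both people and relationships are identified by URIs, triples coming from different sources can be appended to the same list without any schema change.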
32. Example
Source: http://www.slideshare.net/AntidotNet/web-smantique-web-de-donnes-web-30-linked-data-quelques-repres-pour-sy-retrouver
34. SPARQL
• Query language for RDF databases
• Several implementations
• OSS: Apache Jena, Sesame, 4Store,
Virtuoso, Mulgara, Redland, Open Anzo...
• Proprietary: 5Store, AllegroGraph
RDFStore, Stardog, Dydra, OWLIM...
• More expressive than SQL, but scalability is still
an open question
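To give a feel for what a SPARQL query does, here is a rough sketch in plain Python. It is not a SPARQL engine: it only matches a list of triple patterns (a "basic graph pattern") against an in-memory list of triples. The data, the `ex:` names and the `match` helper are hypothetical; the query shown in the first comment is the SPARQL that the Python call approximates.

```python
# Approximates:
#   SELECT ?name WHERE { ?p ex:worksFor ex:nuxeo . ?p ex:name ?name }
TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:nuxeo"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:name", "Bob"),
]


def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind an unbound variable, or check an already bound one.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # All three positions matched: continue with remaining patterns.
            yield from match(rest, triples, candidate)


query = [("?p", "ex:worksFor", "ex:nuxeo"), ("?p", "ex:name", "?name")]
names = [b["?name"] for b in match(query, TRIPLES)]
print(names)
```

The shared variable `?p` is what joins the two patterns. This nested scan is also a hint at why scalability is hard: a real store needs indexes and query planning to avoid rescanning all triples for each pattern.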
36. Where and how
to find these data?
37. Solution 1: “Lift”
• One can use HTML scraping and natural
language processing (NLP) techniques to
extract semantic information from existing
content / sites
• Generic solutions: OpenCalais, Zemanta,
Apache Stanbol
• Pro: no need to change existing content
• Con: error prone, needs human checks
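A toy illustration of the "lift" approach, under strong simplifying assumptions: the page text is scraped with Python's standard `html.parser`, and a hand-written lookup table stands in for real NLP. The `GAZETTEER` table, its example.org URIs and the `lift` function are hypothetical and are not how OpenCalais, Zemanta or Stanbol work internally.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# Hypothetical gazetteer: surface form -> (entity type, URI).
GAZETTEER = {
    "Paris": ("Place", "http://example.org/entity/Paris"),
    "Nuxeo": ("Organisation", "http://example.org/entity/Nuxeo"),
}


def lift(html):
    """Return (name, type, URI) for each known entity mentioned in the page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return [(name, kind, uri)
            for name, (kind, uri) in GAZETTEER.items()
            if name in text]


page = ("<html><body><h1>Workshop</h1>"
        "<p>Nuxeo hosts a workshop in <b>Paris</b>.</p></body></html>")
entities = lift(page)
print(entities)
```

The plain substring test would also match "Paris" inside "Parisian" or in a page about Paris, Texas, which illustrates why extracted annotations need human checks.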
39. Solution 2: export
• RDFa and microformats are used to embed
semantic information (expressed using the
RDF model) into regular web pages
• RDFa does it using existing (rel) and
additional (about, property, typeof)
attributes
• Microformats only use usual HTML
attributes (class)
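The sketch below shows how the `about`, `typeof` and `property` attributes can be read back as triples. It is a deliberately rough reader built on Python's standard `html.parser`, not a conforming RDFa processor: it ignores prefix declarations, nesting scope, `rel`, `resource` and `content`. The `RDFaSketch` class and the example page are assumptions made for illustration.

```python
from html.parser import HTMLParser


class RDFaSketch(HTMLParser):
    """Rough RDFa reader: only `about`, `typeof`, and `property` with text."""

    def __init__(self):
        super().__init__()
        self.subject = None   # current subject, set by the last `about`
        self.pending = None   # property still waiting for its text value
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "typeof" in attrs and self.subject:
            self.triples.append((self.subject, "rdf:type", attrs["typeof"]))
        if "property" in attrs:
            self.pending = attrs["property"]

    def handle_data(self, data):
        # The first non-blank text after a `property` attribute is its value.
        if self.pending and self.subject and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


page = """
<div about="http://example.org/people/alice" typeof="foaf:Person">
  <span property="foaf:name">Alice</span> works on
  <span property="foaf:interest">semantic search</span>.
</div>
"""

reader = RDFaSketch()
reader.feed(page)
reader.close()
for triple in reader.triples:
    print(triple)
```

The page still renders as ordinary HTML in a browser; the extra attributes are only meaningful to software that looks for them.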
40. Solution 3: reuse
• Linked Open Data: (usually large) data
repositories available on the web (for free
or not), expressed using the RDF model
• Interoperability between these repositories
(their ontologies) must be defined
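One way to picture the interoperability problem: two repositories describe the same thing with different identifiers and vocabularies, so an explicit mapping is needed before their data can be merged. The sketch below uses hypothetical data and prefixes (`a:`, `b:`); in practice such alignments are typically published as statements like `owl:sameAs` and `owl:equivalentProperty`, which the two dictionaries imitate.

```python
# Two hypothetical repositories describing the same city differently.
REPO_A = [("a:Paris", "a:label", "Paris")]
REPO_B = [
    ("b:paris_fr", "b:name", "Paris"),
    ("b:paris_fr", "b:country", "b:France"),
]

# Explicit alignment: which identifiers and properties mean the same thing.
SAME_AS = {"b:paris_fr": "a:Paris"}
EQUIVALENT_PROPERTY = {"b:name": "a:label"}


def normalise(triple):
    """Rewrite a triple using the preferred identifiers and properties."""
    s, p, o = triple
    return (SAME_AS.get(s, s), EQUIVALENT_PROPERTY.get(p, p), SAME_AS.get(o, o))


def merge(*repos):
    """Merge repositories after normalisation, dropping duplicate triples."""
    merged = []
    for repo in repos:
        for triple in repo:
            t = normalise(triple)
            if t not in merged:
                merged.append(t)
    return merged


merged = merge(REPO_A, REPO_B)
for triple in merged:
    print(triple)
```

Without the two mapping tables, the merged data would contain two unrelated "Paris" resources and the country fact could not be reached from `a:Paris`.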
41. Linked Open Data in 2007
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
42. 2008
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
43. 2009
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
44. 2010
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
45. Good for Enterprise apps too!
Diagram source: http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/
47. Key Enablers
• Open Data and Linked Online Data
• Advances in automatic content analysis
(linguistics, image processing) and machine
learning
• Classical logic and classical AI
• Computing power (Moore’s law +
MapReduce)
48. The technologies and data
are available,
Let’s put them to use!
49. 2. Nuxeo &
Semantic ECM
51. Nuxeo: an open source
ECM vendor
Our Focus is Enterprise Content Management
ECM as a Platform for Content Applications
Open Source as Efficient Development Model
Modern architecture for 21st Century business
“Lean, mobile, social, interoperable”
A Social Marketplace in action
Innovation driven by community of customers, partners,
and our core developers
52. Nuxeo ECM - From Platform to Products
[Diagram: the Nuxeo product stack, from top to bottom]
• Industries: Construction, Media, Government, Life Sciences
• Business Solutions: Correspondence Management, Contracts Management, Records Management, Invoice Processing
• Horizontal Packages: Document Management, Digital Asset Management, Case Management Framework, Structured Document Server, Content Aggregator
• Platform: Nuxeo Enterprise Platform, a complete set of components covering all aspects of ECM
• Content Infrastructure: Nuxeo Core, a lightweight, scalable, embeddable content repository
55. Goals for Semantic ECM
Repurpose existing content
Improve search and collaboration
Make information contextual
Extract and use information from your content
Make your content smarter!
61. Business value
from semantic ECM
• Efficiency gains: 20% to 90% (e.g. in search,
collaboration)
• Effectiveness gains: better returns from
your assets (e.g. news and images from AFP)
• Strategic edge: growth, value capture, new
services, an unfair strategic advantage (e.g.
vertical ontologies for CEVAs / CCAs)
64. IKS project
• European project funded under
FP7, with 13 partners (6 SMEs) and an 8.5 MEUR
budget
• Goal: create a semantic software “stack” that will be
used by CMS vendors to add semantic features to
their products
• Started in Jan. 2009, will last until Dec. 2012
• First tangible result: Apache Stanbol, already
integrated in a Nuxeo plugin
66. Stanbol: a semantic engine
• From unstructured content to Knowledge
• Language guessing
• Topic classification (Business, Sports,
Media, ...)
• Named Entities extraction and linking
• Relationships and properties extraction
• Pluggable with proprietary engines (e.g. Temis)
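Stanbol exposes its engines over HTTP: plain text is posted to an enhancer endpoint and the extracted annotations come back as RDF. The sketch below only builds such a request with Python's standard library and does not send it. The `http://localhost:8080/enhancer` URL is an assumption (a local instance on the default port; the exact path depends on the Stanbol version and deployment), and `build_enhancement_request` is a hypothetical helper.

```python
import urllib.request

# Assumption: a Stanbol instance running locally. The endpoint path may
# differ depending on the Stanbol version and how it is deployed.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancement_request(text, url=STANBOL_ENHANCER_URL):
    """Build (but do not send) a POST request carrying plain text to analyse."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Ask for the annotations serialised as RDF/XML.
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


request = build_enhancement_request(
    "Nuxeo will attend the IKS workshop in Paris.")
print(request.get_method(), request.full_url)

# To actually send it (requires a running server):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The response would then be parsed as RDF to read the detected language, topics and entity links, which is the kind of output a CMS plugin stores alongside the document.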
74. Notes
• Nuxeo EP 5.4.2 (next week) will have
significant improvements to enable new
features of the semantic plugins
• Source code here: http://hg.nuxeo.org/addons/nuxeo-platform-semantic-entities/
• Join us at the IKS Paris Workshop on July
5-6 to learn much more about Nuxeo and
semantic technologies!