Web of data
1. Web of Data
Rajendra Akerkar
Western Norway Research Institute
Sogndal, Norway
2. WWW & Society
Social contacts (social networking platforms, blogging, ...)
Economics (buying, selling, advertising, ...)
Administration (eGovernment)
Education (eLearning, Web as information system, ...)
Work life (information gathering and sharing)
Recreation (games, role play, creativity, ...)
R. Akerkar 2
3. Limitations of the Current Web
Too much information with too little structure
and made for human consumption
Content search is very simplistic
future requires better methods
Web content is heterogeneous
in terms of content
in terms of structure
in terms of character encoding
Future requires intelligent information integration
Humans can derive new (implicit) information from given
pieces of information, but on the current Web we can only deal
with syntax; this requires automated reasoning techniques
4. Data Integration on the Web
Data integration on the Web refers to the
process of combining and aggregating
information resources on the Web so they
could be collectively useful to us.
Goal
for a given resource (say, a person, an idea, an
event, or a product), we would like to know
everything that has been said about it.
5. Myself as the resource.
Assume we have already built a “smart”
agent, which will walk around the Web
to find everything about me.
6. To get our smart agent started, we feed it
with the URL of my personal home page
http://www.tmrfindia.org/ra.html
Agent downloads this page and tries to
collect information from this page
7. Web page - a traditional Web document
Our agent is able to understand HTML
language constructs, such as
<p>, <br>, <a href>, <table> and <li>
8. Web page – non-traditional Web document
besides the HTML constructs, it actually contains
some “statements”
These statements follow the same simple structure
each one of them represents one aspect of the given
resource
ns0:RajendraAkerkar ns0:name "Rajendra Akerkar".
ns0:RajendraAkerkar ns0:title "Professor".
ns0:RajendraAkerkar ns0:author <ns0:_x>.
ns0:_x ns0:ISBN "978-1-84265-535-1".
ns0:_x ns0:publisher <http://www.alphasci.com>.
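The statements above all share a subject, property, value shape. A minimal Python sketch of that shape follows; the `Statement` type, its field names, and the `describe` helper are illustrative names, not part of the slides.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One statement: a resource, one of its properties, and the value."""
    subject: str
    predicate: str
    obj: str            # a literal string or the name of another resource
    is_resource: bool   # True when obj names a resource (written <...> above)


# The five statements found on the home page, copied from the slide.
statements = [
    Statement("ns0:RajendraAkerkar", "ns0:name", "Rajendra Akerkar", False),
    Statement("ns0:RajendraAkerkar", "ns0:title", "Professor", False),
    Statement("ns0:RajendraAkerkar", "ns0:author", "ns0:_x", True),
    Statement("ns0:_x", "ns0:ISBN", "978-1-84265-535-1", False),
    Statement("ns0:_x", "ns0:publisher", "http://www.alphasci.com", True),
]


def describe(resource, stmts):
    """Return every (property, value) pair stated about one resource."""
    return [(s.predicate, s.obj) for s in stmts if s.subject == resource]
```

Calling `describe("ns0:_x", statements)` collects the ISBN and publisher statements, which is the "one aspect per statement" idea in code form.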
9. Namespace - a mechanism for abbreviating URIs
ns0 represents a namespace, so that we
know everything, with ns0 as its prefix, is
collected from the same Web page.
ns0:RajendraAkerkar represents a
resource that is described by my Web
page; in this case, this resource is me.
So,
resource ns0:RajendraAkerkar has a
ns0:name property whose value is
Rajendra Akerkar
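Since a namespace prefix abbreviates a URI, expanding a prefixed name is a simple lookup. The sketch below assumes a URI for ns0; the slides never state one, so the home page URL followed by "#" is used purely for illustration, and `PREFIXES` and `expand` are hypothetical names.

```python
# Hypothetical prefix table: the URI behind ns0 is not given in the slides,
# so the home page URL plus "#" is assumed here.
PREFIXES = {"ns0": "http://www.tmrfindia.org/ra.html#"}


def expand(name, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI; leave other strings unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name
```

Under that assumption, `expand("ns0:RajendraAkerkar")` yields the home page URL with `#RajendraAkerkar` appended, while a full URL such as `http://www.alphasci.com` is returned unchanged because `http` is not a registered prefix.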
10. 2nd statement claims the ns0:title property of resource
ns0:RajendraAkerkar has a value given by Professor.
3rd statement is unusual.
When specifying the value of the ns0:author property for resource
ns0:RajendraAkerkar, instead of using a simple character
string as its value, it uses another resource, and this resource
is identified by ns0:_x. To make this fact more obvious,
ns0:_x is enclosed in <>.
4th statement specifies the value of the ns0:ISBN property of
resource ns0:_x.
The last statement specifies the value of the ns0:publisher
property of the same resource.
The value of this property is not a character string, but another
resource identified by http://www.alphasci.com.
11. How much does our agent understand
these statements?
Agent organizes them into a graph
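One plausible way for an agent to organize statements into a graph is an adjacency map from each resource to its outgoing edges. This is only a sketch; the slides do not say how the agent stores its graph, and `build_graph` is an illustrative name.

```python
from collections import defaultdict

# Statements from the home page as (subject, property, value) tuples.
triples = [
    ("ns0:RajendraAkerkar", "ns0:name", "Rajendra Akerkar"),
    ("ns0:RajendraAkerkar", "ns0:title", "Professor"),
    ("ns0:RajendraAkerkar", "ns0:author", "ns0:_x"),
    ("ns0:_x", "ns0:ISBN", "978-1-84265-535-1"),
    ("ns0:_x", "ns0:publisher", "http://www.alphasci.com"),
]


def build_graph(stmts):
    """Map each subject node to its list of (edge label, target node) pairs."""
    graph = defaultdict(list)
    for subject, predicate, obj in stmts:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
```

Each subject becomes a node, each property an edge label, and each value a target node, so the five statements produce two nodes with outgoing edges.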
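12. A graph generated by agent after visiting Web page
[Figure: graph of the statements collected from the home page, listed here as edges]
ns0:RajendraAkerkar ns0:name "Rajendra Akerkar"
ns0:RajendraAkerkar ns0:title Professor
ns0:RajendraAkerkar ns0:e-mail akerkar8@gmail.com
ns0:RajendraAkerkar ns0:homepage http://www.tmrfindia.org/ra.html
ns0:RajendraAkerkar ns0:author (book node)
(book node) ns0:ISBN 978-1-84265-535-1
(book node) ns0:title Foundations of the Semantic Web: XML, RDF & Ontologies
(book node) ns0:publisher http://www.alphasci.com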
13. Agent hits another Web page
www.amazon.com
Existing Amazon: agent doesn’t know how to
retrieve information about ISBN number
New Amazon: agent can collect statements, such as
ns1:book-1842655353 ns1:ISBN "978-1-84265-535-1".
ns1:book-1842655353 ns1:price USD 68.80.
ns1:book-1842655353 ns1:customerReview "4 star".
Similar to namespace prefix ns0,
ns1 represents another namespace prefix.
Graph?
15. Obvious fact for us
ns0:_x, as a resource, represents exactly the same
item denoted by the resource named
ns1:book-1842655353
Observation:
a person who has a home page with its URL given by
http://www.tmrfindia.org/ra.html has a book
published, and the latest price of that book is US $68.80 on
Amazon.
This fact is not explicitly stated on either one of the Web sites,
but we have integrated the information to reach this
conclusion.
16. Agent does data integration
makes a connection between two appearances of
ISBN in two different sets of statements
It will then automatically add the following new
statement to its original statement collection:
ns0:_x sameAs ns1:book-1842655353
This process is exactly the data integration process
on the Web
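The connection step can be sketched as follows: whenever two different resources carry the same ISBN value, a sameAs statement is added. The sketch assumes that a property whose local name is "ISBN" means the same thing under both prefixes; that agreement is an assumption of this example, and `link_by_isbn` is a hypothetical helper.

```python
# Statements from both sites that mention an ISBN, plus one unrelated statement.
triples = [
    ("ns0:_x", "ns0:ISBN", "978-1-84265-535-1"),
    ("ns0:_x", "ns0:publisher", "http://www.alphasci.com"),
    ("ns1:book-1842655353", "ns1:ISBN", "978-1-84265-535-1"),
]


def link_by_isbn(stmts):
    """Return sameAs statements for resources that share an ISBN value."""
    first_owner = {}   # ISBN value -> first resource seen with it
    new_statements = []
    for subject, predicate, value in stmts:
        # Compare only the local name after the prefix (assumed to agree).
        if predicate.split(":", 1)[-1] != "ISBN":
            continue
        owner = first_owner.setdefault(value, subject)
        if owner != subject:
            new_statements.append((owner, "sameAs", subject))
    return new_statements


triples.extend(link_by_isbn(triples))
```

After the call, the collection contains the new statement `ns0:_x sameAs ns1:book-1842655353` alongside the original three.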
Graph?
18. What can the agent do?
answer lots of questions that we might have
For example,
what is the price of the book written by a person
whose home page is given by the URL
http://www.tmrfindia.org/ra.html
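The example question can be answered by walking the integrated statements: find the person with that home page, follow the author property to the book, follow sameAs to the other site's name for the book, and read the price. The helper names below are illustrative, and the statement list is a small subset of what the agent would have collected.

```python
triples = [
    ("ns0:RajendraAkerkar", "ns0:homepage", "http://www.tmrfindia.org/ra.html"),
    ("ns0:RajendraAkerkar", "ns0:author", "ns0:_x"),
    ("ns0:_x", "sameAs", "ns1:book-1842655353"),
    ("ns1:book-1842655353", "ns1:price", "USD 68.80"),
]


def objects(subject, predicate):
    """All values of one property of one resource."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, value):
    """All resources whose given property has the given value."""
    return [s for s, p, o in triples if p == predicate and o == value]


def prices_of_books_by_homepage(url):
    """Prices of books written by whoever has this home page."""
    prices = []
    for person in subjects("ns0:homepage", url):
        for book in objects(person, "ns0:author"):
            # Follow sameAs so statements from the other site become visible.
            for alias in [book] + objects(book, "sameAs"):
                prices.extend(objects(alias, "ns1:price"))
    return prices
```

Without the sameAs statement the inner loop would only look at `ns0:_x`, which has no price, so the answer depends on the integration step.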
19. Yet another attempt
Let us say now our agent hits
www.linkedIn.com.
If LinkedIn were still the LinkedIn of today, our
agent could not do much.
However, assume LinkedIn is a new LinkedIn
and our agent is able to collect quite a few
statements from this Web site.
ns2:RajendraAkerkar ns2:email "akerkar8@gmail.com".
ns2:RajendraAkerkar ns2:companyWebsite "http://www.vestforsk.no".
ns2:RajendraAkerkar ns2:connectedTo <ns2:Jacques>.
Graph?
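20. A graph generated by agent after visiting
linkedIn.com
[Figure: graph of the statements collected from LinkedIn, listed here as edges]
ns2:RajendraAkerkar ns2:currentJob ns2:Professor
ns2:RajendraAkerkar ns2:companyWebsite http://www.vestforsk.no
ns2:RajendraAkerkar ns2:email akerkar8@gmail.com
ns2:RajendraAkerkar ns2:connectedTo ns2:Jacques
ns2:RajendraAkerkar ns2:address (address node)
(address node) ns2:country ns2:Norway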
21. We know ns0:RajendraAkerkar and
ns2:RajendraAkerkar represent exactly the same resource,
because both resources have the same e-mail address.
For our agent, just comparing the two identities
(ns0:RajendraAkerkar vs. ns2:RajendraAkerkar) does not
establish that these two resources are the same.
However, if we can “teach” our agent the following fact:
If the e-mail property of resource A has the same value as the
e-mail property of resource B, then resources A and B are the same
resource.
Then our agent will be able to automatically add the following new
statement to its current statement collection:
ns0:RajendraAkerkar sameAs ns2:RajendraAkerkar.
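The "taught" fact can be written as a small rule over the statement collection. The sketch assumes the agent already knows that ns0:e-mail and ns2:email are the same kind of property; here that is approximated by normalizing the local name, which is a simplification for illustration only. `same_email_rule` is a hypothetical name.

```python
triples = [
    ("ns0:RajendraAkerkar", "ns0:e-mail", "akerkar8@gmail.com"),
    ("ns2:RajendraAkerkar", "ns2:email", "akerkar8@gmail.com"),
    ("ns2:RajendraAkerkar", "ns2:companyWebsite", "http://www.vestforsk.no"),
]


def same_email_rule(stmts):
    """If two resources share an e-mail value, infer that they are the same."""
    first_owner = {}   # e-mail value -> first resource seen with it
    inferred = []
    for subject, predicate, value in stmts:
        # Assumed normalization: "e-mail" and "email" name the same property.
        local = predicate.split(":", 1)[-1].replace("-", "").lower()
        if local != "email":
            continue
        owner = first_owner.setdefault(value, subject)
        if owner != subject:
            inferred.append((owner, "sameAs", subject))
    return inferred


inferred = same_email_rule(triples)
```

Applied to the two e-mail statements, the rule produces the single statement `ns0:RajendraAkerkar sameAs ns2:RajendraAkerkar`.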
22. With the creation of this new statement, our
agent has in fact integrated graphs by
overlapping nodes
Now, agent will be able to answer more
questions:
What is Rajendra’s company website?
How much does it cost to buy Rajendra’s book?
Which country does Rajendra live in?
Agent answers using integrated graph
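Overlapping nodes can be sketched with a union-find structure: every sameAs statement merges two names into one node, and all other statements are then attached to the merged node. This is one possible realization, not a method stated in the slides; `merge_by_same_as` is an illustrative name.

```python
triples = [
    ("ns0:RajendraAkerkar", "ns0:author", "ns0:_x"),
    ("ns1:book-1842655353", "ns1:price", "USD 68.80"),
    ("ns2:RajendraAkerkar", "ns2:companyWebsite", "http://www.vestforsk.no"),
    ("ns0:_x", "sameAs", "ns1:book-1842655353"),
    ("ns0:RajendraAkerkar", "sameAs", "ns2:RajendraAkerkar"),
]


def merge_by_same_as(stmts):
    """Collapse sameAs-linked names into one node and regroup the statements."""
    parent = {}

    def find(name):
        # Return the representative name for this node.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for subject, predicate, obj in stmts:
        if predicate == "sameAs":
            parent[find(obj)] = find(subject)

    merged = {}
    for subject, predicate, obj in stmts:
        if predicate == "sameAs":
            continue
        # Rewrite the value too when it is a known resource name.
        target = find(obj) if obj in parent else obj
        merged.setdefault(find(subject), []).append((predicate, target))
    return merged


merged = merge_by_same_as(triples)
```

In the merged graph, the company website from the second site and the author edge from the first site hang off the same node, and the price hangs off the same node as the book.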
23. Automatic data integration
Obviously, the set of questions that our agent is able
to answer grows by hitting more Web documents.
We can continue to move onto another Web site so
as to add more statements to our agent’s collection.
Automatic data integration on the Web can be quite
powerful and can help us a lot when it comes to
information discovery and retrieval.
24. Smart Data Integration Agent
The Web and the agent
The Web – change from its traditional form
Each statement collected by our agent represents a piece of knowledge
(a model to represent knowledge on the Web)
Such model of representing knowledge has to be easily and readily
processed (understood) by machines.
This model has to be accepted as a standard by all the Web sites (share
a common pattern).
Way to create such statements (manually or automatically)
The statements contained in different Web sites cannot be completely
arbitrary (e.g., to describe a person, we have some common terms such
as name, birthdate, and home page)
Agreement on common terms and relationships
A new breed of Web …!!!
25. Smart Data Integration Agent
Agent - new agent
Agent has to be able to understand each statement that it collects, by
understanding the common terms and relationships that are used to
create these statements.
Agent has to be able to conduct reasoning based on its understanding
of the common terms and relationships.
For example, knowing the fact that resources A and B have the same e-mail
address and considering the knowledge expressed by the common terms
and relationships, it should be able to conclude that A and B are in fact the
same resource.
Agent should be able to process some common queries that are
submitted against the statements it has collected.
Some more to be included ...
26. The Idea of the Semantic Web
The Semantic Web provides the technologies
and standards that we need to make the
following possible:
adds machine-understandable meanings to the
current Web, so that
computers can understand the Web documents
and therefore can automatically
accomplish tasks that have been otherwise conducted
manually, on a large scale.
27. Idea of the Semantic Web
The Semantic Web provides the technologies and
standards that we need to make our agent possible
A brand new layer built on top of the current Web,
and it adds machine-understandable meanings (or
“semantics”) to the current Web.
The Semantic Web is certainly more than automatic
data integration on a large scale.
28. What is the Semantic Web?
The Semantic Web: “… content that is meaningful to
computers [and that] will unleash a revolution of new
possibilities … Properly designed, the Semantic Web can
assist the evolution of human knowledge …”
Tim Berners-Lee, …, Weaving the Web
The Semantic Web is supposed to make data located
anywhere on the web accessible and understandable,
both to people and machines. This is more a vision
than a technology.
29. The Web as envisioned by Tim
Tim Berners-Lee has a two-part vision for the
future of the web:
o The first part is to make the web a more
collaborative medium.
o The second part is to make the web
understandable and thus processable by
machines.
30. The Web as envisioned by Tim
Tim Berners‐Lee’s original diagram of his vision
31. The change between current Web and the Semantic Web?
Current Web
Resources:
identified by URIs
untyped
Links:
href, src, ...
limited, non-descriptive
User:
Exciting world - semantics
of the resource, however,
gleaned from content
Machine:
Very little information
available - significance of
the links only evident from
the context around the
anchor.
32. The change between current Web and the Semantic Web?
Semantic Web
Resources:
Globally identified by URIs
or locally scoped (blank)
Extensible
Relational
Links:
Identified by URIs
Extensible
Relational
User:
Even more exciting world,
richer user experience
Machine:
More processable
information is available
(Data Web)
Computers and people:
Work, learn and exchange
knowledge effectively
33. A Layered Approach
The development of the Semantic Web
proceeds in steps
Each step building a layer on top of another
Principles:
Downward compatibility
Upward partial understanding
R. Akerkar A Semantic Web Primer 33
34. The Semantic Web in W3C’s view
35. An Alternative Layer Stack
Takes recent developments into account
The main differences are:
− The ontology layer is instantiated with two alternatives: the
current standard Web ontology language, OWL, and a
rule-based language
− DLP is the intersection of OWL and Horn logic, and serves as a
common foundation
The Semantic Web Architecture is currently being
debated and may be subject to refinements and
modifications in the future.
37. Semantic Web Layers
XML layer
Syntactic basis
RDF layer
RDF: basic data model for facts
RDF Schema: simple ontology language
Ontology layer
More expressive languages than RDF Schema
Current Web standard: OWL
38. Semantic Web Layers (2)
Logic layer
enhance ontology languages further
application-specific declarative knowledge
Proof layer
Proof generation, exchange, validation
Trust layer
Digital signatures
recommendations, rating agencies ….
39. Semantic Web Challenges
The Web is distributed
many sources, varying authority
inconsistency
The Web is dynamic
representational needs may change
The Web is enormous
systems must scale well
The Web is an open world
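The open-world point can be illustrated with a tiny sketch: under an open-world reading, a statement the agent has not collected is unknown rather than false, because some other Web page may still state it. The `ask` function and its `open_world` flag are illustrative names, and the ns2:worksWith statement is made up for the example.

```python
collected = {
    ("ns2:RajendraAkerkar", "ns2:connectedTo", "ns2:Jacques"),
}


def ask(statements, triple, open_world=True):
    """Answer a yes/no question about one statement."""
    if triple in statements:
        return "yes"
    # Open world: missing statements may simply not have been found yet.
    # Closed world: anything not stated is treated as false.
    return "unknown" if open_world else "no"


# Hypothetical statement that the agent has not collected.
missing = ("ns2:RajendraAkerkar", "ns2:worksWith", "ns2:Jacques")
```

Here `ask(collected, missing)` gives "unknown", while the same question with `open_world=False` gives "no", which is the behavior a closed database would show.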
40. References
R. Akerkar, Foundations of the Semantic Web, Narosa Publishing
House, New Delhi and Alpha Science Intern., London, ISBN 978-81-7319-985-1.
T. Berners-Lee, J. Hendler, O. Lassila (2001), The Semantic Web,
SciAm 284(5):34–43
Liyang Yu, A Developer’s Guide to the Semantic Web, Springer, ISBN
978-3-642-15969-5
Pascal Hitzler, Markus Krötzsch, Sebastian Rudolph, Foundations of
Semantic Web Technologies, CRC Press/Chapman and Hall (2009)
http://www.w3.org/2001/sw/SW-FAQ
http://www.w3.org/2001/sw/