Nick Bassiliades,
Aristotle University of Thessaloniki
Enhancing seCurity and privAcy in the Social wEb: a user-centered
approach for the protection of minors
Funded by the Horizon H2020 Framework Programme of the European Union under grant agreement no 691025.
Semantic Technologies for the Web of Linked Data
ENCASE Summer School
Limassol, 17th July 2017
H2020 – Grant Agreement no. 691025
• Nick Bassiliades (Νικόλαος Βασιλειάδης)
‒ http://intelligence.csd.auth.gr/people/bassiliades
• Associate Professor, Department of Informatics, Aristotle University of
Thessaloniki, Greece
• Scientific specialization: Knowledge Systems
‒ Knowledge Representation & Reasoning (Rule-based systems, Logic
programming, Defeasible Reasoning, Knowledge-based systems / expert
systems)
‒ Semantic Web (Ontologies, Linked Open Data, Semantic Web Services)
‒ Multi-agent systems
‒ Intelligent Applications on e-Learning, e-Government, e-Commerce, Electric
Vehicles
N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 2
A few words about the speaker
• Aristotle University of Thessaloniki, Greece
‒ Largest University in Greece and South-East Europe
‒ Since 1925, 41 Departments, ~2K faculty, ~45K students
• Dept. of Informatics
‒ Since 1992, 28 faculty, 5 research labs, ~1100 undergraduate students, ~200 MSc
students, ~80 PhD students, ~120 PhD graduates, >3500 pubs
• Software Engineering, Web and Intelligent Systems Lab
‒ 7 faculty, 20 PhD students, 9 Post-doctorate affiliates
• Intelligent Systems group (http://intelligence.csd.auth.gr)
‒ 4 faculty, 7 PhD students, 17 PhD graduates
‒ Research on Artificial Intelligence, Machine Learning / Data Mining, Knowledge
Representation & Reasoning / Semantic Web, Planning, Multi-Agent Systems
‒ 430 publications, 35 projects
A few words about my institution
• Semantic Web & Linked Open Data
‒ Introductory Concepts
‒ RDF, RDF Schema, Ontologies, OWL, Reasoning
• Storing and Querying RDF
‒ SPARQL, DBpedia
• Ontologies, Logic & Rules
‒ Horn Logic, OWL 2 RL, SWRL, SPIN
• Use Case: (Semantic) Entity Identification / Linking
‒ URank application (Prolog)
A few words about the talk
Semantic Web &
Linked Open Data
RDF, RDF Schema & OWL
• A shift from today’s Web: from publishing data in human-readable
HTML documents to publishing it in machine-readable documents.
• Today, much of the data we get from the web is delivered to us in the
form of web pages
‒ HTML documents that are linked to each other through the use of hyperlinks
• Humans or machines can read (and browse/crawl) these documents
• Machines can seek keywords in a page
• Machines have difficulty extracting any meaning from these
documents themselves
The Semantic Web
Web evolution
https://atomate.net/blog/web-technologies-now-and-tomorrow/
• An extension of the Web through standard data formats and
exchange protocols
‒ Most important: RDF, OWL, SPARQL, RIF, SWRL, SPIN, …
• SW provides a common framework that allows data to be shared and
reused across application, enterprise, and community boundaries in
the web.
“… a web of data that can be processed by machines”.
Tim Berners-Lee
The Semantic Web
From the Web of Documents to the Web of Data
https://atomate.net/blog/web-technologies-now-and-tomorrow/
Web of (Linked) Data
Current Web | Semantic Web
Web of Documents | Web of Data
Hypertext Documents (HTML) | Semi-structured Data representation in graphs (RDF)
Interconnected documents through URL links | Linked data through URIs
Human Consumption | Machine (and human) Consumption
Use search engines and browsing to explore | Use search engines, web (RDF) databases to query and URI links between datasets to explore
Don't just link the documents, link the things
• Some data should be freely available to everyone to use and
republish, without restrictions from copyright, patents or other
mechanisms of control.
‒ Similar to open source, hardware, content and access.
‒ Benefits: Transparency and democratic control, Improved or new private
products and services, Improved government services, New knowledge from
combined data sources and patterns in large data volumes
• Open data must be available in a convenient and modifiable form.
• Interoperability: the ability to combine different datasets together
and to develop more and better products and services
Open Data
XLS
DOC
PDF
• But, how exactly should open data be published (technological side)?
‒ In order to be easily re-used by other people and be interoperable?
• Tim Berners-Lee, the inventor of the Web and Linked Data initiator,
suggested a 5-star deployment scheme for Open Data.
Open Data Formats & Levels
5 ★ OPEN DATA
• 3 ★ OPEN DATA (CSV):
‒ The data is available via the Web; everyone can use the data easily, with no
proprietary software
‒ But, it’s still data on the Web and not data in the Web.
• 4 ★ OPEN DATA (RDF):
‒ Data items have a URI and can be shared / bookmarked on the Web.
‒ You can reuse parts of the data.
‒ Data can be stored in RDF databases (triple stores) and can be queried via public
endpoints on the Web through remote clients (SPARQL protocol)
‒ You can combine the data safely with other data. URIs are a global scheme.
Going up the ladder
• 5 ★ OPEN DATA (LOD):
‒ Data in the Web (RDF), linked to other data in the Web (URI links via owl:sameAs).
‒ Both the consumer and the publisher benefit from the network effect.
→ Publisher adds value to the data by linking it to other data with more information.
→ Consumer can discover more (related) data while consuming the data.
→ Consumer can directly learn about the data schema.
‒ But linking also has costs.
→ Publisher needs to invest resources to link data to other data in the Web.
→ Publisher may need to repair broken or incorrect links.
→ Consumer trusts the consumed data, but what about the data from external links?
Linked Open Data
• Use URIs as names for “things”, so that names are globally unique (IDs)
‒ Real inanimate or animate things, abstract concepts, …
• Use HTTP URIs, so that “things” can be looked up
‒ E.g. using a browser
• When someone looks up a URI, provide useful information, using the
standards (RDF, SPARQL)
• Include links to other URIs so that more things can be discovered
‒ Links are actually RDF properties interpreted as hyperlinks
Linked Data Principles
• URIs – A mechanism to identify things (IDs)
• HTTP – A mechanism to access things
• RDF – A mechanism to describe things and their relationships
• RDFS/OWL – A mechanism to describe vocabularies of properties and
relationships of things
Linked Data Technology Stack
• A data model (framework) for representing information (describing
resources) in the (Semantic) Web
• Compared to other data models, RDF is based on graphs
‒ Graphs have nodes and edges
In RDF graphs:
• Nodes are resources:
‒ Webized entities, i.e. “things” / objects that we want to talk about in the web
‒ Literals, i.e. constant atomic values of various data types
• Edges are properties (also called predicates):
‒ Relationships between entities
‒ Attributes of an entity, linking it with attribute values
Resource Description Framework (RDF)
RDF graph example
[Figure: RDF graph. Nodes: http://lpis.csd.auth.gr, http://www.csd.auth.gr/bassiliades, http://acm.org/SemanticWeb, 97913, …; edges: has-admin, has-phone, works-on]
• The RDF graph consists of many simple node → edge → node parts
‒ Called “triples”
• The triple can be “read” as a natural language “statement”
‒ “Nick” “has phone” “97913”
• The three parts of the triple have different names
‒ Syntactical terms: subject, predicate, object
Triples / Statements
[Figure: the triple http://.../bassiliades has-phone 97913]
• Tabular data: (SQL Databases, Excel, CSV, etc.)
‒ Information arranged in a strict grid.
‒ Adding / removing data is easy
‒ Changing the shape of the table is much more costly.
• Tree data: (JSON, XML)
‒ Loose structure / can more easily represent semi-structured and unstructured information
‒ Tricky to modify the structure and merge data from multiple sources, especially if
those sources were not designed with the merger in mind.
• Graph data: (RDF)
‒ A set of relationships (triples) between things – it can be any shape - more flexible.
‒ Merging 2 RDF documents is trivial. (Union of two sets!)
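The claim that merging two RDF documents is a union of two sets can be illustrated with a minimal Python sketch. It assumes that triples are modelled as plain (subject, predicate, object) tuples; the ex: names and the contents of both graphs are hypothetical, chosen only for illustration. Real RDF merging must also keep blank nodes from different documents apart, which this sketch ignores.

```python
# A triple is modelled as a (subject, predicate, object) tuple of strings.
# The ex: names below are hypothetical, chosen only for illustration.
graph_a = {
    ("ex:bassiliades", "ex:has-phone", "97913"),
    ("ex:lpis", "ex:has-admin", "ex:bassiliades"),
}
graph_b = {
    ("ex:bassiliades", "ex:works-on", "ex:SemanticWeb"),
    ("ex:lpis", "ex:has-admin", "ex:bassiliades"),  # also present in graph_a
}


def merge(*graphs):
    """Merge RDF graphs: the result is the union of their triple sets."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


merged = merge(graph_a, graph_b)
print(len(merged))  # 3: the shared triple is stored only once
```

No schema alignment step is needed before the merge, which is the contrast with the tabular and tree models listed above.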
Why graphs?
• RDF uses globally unique identifiers (URIs) for everything
‒ Things we're talking about (resources)
‒ Relationships (properties, predicates)
‒ Datatypes (literals)
• RDF is a list of unambiguous relationships
‒ Merging / combination of two graphs is trivial
Global Identifiers
• URI: Uniform Resource Identifier - identifies something uniquely.
‒ A URI represents a single concept or thing
‒ Many URIs can represent the same thing
‒ If you resolve a URI, it's considered good practice (though optional) to return some
useful triples about the concept the URI represents
• URL: Uniform Resource Locator - not only identifies something, but also
describes where it is located.
• All URLs are URIs. Not all URIs are URLs.
• Example
‒ <http://dbpedia.org/resource/Julius_Caesar>: URI for Julius Caesar.
‒ <http://en.wikipedia.org/wiki/Julius_Caesar>: URL for a web page about Julius Caesar
URI vs URL
• There are several ways of writing RDF triples into a file.
• RDF/XML
• Turtle (and N3)
‒ N-Triples
• RDFa
‒ Embed triples into an HTML document
‒ Triples can be extracted from the web page by tools (e.g. pyRdfa)
RDF Documents
• For URIs it's common to define related concepts in the same namespace
• A namespace is like a directory on a filesystem
‒ All the contained “things” share the same “global” address
‒ The “full address” ends with either "/" or "#"
‒ The local names (IDs) of “things” don't have "/" or "#"
‒ The global names (IDs) of “things” are constructed by namespace:local_name
• In RDF it's common to use a namespace prefix to make things readable
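As a rough illustration of how a prefixed name is turned back into a global name, the following Python sketch expands namespace:local_name into a full URI. The dbr prefix maps to the DBpedia resource namespace used later in these slides; the ex prefix, its URI, and the function name expand are hypothetical.

```python
# Prefix table: dbr is the DBpedia resource namespace; "ex" is a hypothetical one.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "ex": "http://example.org/vocab#",
}


def expand(qname, prefixes=PREFIXES):
    """Expand a prefixed name such as 'dbr:Nicosia' into a full URI."""
    prefix, sep, local_name = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    # The namespace already ends with "/" or "#", so plain concatenation suffices.
    return prefixes[prefix] + local_name


print(expand("dbr:Nicosia"))  # http://dbpedia.org/resource/Nicosia
print(expand("ex:Employee"))  # http://example.org/vocab#Employee
```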
Namespaces
• In usual graphs (Graph Theory), all nodes and edges are of the same type
(e.g. cities and roads)
• In the RDF model, nodes can belong to different types and all edges are
labelled, so they have different meaning
• Semantic Networks
(semantic nets) (known from AI)
• How do we know which types
of nodes we can use?
‒ What’s in a type?
• What edges' labels (properties) can we use?
‒ Are they related to node types?
Semantic Networks
[Figure: semantic network with node types Web page, Researcher, Research Topic, integer]
• The term ontology originates from philosophy
‒ The study of the nature of existence
• Different meaning in computer science
‒ An ontology is an explicit and formal specification of a conceptualization
• An ontology is an artifact (1-2-3 ontologies)
• An ontology in the Semantic Web is a formally defined vocabulary that
dictates what node types are there and what properties relate which node
types
‒ And other things, as well
Ontologies
• Terms denote important concepts
(classes of objects) of the domain
‒ e.g. professors, staff, students, courses,
departments
• Relationships between these terms:
typically class hierarchies
‒ a class C is a subclass of another class B
if every object in C is also included in B
‒ e.g. all professors are staff members
Typical Components of Ontologies
• Properties / attributes:
‒ X’s phone number is 98713
• (direct) Relationships
‒ X teaches Y
• Relationship restrictions
‒ Faculty members can teach courses
• Disjointness statements
‒ faculty and general staff are disjoint
• Other restrictions on relationships between objects
‒ every department must include at least 10 faculty members
Further Components of Ontologies
• Ontologies provide a shared understanding of a domain:
‒ semantic interoperability
‒ overcome differences in terminology
‒ mappings between ontologies
• Ontologies are useful for the organization and navigation of Web sites
• Ontologies are useful for improving the accuracy of Web searches
‒ Search engines can look for pages that refer to a precise concept in an ontology
‒ If a query fails to find relevant documents, the search engine may suggest a more
general query
‒ If too many answers are retrieved, the search engine may suggest specializations
The Role of Ontologies on the Web
• The most important feature of ontologies is the reasoning support
‒ Reasoning: applying knowledge to arrive at solutions
‒ Inferencing: deriving a conclusion based on statements that only imply that
conclusion
• Reasoning/inferencing support is important for:
‒ Checking consistency of ontology and knowledge
‒ Checking for unintended relationships between classes
‒ Automatically classifying instances in classes
‒ Deriving information/knowledge not known before
Ontologies and Reasoning
• RDF is a generic data model for describing objects, their properties and
relations between them
• RDF Schema is a simple vocabulary (ontology) description language
• In RDF Schema, we can define:
‒ “Allowed” Classes (node types) and Properties (edges’ labels)
‒ Which properties go with which classes
‒ Which values a property can take
‒ Class Hierarchies and Inheritance
‒ Property Hierarchies!
Semantic Web Ontology Languages
RDF Schema
• A richer ontology language
• (More) relations between classes and properties
‒ e.g., equivalent classes/properties, disjoint classes/properties,
inverse properties, Boolean combinations of classes
• Cardinality constraints
‒ e.g. People have exactly one mother, Students attend at most five courses
• Richer typing / restrictions on properties
‒ E.g. Postgraduate students attend only Postgraduate courses
‒ E.g. Faculty members must teach at least one Undergraduate course
• Characteristics of properties
‒ Object properties vs. Datatype properties
‒ Symmetrical, transitive, functional properties
Semantic Web Ontology Languages
OWL (Web Ontology Language)
• RDF statement:
ex:index.html dc:creator exstaff:85740
RDF Schema example
[Figure: the statement above drawn as a graph, with ex:index.html typed as ex:WebPage, exstaff:85740 typed as ex:Employee, ex:Employee as rdfs:subClassOf ex:Person, and dc:creator as an rdf:Property with rdfs:domain and rdfs:range]
ex:WebPage rdf:type rdfs:Class
ex:Employee rdf:type rdfs:Class
ex:Person rdf:type rdfs:Class
ex:Employee rdfs:subClassOf ex:Person
dc:creator rdf:type rdf:Property
dc:creator rdfs:domain ex:WebPage
dc:creator rdfs:range ex:Person
RDF Schema definitions
• Resource ex:WebPage is a class
‒ An RDFS class; there are also OWL classes
• Resource ex:Employee is a class
• Class ex:Employee is a subclass of
class ex:Person
• Resource dc:creator is a property
(instance of rdf:Property class)
• Property dc:creator is attached to
class ex:WebPage and takes values
instances of class ex:Person
• Resource ex:index.html is an instance of class ex:WebPage
ex:index.html rdf:type ex:WebPage
• Resource exstaff:85740 is an instance of class ex:Employee
exstaff:85740 rdf:type ex:Employee
Connecting RDF instances to classes
• Reasoning in RDF Schema is based on entailment rules
• Rules are logical implications:
IF a condition is true THEN the conclusion is also true
• Entailment rules:
IF such and such triples exist THEN add also these triples
(conclusions made true)
• E.g. rule for the subClassOf property:
IF ?x rdf:type ?u . AND ?u rdfs:subClassOf ?v .
THEN ?x rdf:type ?v .
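The entailment process can be sketched in a few lines of Python. This is a minimal illustration, not a full RDFS reasoner: it covers only the subClassOf rule above plus the analogous domain and range rules, represents triples as tuples of prefixed names, and the function name rdfs_closure is hypothetical. The input reuses the triples of the RDF Schema example, except that the explicit type of ex:index.html is left out on purpose so that the domain rule has something to derive.

```python
# Explicit triples from the RDF Schema example, written as (s, p, o) tuples.
# The rdf:type of ex:index.html is deliberately omitted (see lead-in).
triples = {
    ("ex:index.html", "dc:creator", "exstaff:85740"),
    ("exstaff:85740", "rdf:type", "ex:Employee"),
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
    ("dc:creator", "rdfs:domain", "ex:WebPage"),
    ("dc:creator", "rdfs:range", "ex:Person"),
}


def rdfs_closure(explicit):
    """Apply three RDFS entailment rules until no new triple is produced."""
    closure = set(explicit)
    while True:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                # IF ?x rdf:type ?u AND ?u rdfs:subClassOf ?v THEN ?x rdf:type ?v
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # IF ?x ?p ?y AND ?p rdfs:domain ?c THEN ?x rdf:type ?c
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))
                # IF ?x ?p ?y AND ?p rdfs:range ?c THEN ?y rdf:type ?c
                if p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))
        if new <= closure:  # fixpoint reached: nothing new was derived
            return closure
        closure |= new


inferred = rdfs_closure(triples) - triples
for t in sorted(inferred):
    print(t)
```

The nested loop is quadratic in the number of triples; triple stores use indexes instead, but the fixpoint idea is the same.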
Reasoning in RDF Schema
RDF Schema Inference / Query Process
[Figure: an RDF document is translated into the original (explicit) set of triples; the RDF/S inference rules add the inferred (implicit) set of triples; queries are answered over the set of all triples]
Reasoning Example in RDF Schema
[Figure, shown over six build slides: the RDF Schema example graph, with inferred rdf:type edges added step by step]
• Due to subClassOf
• Due to property domain
• Due to property range
• OWL is mapped on a Description Logic
‒ A subset of Predicate Logic (aka First-Order Logic / FOL)
‒ Then Description Logic reasoners are used (FaCT++, RACER, Pellet, HermiT,
etc.)
• Why a subset of Predicate Logic?
‒ Reasoning in Predicate Logic is undecidable
‒ Reasoning in Description logics is (usually) decidable
‒ Efficient decision procedures have been designed and implemented
Reasoning in OWL
• Equivalence of classes
‒ If class A is equivalent to class B, and class B is equivalent to class C, then A is
equivalent to C, too
‒ If class A is subclass of B, B subclass of C and C subclass of A, then A, B, C are
equivalent to each other
• Class membership
‒ If x is an instance of a class C, and C is a subclass of D, then we can infer that x
is an instance of D
‒ If C and D are equivalent classes, then if x is an instance of C, it is also an
instance of D, and vice versa
Reasoning Tasks in OWL
• Consistency
‒ X instance of both classes A and B, but A and B are disjoint
‒ X is an instance of both A and complement of A
‒ This is an indication of an error in the ontology
• Instance Classification
‒ Certain property-value pairs are a sufficient condition for membership in a
class A
→FirstYearCourses are Courses with Year=1
‒ If an individual x satisfies such conditions, we can conclude that x must be an
instance of A
Reasoning Tasks in OWL
• Class Classification
‒ A driver is a person that drives a vehicle.
‒ A bus driver is a person that drives a bus.
‒ A bus is a vehicle.
‒ A bus driver drives a vehicle, …
‒ … so he/she must be a driver.
• Instance equality
‒ Every person has a unique mother
‒ X has two mothers A and B
‒ Thus, (in order to restore consistency) A = B.
‒ If we already know that A ≠ B, then we have an inconsistency.
Reasoning Tasks in OWL
• The more expressive a logic is, the more computationally expensive it
becomes to draw conclusions
‒ Drawing certain conclusions may become impossible if non-computability
barriers are encountered
• Compromise:
‒ A language supported by reasonably efficient reasoners
‒ A language that can express large classes of ontologies and knowledge
Tradeoff between Expressive Power and Computational
Complexity
• OWL (OWL 2) comes with flavours / profiles of decreasing complexity,
suitable for different types of applications
‒ OWL Full (full compatibility with RDF Schema)
‒ OWL DL (all constructs allowed – not combined with RDF Schema)
‒ OWL 2 EL: useful in applications where ontologies contain very large numbers
of properties / classes, but not so many instances
‒ OWL 2 QL: aimed at applications with very large volumes of instances;
query answering is the most important reasoning task (data resides in
relational databases)
‒ OWL 2 RL: aimed at applications that require scalable reasoning without
sacrificing too much expressive power (data resides in triplestores)
OWL flavours / profiles
Storing and Querying RDF data
with SPARQL
… and an introduction to DBpedia
• RDF data (triples) are stored in special NoSQL databases,
called Triple stores or Graph stores.
‒ Databases optimized to import, store, and query a huge number of triples.
‒ Sesame (70 million), OpenLink Virtuoso (>15.4 billion), GraphDB (>12 b), Apache
Jena TDB (200 m), AllegroGraph (1 trillion), IBM DB2, Oracle, …
• Triple stores are queried using SPARQL
‒ Sending SPARQL queries using the SPARQL protocol
‒ Triple stores provide a (public) endpoint, where SPARQL queries can be submitted
• Clients can send queries to an endpoint using the HTTP protocol.
‒ You can issue a SPARQL query to an endpoint by entering it into the browser
‒ It’s preferable to have a client designed specifically for SPARQL.
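How a client submits a query over HTTP can be sketched with the Python standard library alone. The sketch assumes the endpoint accepts a GET request with the query in a `query` URL parameter and can return the SPARQL JSON results format, as the SPARQL protocol describes; the function names build_request and run_query are hypothetical, and run_query is defined but not called here because it needs network access.

```python
import json
import urllib.parse
import urllib.request

# Public DBpedia endpoint; any other SPARQL endpoint URL could be used instead.
ENDPOINT = "http://dbpedia.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build an HTTP GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the SPARQL JSON results format.
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the decoded JSON result (needs network access)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# A deliberately tiny query that relies on no predefined prefixes.
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"
request = build_request(QUERY)
print(request.full_url)
```

Pasting the printed URL into a browser is the "enter it into the browser" option; a dedicated client library mainly adds result parsing and error handling on top of this.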
Triple Stores
DBpedia SPARQL endpoint (http://dbpedia.org/sparql)
• A project aiming to extract structured content from the information
created as part of the Wikipedia project.
‒ This structured information is then made available on the Web
• DBpedia allows users to query relationships and properties associated
with Wikipedia resources, including links to other related datasets.
• One of the more famous parts of the Linked Data project, according
to Tim Berners-Lee
DBpedia (http://dbpedia.org)
• Wikipedia articles include structured information embedded in the
articles (mostly free text)
‒ E.g. "infobox" tables, categorization information, images, geo-coordinates and
links to external pages.
‒ This structured information is extracted and put in a uniform dataset which can
be queried.
• The DBpedia project uses RDF to represent the extracted information.
‒ 8.8 billion RDF triples
→1.1 billion from the English edition
→4.4 billion from other language editions (125)
→3.2 billion from DBpedia Commons and Wikidata
DBpedia dataset
Wikipedia extraction to DBpedia
Infobox
Title
Wikipedia extraction to DBpedia
Infobox
Categories
• The same concepts can be expressed using different properties in
templates
‒ E.g. birthplace and placeofbirth
‒ Queries about where people were born must search for both properties to get
complete results.
• The DBpedia Mapping Language helps map properties to an ontology
‒ Reduces the number of synonyms
• The development of the ontology and the mappings is open to the public
‒ Due to large diversity of infoboxes and properties
DBpedia challenges
SPARQL examples
• Find all cities in Cyprus with a University
select DISTINCT ?c where {
?u rdf:type dbo:University .
?u dbo:country dbr:Cyprus .
?u dbo:city ?c .
} ORDER BY ?c
SPARQL examples
• Find cities in Cyprus without Universities
select ?c where {
?c rdf:type dbo:City .
?c dbo:country dbr:Cyprus .
FILTER NOT EXISTS {
?u rdf:type dbo:University .
?u dbo:city ?c .
}
}
Empty
result!
• Find all cities in Cyprus
select ?c where {
?c rdf:type dbo:City .
?c dbo:country dbr:Cyprus .
}
SPARQL examples
What happened to
the rest of the cities?
DBpedia entry for Limassol
There is no dbr:Limassol rdf:type dbo:City . triple in DBpedia
DBpedia: the Truth!
• DBpedia has a lot of wrong or missing information
• Not all Wikipedia properties (infobox) have been correctly
mapped to the corresponding DBpedia ontology property
• Thus, in order to retrieve the correct information sometimes
you have to become a detective!
Query that returns more results
select DISTINCT ?c where {
{ ?c rdf:type dbo:City . }
UNION
{ ?x dbo:city ?c . }
?c dbo:country dbr:Cyprus .
} order by ?c
?c is a city, or
?c is the city of something ?x
Filtering results
select DISTINCT ?c where {
{ ?c rdf:type dbo:City . }
UNION
{ ?x dbo:city ?c . }
?c dbo:country dbr:Cyprus .
?c dbo:populationTotal ?p .
FILTER ( ?p >= 50000 )
} order by ?c
Return cities and their mayors
select DISTINCT ?c ?n where {
{ ?c rdf:type dbo:City . }
UNION
{ ?x dbo:city ?c . }
?c dbo:country dbr:Cyprus .
?c dbo:leaderName ?n .
} order by ?c
What happened to
the rest of the cities?
Their mayors are not
mentioned in DBpedia
Return cities and their mayors (if mentioned)
select DISTINCT ?c ?n where {
{ ?c rdf:type dbo:City . }
UNION
{ ?x dbo:city ?c . }
?c dbo:country dbr:Cyprus .
OPTIONAL { ?c dbo:leaderName ?n . }
} order by ?c
The OPTIONAL keyword allows
for flexibility in pattern matching
• Needed for “null values”
What about smaller towns?
select DISTINCT ?c where {
{ ?c rdf:type dbo:Settlement . }
UNION
{ ?x dbo:city ?c . }
?c dbo:country dbr:Cyprus .
?c dbo:populationTotal ?p .
FILTER ( ?p > 1000 )
} order by ?c
dbo:Settlement is
appropriate for villages,
towns, even neighborhoods
Too many results!
Just count them!
select (count(DISTINCT ?c) as ?Num)
where {
{ ?c rdf:type dbo:Settlement . }
UNION
{ ?x dbo:city ?c . }
?c dbo:country dbr:Cyprus .
?c dbo:populationTotal ?p .
FILTER ( ?p > 1000 )
}
The results also contain cities,
towns, …, from the pseudo-
state of “North Cyprus”
Exclude results that are (directly) part of
select (count(DISTINCT ?c) as ?Num)
where {
{ ?c rdf:type dbo:Settlement . }
UNION
{ ?x dbo:city ?c . }
?c dbo:country dbr:Cyprus .
FILTER NOT EXISTS {
?c dbo:isPartOf dbr:Northern_Cyprus .
}
?c dbo:populationTotal ?p .
FILTER ( ?p > 1000 )
}
The results still contain places
that are indirectly related to
the pseudo-state of “Northern
Cyprus”
(e.g. dbr:Yeni_Jami,_Nicosia)
Exclude results that are directly or indirectly part of
select (count(DISTINCT ?c) as ?Num)
where {
{ ?c rdf:type dbo:Settlement . }
UNION
{ ?x dbo:city ?c . }
?c dbo:country dbr:Cyprus .
FILTER NOT EXISTS {
?c dbo:isPartOf+ dbr:Northern_Cyprus .
}
?c dbo:populationTotal ?p .
FILTER ( ?p > 1000 )
}
The + sign indicates a
path of 1 or more
dbo:isPartOf edges in
the RDF graph
dbo:isPartOf
dbo:isPartOf/dbo:isPartOf
dbo:isPartOf/dbo:isPartOf/dbo:isPartOf
…
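The "one or more edges" reading of the + operator can be sketched in Python as a reachability search. This is only an illustration: the isPartOf edges below are hypothetical sample data, not actual DBpedia triples, and the function names are made up for this sketch.

```python
from collections import defaultdict

def transitive_targets(edges, start):
    """Return every node reachable from `start` via one or more edges."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    reached, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for nxt in graph[node]:
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return reached

# Hypothetical isPartOf edges, for illustration only.
IS_PART_OF = [
    ("Yeni_Jami", "North_Nicosia"),
    ("North_Nicosia", "Northern_Cyprus"),
    ("Limassol", "Limassol_District"),
]

def part_of_plus(place, region):
    """Mimic `place dbo:isPartOf+ region` over the sample edges."""
    return region in transitive_targets(IS_PART_OF, place)
```

With these sample edges, Yeni_Jami is excluded by the isPartOf+ filter even though it is not directly part of Northern_Cyprus.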
• dbr:Nicosia expands to full URI: http://dbpedia.org/resource/Nicosia
‒ Unique ID that represents the resource in the Web of Data
• However, if you type the above URI into a browser, you will be redirected
to: http://dbpedia.org/page/Nicosia
‒ Manifestation (or visualization) of the resource in the Web of Documents
• If you type into the browser: http://dbpedia.org/data/Nicosia
‒ The browser will retrieve an RDF/XML file that contains all information for Nicosia
‒ All triples with http://dbpedia.org/resource/Nicosia as a subject
BTW: LOD principles in action
Ontologies, Logic & Rules
Broadening reasoning capabilities
• Can ontologies express every piece of knowledge needed in the
Semantic Web?
• They can express static information (e.g. knowledge about the world)
• They have some reasoning capabilities, but…
• They cannot combine information from several different individuals in
order to come to a complex conclusion about the world
• They cannot reason about which actions should be performed in each
situation
Ontology shortcomings
• Consists of logical implications (rules): A1, ..., An → B
‒ Ai and B are atomic formulas
• There are 2 ways of reading such a rule:
‒ Deductive rules: If A1,..., An are known to be true, then B is also true
‒ Reactive rules: If the conditions A1,..., An are true, then carry out the action B
• Horn logic is tractable and is supported by efficient reasoning tools
Horn Logic: a(nother) Predicate Logic subset
• Neither of them is a subset of the other
‒ Both are needed in the Semantic Web
• Horn logic example:
‒ Persons who study and live in the same city are “home students”
studies(X,U), lives(X,A), loc(U,C), loc(A,C) → homeStudent(X)
‒ It is impossible to state that in OWL
• OWL example:
‒ A person is either a man or a woman
‒ Easily expressed in OWL using disjoint union
‒ It is impossible to state that in Horn logic
Horn Logic vs. Description Logics (aka ontologies)
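To make the deductive reading of the homeStudent rule concrete, here is a minimal Python sketch that evaluates it over a set of facts. The fact encoding (tuples of predicate and two arguments) and the individuals are assumptions made for illustration.

```python
def home_students(facts):
    """Apply studies(X,U), lives(X,A), loc(U,C), loc(A,C) -> homeStudent(X)."""
    studies = {(x, u) for (p, x, u) in facts if p == "studies"}
    lives = {(x, a) for (p, x, a) in facts if p == "lives"}
    loc = {(t, c) for (p, t, c) in facts if p == "loc"}
    derived = set()
    for x, u in studies:
        for x2, a in lives:
            if x2 != x:
                continue
            # The shared variable C joins the two loc atoms.
            for t, c in loc:
                if t == u and (a, c) in loc:
                    derived.add(("homeStudent", x))
    return derived

# Hypothetical facts, for illustration only.
FACTS = {
    ("studies", "anna", "auth"),
    ("lives", "anna", "apt1"),
    ("studies", "ben", "auth"),
    ("lives", "ben", "apt2"),
    ("loc", "auth", "thessaloniki"),
    ("loc", "apt1", "thessaloniki"),
    ("loc", "apt2", "athens"),
}
```

The rule combines facts about three different individuals (a person, a university, an address), which is exactly the kind of join the slide says OWL cannot state.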
• Horn logic (rules) and description logics (ontologies) are orthogonal
‒ Both are subsets of first-order logic (FOL)
‒ Neither is a subset of the other
Horn logic vs. Description logics
[Figure: Venn diagram inside FOL showing Horn logic and Description logics as overlapping sets; OWL 2 RL labels their intersection and SWRL their union.]
• The simplest integration approach is the intersection of both logics
• OWL 2 RL is an interesting sublanguage of OWL 2 DL
‒ Inherits open-world assumption and non-unique-name assumption
‒ These assumptions do not make a difference
OWL2 RL
• OWA: We cannot conclude some statement x to be false simply because
we cannot show x to be true.
‒ E.g. a patient’s clinical history does not include a particular allergy
‒ It would be incorrect to assume that the patient does not suffer from that allergy
‒ It is unknown, unless more information is given
• CWA: If something cannot be proved, then it is false
‒ E.g. we are looking for a direct flight between Larnaca and Madrid in a DB
application for airline reservations
‒ The flight doesn’t exist in the database
‒ Expected / correct answer: “There is no direct flight between Larnaca and Madrid.”
• OWL is committed to OWA, Horn logic to CWA
Open-World Assumption (OWA) vs.
Closed-World Assumption (CWA)
• Statement: “Juan is a citizen of the USA.”
• Question: “Is Juan a citizen of Colombia?”
‒ CWA answer: no
‒ OWA answer: I don’t know
• Additional statements:
‒ “A person can only be citizen of one country”
‒ “Juan is a citizen of Colombia.”
• CWA: error (we assume that Colombia and USA are different things)
• OWA: “USA and Colombia must be the same thing”
OWA vs. CWA
Examples
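The different answers to the Juan question can be sketched in Python. This is a toy model, not a reasoner: the knowledge base is a set of tuples, and the explicit set of known-false statements is an assumption introduced only to let the OWA function answer "no" at all.

```python
def ask_cwa(kb, statement):
    """Closed world: anything not in the knowledge base is false."""
    return "yes" if statement in kb else "no"

def ask_owa(kb, known_false, statement):
    """Open world: absence of a statement only means 'unknown'."""
    if statement in kb:
        return "yes"
    if statement in known_false:
        return "no"
    return "I don't know"

KB = {("citizenOf", "Juan", "USA")}
QUESTION = ("citizenOf", "Juan", "Colombia")
```

Calling `ask_cwa(KB, QUESTION)` gives "no", while `ask_owa(KB, set(), QUESTION)` gives "I don't know", matching the two answers on the slide.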
• When two individuals are known by different names, they are in fact
different individuals.
‒ Sometimes works well and sometimes not
‒ Example in favor: when two products in a catalog are known by different codes, they
are different
‒ Example against: two people in a social environment initially known with different
identifiers (e.g., “Prof. van Harmelen” and “Frank”) are sometimes the same person
• CWA systems (Horn logic) have UNA
• OWA systems (OWL) do not have UNA
‒ However, one could manually add the UNA
‒ Using owl:AllDifferent or owl:differentFrom
Unique-Name Assumption (UNA)
• OWL 2 RL is the largest fragment of OWL on which the choice for
CWA/OWA and UNA does not matter
‒ Weak enough so that differences between choices don’t show up.
‒ Still large enough to enable useful representation and reasoning tasks.
• Constructs of OWL that can be expressed using Horn logic rules
‒ Subclass, sub-property, class and property equivalence
‒ Equality-inequality between individuals
‒ Inverse, transitive, symmetric and functional properties
‒ Intersection of classes
• Excluded constructors
‒ Union, existential quantification, and arbitrary cardinality constraints
OWL 2 RL
• A triple s p o is expressed as a fact p(s, o)
• An instance declaration rdf:type(a,C)
‒ a is an instance of class C
‒ expressed as C(a)
• C is a subclass of D: C(X) → D(X)
• P is a sub-property of Q: P(X,Y) → Q(X,Y)
• Domain and Range Restrictions
‒ D is the domain of property P: P(X, Y) → D(X)
‒ R is the range of property P: P(X, Y) → R(Y)
RDF constructs
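The subclass, domain and range rules above can be run as a small forward-chaining loop. The following Python sketch assumes a simplified schema in which each class has at most one superclass and each property one domain and one range; the class and property names are hypothetical.

```python
RDF_TYPE = "rdf:type"

def rdfs_closure(triples, sub_class_of, domain, range_):
    """Apply the subclass, domain and range rules until nothing new is derived."""
    inferred = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in inferred:
            if p == RDF_TYPE and o in sub_class_of:
                new.add((s, RDF_TYPE, sub_class_of[o]))  # C(X) -> D(X)
            if p in domain:
                new.add((s, RDF_TYPE, domain[p]))        # P(X,Y) -> D(X)
            if p in range_:
                new.add((o, RDF_TYPE, range_[p]))        # P(X,Y) -> R(Y)
        changed = not new <= inferred
        inferred |= new
    return inferred

# Hypothetical schema and data, for illustration only.
CLOSURE = rdfs_closure(
    {("nick", "teaches", "kr")},
    sub_class_of={"Professor": "Person"},
    domain={"teaches": "Professor"},
    range_={"teaches": "Course"},
)
```

From the single triple, the loop derives that nick is a Professor (domain), then a Person (subclass), and that kr is a Course (range).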
• equivalentClass(C,D): C(X) → D(X), D(X) → C(X)
• equivalentProperty(P,Q): P(X,Y) → Q(X,Y), Q(X,Y) → P(X,Y)
• Transitive Properties: P(X,Y), P(Y,Z) → P(X,Z)
• allValuesFrom(P,D): C(X), P(X,Y) → D(Y)
‒ Necessary restriction (e.g. all undergraduate students must attend only
undergraduate courses)
‒ UGStudent(X), attends(X,Y) → UGCourse(Y)
• someValuesFrom(P,D): P(X,Y), D(Y) → C(X)
‒ Sufficient restriction / Instance classification rule (e.g. if someone attends a
postgraduate course, then he/she is a postgraduate student)
‒ attends(X,Y), PGCourse(Y) → PGStudent(X)
OWL Constructs
• A proposed Semantic Web language combining OWL 2 DL with Horn logic
‒ Syntax: Datalog RuleML
• Allows the definition of Horn-logic rules on top of OWL 2 DL ontologies
‒ Rule conclusions are “stored” back in the ontology
• SWRL unites the expressivities of DL and Horn-logic
‒ OWL 2 RL combines the advantages of both languages in their common
sublanguage
• SWRL is undecidable
‒ DL-safe rules: decidable subset of SWRL
→ Every variable must appear in a non-DL atom in the rule body
Semantic Web Rule Language (SWRL)
Man(?m) → Person(?m)
‒ Possible in OWL - subclassOf relation
‒ Some rules are OWL syntactic sugar
Person(?m) ∧ hasSex(?m,male) → Man(?m)
‒ Possible in OWL – hasValue (sufficient) restriction
‒ Not all such reclassifications are possible in OWL
Person(?m) ∧ hasSpouse(?m,?w) ∧ works_at(?w,?j) ∧ publicOrg(?j) →
MarriedToPublicServantPerson(?m)
‒ Not possible in OWL
Example SWRL Rules: Reclassification
hasParent(?x, ?y) ∧ hasBrother(?y, ?z) → hasUncle(?x, ?z)
‒ Property chaining
‒ Possible in OWL 2 - Not possible in OWL 1
Person(?p) ∧ hasSibling(?p,?s) ∧ Man(?s) → hasBrother(?p,?s)
‒ Not possible in OWL
Publication(?p) ∧ hasAuthor(?p,?y) ∧ hasAuthor(?p,?z) ∧ differentFrom(?y,?z)
→ cooperatedWith(?y, ?z)
‒ SWRL does not adopt the UNA
‒ Individuals must also be explicitly stated to be different (using an owl:AllDifferent
axiom) in the OWL ontology
Example SWRL Rules: Property Value Assignment
• Built-ins dramatically increase expressivity
‒ Most rules are not expressible in OWL 1
‒ Some built-ins can be expressed in OWL 2
Person(?p) ∧ hasAge(?p,?age) ∧ swrlb:greaterThan(?age,17) → Adult(?p)
Person(?p) ∧ hasNumber(?p, ?number) ∧ swrlb:startsWith(?number, "+") →
hasInternationalNumber(?p, true)
Person(?p) ∧ hasSalaryInPounds(?p, ?gbp) ∧ swrlb:multiply(?dollars, ?gbp, 2) →
hasSalaryInDollars(?p, ?dollars)
Example SWRL Rules: Built-ins
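What the three built-in rules compute can be mirrored in plain Python. This sketch is not SWRL: a person is modeled as a dictionary of property values (an assumption), and the fixed conversion factor of 2 is taken from the rule on the slide.

```python
def apply_builtin_rules(person, gbp_to_usd=2):
    """Return the facts the three built-in rules would derive for one person."""
    derived = {}
    age = person.get("hasAge")
    if age is not None and age > 17:                      # swrlb:greaterThan
        derived["Adult"] = True
    number = person.get("hasNumber")
    if number is not None and number.startswith("+"):     # swrlb:startsWith
        derived["hasInternationalNumber"] = True
    gbp = person.get("hasSalaryInPounds")
    if gbp is not None:                                   # swrlb:multiply
        derived["hasSalaryInDollars"] = gbp * gbp_to_usd
    return derived
```

A rule only fires when its property is present, so a person with no recorded age is simply not classified as Adult.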
Person(?p) ∧ not hasCar(?p, ?c) → CarlessPerson(?p)
• Not possible – rule language does not support negation
• Potential invalidation - what if a person later gets a car?
SWRL is Monotonic: does not Support Negation
• De-facto industry standard to represent SPARQL rules and constraints
on Semantic Web models.
• Provides meta-modeling capabilities that allow users to define their
own SPARQL functions and query templates.
• Includes a ready to use library of common functions.
• SPIN follows the CWA
‒ A special kind of negation exists (negation-as-failure)
SPIN (SPARQL Inferencing Notation)
• Rules can be expressed in SPARQL using the CONSTRUCT feature
parent(X,Y), parent(Y,Z) → grandparent(X,Z)
CONSTRUCT {
?X grandParent ?Z .
}
WHERE {
?X rdf:type Person .
?X parent ?Y .
?Y parent ?Z .
}
Rules in SPARQL: SPIN
(the ?X rdf:type Person pattern is optional)
• Rules can be associated to classes
‒ Rules can represent behavior of the instances of that class (an OOP feature!)
‒ Even constructors can be defined!
‒ Global rules can also be defined
• SPIN variants
‒ Inference / Entailment Rules: CONSTRUCT, INSERT
‒ Production Rules: DELETE
‒ Integrity constraints: ASK
SPIN
• The grandParent rule is stored at the Person class
‒ It will be executed only for instances of this class (and subclasses)
‒ Increases accuracy and efficiency
CONSTRUCT {
?this grandParent ?Z .
}
WHERE {
?this parent ?Y .
?Y parent ?Z .
}
Rules in SPARQL: SPIN
?this means an instance of the
class, where the rule is stored
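How a class-attached rule is scoped can be emulated in a few lines of Python. This is a sketch of the idea only, not the SPIN engine: the helper names are hypothetical, and subclass instances are ignored for brevity.

```python
def run_class_rule(triples, cls, rule):
    """Run `rule` once per instance of `cls`, binding that instance to ?this."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    derived = set()
    for this in instances:
        derived |= rule(triples, this)
    return derived

def grand_parent_rule(triples, this):
    """CONSTRUCT { ?this grandParent ?Z } WHERE { ?this parent ?Y . ?Y parent ?Z }"""
    parents = {o for s, p, o in triples if s == this and p == "parent"}
    return {(this, "grandParent", z)
            for s, p, z in triples if p == "parent" and s in parents}

# Hypothetical data: rex has a parent chain but is not declared a Person.
TRIPLES = {
    ("ann", "rdf:type", "Person"),
    ("ann", "parent", "bob"),
    ("bob", "parent", "cat"),
    ("rex", "parent", "max"),
    ("max", "parent", "old"),
}
```

Because the rule is only invoked for Person instances, no grandParent triple is produced for rex, which is the scoping effect described above.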
• Constraints are expressed via the ASK SPARQL construct
• For each instance for which the ASK query is true, the constraint is
violated
‒ A constraint violation warning is issued
ASK WHERE {
?this hasAge ?age .
FILTER (?age < 18) .
}
SPIN Constraints
The SPIN constraint is
stored at class Student
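The "ASK is true means violation" convention can be illustrated with a small Python check. The dictionary-based instance model and the function names are assumptions for this sketch.

```python
def violations(instances, constraint):
    """Return the instances for which the constraint query evaluates to true."""
    return [name for name, props in instances.items() if constraint(props)]

def under_18(props):
    """Mirror of: ASK WHERE { ?this hasAge ?age . FILTER (?age < 18) }"""
    age = props.get("hasAge")
    return age is not None and age < 18

# Hypothetical Student instances.
STUDENTS = {"s1": {"hasAge": 17}, "s2": {"hasAge": 20}, "s3": {}}
```

Note that s3, which has no recorded age, does not violate the constraint: the ASK pattern simply does not match it.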
• A more radical solution to constraint violation would be to delete the
instance that violates the constraint
‒ This can be done via the DELETE construct
‒ The SPIN rule should be declared as a constructor
‒ Constructor rules run each time a new instance is created
DELETE { ?this rdf:type Student . }
WHERE {
?this hasAge ?age .
FILTER (?age < 18) .
}
SPIN Constructors
The SPIN constructor is
stored at class Student
• Unlike SWRL, in SPIN there is a special negation (negation-as-failure)
‒ When something is not found to be true, it is false, thus its negation is true
Person(?p) ∧ not hasCar(?p, ?c) → CarlessPerson(?p)
CONSTRUCT {
?this rdf:type CarlessPerson .
}
WHERE {
?this rdf:type Person .
FILTER NOT EXISTS { ?this hasCar ?x . }
}
Using negation in SPIN
If we store the SPIN rule at class
Person, we do not need the
?this rdf:type Person line
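Negation-as-failure, and why it makes conclusions retractable, can be shown with a set difference in Python. This is a toy sketch over hypothetical triples, not the SPIN engine.

```python
def carless_persons(triples):
    """Persons for which no hasCar triple can be found (negation-as-failure)."""
    persons = {s for s, p, o in triples if p == "rdf:type" and o == "Person"}
    owners = {s for s, p, o in triples if p == "hasCar"}
    return persons - owners

# Hypothetical data.
BEFORE = {
    ("ann", "rdf:type", "Person"),
    ("bob", "rdf:type", "Person"),
    ("bob", "hasCar", "car1"),
}
# Adding a fact later invalidates an earlier conclusion (non-monotonic).
AFTER = BEFORE | {("ann", "hasCar", "car2")}
```

On BEFORE, ann is classified as carless; on AFTER she no longer is, which is the "what if a person later gets a car?" concern raised for SWRL.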
C1(X), equivalentClass(C1,C2) → C2(X)
CONSTRUCT {
?X a ?C2 .
}
WHERE {
?X a ?C1.
?C1 equivalentClass ?C2.
}
OWL 2 RL rules in SPIN
Actually, in some Semantic Web
systems, the OWL 2 RL semantics
are implemented via SPIN rules
(TopBraid Composer)
A Semantic Entity Linking
Use Case
The URank system
[Figure: URank architecture overview (Entity Extractor, Entity Linker, Entity Merger)]
• University rankings
‒ Means of advertisement
‒ There are so many of them! (>20 global rankings)
‒ Do we need all of them? Are they similar? Are they robust?
‒ Comparative Statistical Analysis is needed
• Collecting the data from multiple web sites
‒ Web data extraction
‒ Each ranking site produces a ranking table
‒ A single table needs to be constructed to feed the Statisticians
‒ The ranking tables need to be merged into a single all-rankings table
Motivating example
The easy case…
QS
Name | Rank | …
… | … | …
Aristotle University of Thessaloniki | 491-500 | …
… | … | …
THE
Name | Rank | …
… | … | …
Aristotle University of Thessaloniki | 401-500 | …
… | … | …
Merged table
Name | QS | THE | …
… | … | … | …
Aristotle University of Thessaloniki | 491-500 | 401-500 | …
… | … | … | …
The difficult case…
ARWU
Name | Rank | …
… | … | …
The Imperial College of Science, Technology and Medicine | 22 | …
… | … | …
THE
Name | Rank | …
… | … | …
Imperial College London | 8 | …
… | … | …
Merged table
Name | ARWU | THE | …
… | … | … | …
The Imperial College of Science, Technology and Medicine | 22 | - | …
Imperial College London | - | 8 | …
… | … | … | …
Levenshtein Distance = 38
Substring similarity* = 0.605
*A string metric for ontology alignment by Giorgos Stoilos, 2005.
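For reference, the Levenshtein distance mentioned above counts the minimum number of single-character insertions, deletions and substitutions. A standard dynamic-programming sketch in Python follows; the substring similarity metric is a different algorithm and is not reproduced here.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]
```

Because every extra word in the longer official name counts as many insertions, edit distance penalizes name variants heavily, which is why it separates the two Imperial College rows.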
• Use string matching and semantic search to find a unique ID for each
university in each ranking table
• Where to search? DBpedia
‒ DBpedia / Wikipedia have entries for almost all World Universities
‒ Each DBpedia entity has an ID
• Hopefully different variations of the University name will retrieve the
same University entity in DBpedia
• Then merging is straightforward
Merging tables: Solution
URank [1] System
[Figure: URank architecture. Components and artifacts shown: Ranking sites, Extraction Rules, Site-specific transformations, Entity Extractor, Extracted Data, Entity Linker, Domain-specific filtering, Ranking datasets, Entity Merger, Ranking ontology, Merged dataset.]
A Prolog application that:
1. Extracts data from ranking sites
2. Links Universities to DBpedia entities through semantic search and string similarity
3. Generates merged table
[1] N. Bassiliades: “Collecting University Rankings for Comparison Using Web Extraction and Entity Linking Techniques”, Springer CCIS Vol.469, pp.23-46, 2014.
• Entity linking: the task of determining the identity of entities mentioned
in text
• Different from Named Entity Recognition, which identifies the
occurrence or mention of a named entity in text but does not
identify which specific entity it is.
• Entity linking requires a knowledge base containing the entities to
which entity mentions can be linked.
• In URank:
‒ Named Entity Recognition is not needed - All University names are entities
‒ Entity Linking uses DBpedia as a knowledge base
Entity Linking
• DBpedia University entities do not always belong to the correct class
‒ dbo:University or dbo:EducationalInstitution
‒ Need to loosen that criterion carefully
• University mergers / splits
‒ University of Paris split in 1970 into 13 Universities with very similar
names: University of Paris I, II, …
‒ University of Montpellier split into three universities (I, II, III) in 1970; I and II
merged back in 2015
‒ Need to check if the University is currently operating
• Newcastle University, UK vs University of Newcastle, Australia
‒ Need to check the country
Entity Linking challenges
• US vs USA
‒ Need to have a common representation for countries
• Imperial College London vs Imperial College of Science, Technology and
Medicine
‒ Need to have access to alternative University names
• University of Montpellier II vs University of Montpellier 2
‒ Need to convert between Arabic and Roman numerals
• Universität vs University
‒ Need to translate between different languages
Entity Linking challenges
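One of the normalizations listed above, converting a trailing Roman numeral to Arabic digits, could look like the following Python sketch. It is a heuristic written for this illustration (not URank's Prolog code) and only rewrites an uppercase numeral at the very end of the name.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(numeral):
    """Convert a Roman numeral such as 'XIII' to an integer."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN_VALUES[ch]
        # A smaller value before a larger one is subtracted (e.g. IV = 4).
        total += -value if ROMAN_VALUES.get(nxt, 0) > value else value
    return total

def normalize_trailing_numeral(name):
    """Rewrite 'University of Montpellier II' as 'University of Montpellier 2'."""
    match = re.search(r"\s([IVXLCDM]+)$", name)
    if not match:
        return name
    return name[:match.start(1)] + str(roman_to_int(match.group(1)))
```

Applying the same normalization to both the extracted name and the candidate labels lets the two spellings compare as equal strings.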
• DBpedia Spotlight: annotate mentions of DBpedia resources in NL text
‒ http://demo.dbpedia-spotlight.org/
‒ Tried on the University Ranking use case: ~86% F-measure
• Silk: integrate heterogeneous data sources
‒ http://silkframework.org/
‒ Generates links between related data items within different Linked Data sources.
‒ Linked Data publishers can use Silk to set RDF links from their data sources to other
data sources on the Web.
‒ Experiments on the use case gave an even lower F-measure
• Domain-specific knowledge must be used to face all challenges
General-purpose Entity Linking Tools
• Each University entity at each ranking site is matched against a
DBpedia entry using 3 alternative methods
‒ DBpedia lookup service
‒ DBpedia SPARQL endpoint, using approximate string matching filtering
functions
‒ Keyword search engine of Wikipedia
• At each step, if a satisfactory match is found (substring matching), the
algorithm terminates
• Otherwise, all matching entries are collected and scored
‒ Top scored candidate is returned as a match
Entity Linking in URank
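The cascade described above (try each method in turn, stop at the first satisfactory match, otherwise return the best-scored candidate) can be sketched in Python. Everything here is an assumption for illustration: the search methods are passed in as plain functions, `difflib.SequenceMatcher` stands in for the substring metric URank actually uses, and 0.97 is used as the default threshold.

```python
from difflib import SequenceMatcher

def string_similarity(a, b):
    """Stand-in similarity in [0, 1]; not the metric used by URank."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_entity(name, methods, similarity, threshold=0.97):
    """Try each search method in order; stop at the first satisfactory match."""
    candidates = []
    for method in methods:
        for candidate in method(name):
            score = similarity(name, candidate)
            if score >= threshold:
                return candidate          # satisfactory match: terminate
            candidates.append((score, candidate))
    if not candidates:
        return None
    # No satisfactory match anywhere: return the top-scored candidate.
    return max(candidates, key=lambda pair: pair[0])[1]
```

In practice the three entries of `methods` would wrap the DBpedia lookup service, the SPARQL endpoint and the Wikipedia search; later methods are only called when the earlier ones fail.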
• String distance is measured using a metric for ontology alignment [1]
‒ Concerns substring matching
‒ More appropriate for matching names of Universities, than e.g. Levenshtein
‒ E.g. “Imperial College” vs. “Imperial College of Science, Technology and Medicine”
‒ Materialized as a built-in predicate in SWI-Prolog (isub/4)
• Satisfactory match is above a high similarity threshold
‒  0,97 depending on the algorithm step
[1] Stoilos, G., Stamou, G., Kollias, S.: A String Metric for Ontology Alignment. ISWC 2005, LNCS, vol.
3729, pp. 624-637. Springer (2005)
String matching
http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryClass=University&QueryString=Imperial%20College%20London
• RESTful API parameters:
‒ QueryString: a string for which a DBpedia URI
should be found (University name)
‒ QueryClass: a DBpedia class from the Ontology
that the results should have (University)
‒ MaxHits: the maximum number of returned
results (default: 5)
• Results in XML should be parsed
‒ using native SWI-Prolog XPath built-ins
DBpedia lookup service
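Building the request URL from the three parameters can be sketched with Python's standard library. The endpoint is the one shown on the slide and may no longer be served at that address; the function name and defaults are assumptions for this illustration, and no request is actually sent.

```python
from urllib.parse import quote, urlencode

# Endpoint as given on the slide; availability is not guaranteed.
LOOKUP_ENDPOINT = "http://lookup.dbpedia.org/api/search.asmx/KeywordSearch"

def build_lookup_url(query_string, query_class="University", max_hits=5):
    """Assemble a KeywordSearch URL with percent-encoded parameters."""
    params = {
        "QueryClass": query_class,
        "QueryString": query_string,
        "MaxHits": max_hits,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return LOOKUP_ENDPOINT + "?" + urlencode(params, quote_via=quote)
```

The returned XML would then be fetched and parsed by the caller, which is the step the slide delegates to SWI-Prolog's XPath built-ins.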
• If DBpedia lookup does not return a satisfactory match then the DBpedia
SPARQL endpoint is used (provided by OpenLink Virtuoso RDF DB engine)
SELECT ?univ, ?name WHERE {
?univ rdf:type Class .
?univ ?property ?val .
?val bif:contains University.Name.Words option (score ?score) .
?univ rdfs:label ?name .
FILTER (lang(?name) = "en") }
ORDER BY DESC (?score*0.3+sql:rnk_scale(<LONG::IRI_RANK> (?univ)))
LIMIT Top-N2
DBpedia SPARQL endpoint (http://dbpedia.org/sparql)
SPARQL query example @ DBpedia
Results are actually returned in RDF,
not HTML
• If neither of the native “DBpedia methods” returns a satisfactory match, the
Wikipedia search engine is used
http://en.wikipedia.org/w/index.php?search=Univ.name&limit=Top-N3&go=Go
• Results are verified using the equivalent DBpedia entity
‒ Wikipedia URLs are uniquely transformed to DBpedia URIs
‒ https://en.wikipedia.org/wiki/Imperial_College_London →
http://dbpedia.org/resource/Imperial_College_London
Keyword search engine of Wikipedia
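The URL-to-URI transformation shown above is a simple prefix swap on the article title. A minimal Python sketch, assuming English Wikipedia article URLs only (the function name is hypothetical):

```python
from urllib.parse import urlparse

def wikipedia_to_dbpedia(url):
    """Map an English Wikipedia article URL to the matching DBpedia resource URI."""
    parsed = urlparse(url)
    prefix = "/wiki/"
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith(prefix):
        raise ValueError("not an English Wikipedia article URL: " + url)
    return "http://dbpedia.org/resource/" + parsed.path[len(prefix):]
```

The resulting URI can then be checked against DBpedia with the same class and country filters as candidates found by the other two methods.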
• Spatiotemporal constraints are used for filtering
‒ Not easy, due to incomplete and inaccurate information from DBpedia/Wikipedia
• Retrieved university must be located in the same country as the extracted
university
‒ Using dbo:country property (not present in all entities)
‒ Using alternative properties dbo:state, dbo:city, dbo:location, and try to locate
country (geographical areas containment)
‒ Using redirection to another entity (owl:sameAs, dbo:wikiPageRedirects)
• Countries are not always represented using the same name
‒ E.g. US, U.S., USA, U.S.A., United States of America
‒ E.g. UK, United Kingdom, United Kingdom of Great Britain and Northern Ireland
‒ Synonym matrix for most problematic cases
Spatiotemporal constraints
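The synonym matrix for country names can be sketched as a lookup table in Python. The table below only covers the two examples given above, and the canonical labels chosen are an assumption for this sketch.

```python
# Variants are stored lowercase; canonical labels are illustrative choices.
COUNTRY_SYNONYMS = {
    "United States": {"us", "u.s.", "usa", "u.s.a.", "united states",
                      "united states of america"},
    "United Kingdom": {"uk", "united kingdom",
                       "united kingdom of great britain and northern ireland"},
}

def canonical_country(name):
    """Map a country name variant to its canonical label, if one is known."""
    key = name.strip().lower()
    for canonical, variants in COUNTRY_SYNONYMS.items():
        if key in variants:
            return canonical
    return name.strip()

def same_country(a, b):
    """Compare two country names after normalization."""
    return canonical_country(a) == canonical_country(b)
```

Names outside the table fall through unchanged, so the check degrades to plain string equality for countries without known variants.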
• select ?x where { <DBPediaURL> dbo:country ?x . }
• select ?x where { <DBPediaURL> dbo:state ?x . }
• select ?x where { <DBPediaURL> dbo:city ?x . }
• select ?x where { <DBPediaURL> dbo:location ?x . }
• …
Geographical area containment
• Retrieved university must be still operating
‒ E.g. University of Paris split, University of Montpellier split and merge, etc.
• Checking if the property dbp:closed exists
‒ Not all “closed” Universities have this property
‒ In this case, heuristic scraping
from Wikipedia entries is performed
Spatiotemporal constraints
Conclusions
• Semantic technologies and techniques – not necessarily using Semantic Web
standards – have been adopted for building and using knowledge graphs
‒ Google, Facebook, Yandex, Baidu, Bing have embedded Semantic technologies into
their core businesses
‒ The percentage of Web content that uses microdata/RDFa/schema.org information to
enhance search results has reached double digits
‒ IBM’s Watson (AlchemyLanguage) returns concept information as
DBpedia/YAGO/Freebase URIs, including a Linked Data API
• Semantic technologies are rapidly finding their way into consumer products
‒ The average developer barely realizes it, but semantics are now just about everywhere
The Future of Semantic Technologies
• Semantic Web is a branch of AI
• The market interest and justification is in solving problems
‒ AI has problem solving at its core
• Semantic technologies will become an essential enabler to the
development of true knowledge-based AI
‒ What things mean and how people understand what they mean
• AI applications currently focus on Machine Learning (ML) and NLP
‒ Knowledge graphs can help enhance the accuracy of ML/NLP
‒ NLP: top-down processing (use ontology terms to disambiguate text)
‒ ML: bottom-up processing (learn from data and match to generalized concepts)
Semantic Web and Artificial Intelligence
N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017
H2020 – Grant Agreement no. 691025 N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 121
Enhancing seCurity and privAcy in the Social wEb: a user-centered
approach for the protection of minors

Mais conteúdo relacionado

Mais procurados

Envisioning Social Applications of Library Linked Data
Envisioning Social Applications of Library Linked DataEnvisioning Social Applications of Library Linked Data
Envisioning Social Applications of Library Linked DataUldis Bojars
 
IFLA LIDASIG Open Session 2017: Introduction to Linked Data
IFLA LIDASIG Open Session 2017: Introduction to Linked DataIFLA LIDASIG Open Session 2017: Introduction to Linked Data
IFLA LIDASIG Open Session 2017: Introduction to Linked DataLars G. Svensson
 
Exploring the Networks in Open Public Data
Exploring the Networks in Open Public DataExploring the Networks in Open Public Data
Exploring the Networks in Open Public DataUldis Bojars
 
Legal Ethics And Professional Conduct
Legal Ethics And Professional ConductLegal Ethics And Professional Conduct
Legal Ethics And Professional Conductlegalinfo
 
AAC Linked Data Planning: Perspectives and Considerations
AAC Linked Data Planning: Perspectives and ConsiderationsAAC Linked Data Planning: Perspectives and Considerations
AAC Linked Data Planning: Perspectives and ConsiderationsDesign for Context
 
Aallbibframe em-20130714
Aallbibframe em-20130714Aallbibframe em-20130714
Aallbibframe em-20130714zepheiraorg
 
They have left the building: The Web Route to Library Users
They have left the building: The Web Route to Library UsersThey have left the building: The Web Route to Library Users
They have left the building: The Web Route to Library UsersRichard Wallis
 
Solid: An Ecology of Digital Being [@SLA Europe October 28, 2020]
Solid: An Ecology of Digital Being [@SLA Europe October 28, 2020]Solid: An Ecology of Digital Being [@SLA Europe October 28, 2020]
Solid: An Ecology of Digital Being [@SLA Europe October 28, 2020]Teodora Petkova
 
Web at 25 - Ontos Linked Open Data
Web at 25 - Ontos Linked Open DataWeb at 25 - Ontos Linked Open Data
Web at 25 - Ontos Linked Open DataAI4BD GmbH
 
Semantic MediaWiki - Knowledge Management and Open Data Use Cases
Semantic MediaWiki - Knowledge Management and Open Data Use CasesSemantic MediaWiki - Knowledge Management and Open Data Use Cases
Semantic MediaWiki - Knowledge Management and Open Data Use CasesBernhard Krabina
 
Cosi Usage Data
Cosi   Usage DataCosi   Usage Data
Cosi Usage Datadaveyp
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Heiko Paulheim
 
The web is rotting and what to do about it
The web is rotting and what to do about itThe web is rotting and what to do about it
The web is rotting and what to do about itHerbert Van de Sompel
 
Where does the born- and reborn-digital material take the Digital Humanities?
Where does the born- and reborn-digital material take the Digital Humanities?Where does the born- and reborn-digital material take the Digital Humanities?
Where does the born- and reborn-digital material take the Digital Humanities?UCLDH
 

Mais procurados (20)

Envisioning Social Applications of Library Linked Data
Envisioning Social Applications of Library Linked DataEnvisioning Social Applications of Library Linked Data
Envisioning Social Applications of Library Linked Data
 
IFLA LIDASIG Open Session 2017: Introduction to Linked Data
IFLA LIDASIG Open Session 2017: Introduction to Linked DataIFLA LIDASIG Open Session 2017: Introduction to Linked Data
IFLA LIDASIG Open Session 2017: Introduction to Linked Data
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Exploring the Networks in Open Public Data
Exploring the Networks in Open Public DataExploring the Networks in Open Public Data
Exploring the Networks in Open Public Data
 
Legal Ethics And Professional Conduct
Legal Ethics And Professional ConductLegal Ethics And Professional Conduct
Legal Ethics And Professional Conduct
 
The Danish National Bibliography as LOD
The Danish National Bibliography as LODThe Danish National Bibliography as LOD
The Danish National Bibliography as LOD
 
AAC Linked Data Planning: Perspectives and Considerations
AAC Linked Data Planning: Perspectives and ConsiderationsAAC Linked Data Planning: Perspectives and Considerations
AAC Linked Data Planning: Perspectives and Considerations
 
Aallbibframe em-20130714
Aallbibframe em-20130714Aallbibframe em-20130714
Aallbibframe em-20130714
 
They have left the building: The Web Route to Library Users
They have left the building: The Web Route to Library UsersThey have left the building: The Web Route to Library Users
Semantic Technologies for the Web of Linked Data

  • 1. Nick Bassiliades, Aristotle University of Thessaloniki Enhancing seCurity and privAcy in the Social wEb: a user-centered approach for the protection of minors Funded by the Horizon H2020 Framework Programme of the European Union under grant agreement no 691025. Semantic Technologies for the Web of Linked Data ENCASE Summer School Limassol, 17th July 2017
  • 2. A few words about the speaker
    • Nick Bassiliades (Νικόλαος Βασιλειάδης)
      ‒ http://intelligence.csd.auth.gr/people/bassiliades
    • Associate Professor, Department of Informatics, Aristotle University of Thessaloniki, Greece
    • Scientific specialization: Knowledge Systems
      ‒ Knowledge Representation & Reasoning (Rule-based systems, Logic programming, Defeasible Reasoning, Knowledge-based systems / expert systems)
      ‒ Semantic Web (Ontologies, Linked Open Data, Semantic Web Services)
      ‒ Multi-agent systems
      ‒ Intelligent Applications on e-Learning, e-Government, e-Commerce, Electric Vehicles
  • 3. A few words about my institution
    • Aristotle University of Thessaloniki, Greece
      ‒ Largest University in Greece and South-East Europe
      ‒ Since 1925, 41 Departments, ~2K faculty, ~45K students
    • Dept. of Informatics
      ‒ Since 1992, 28 faculty, 5 research labs, ~1100 undergraduate students, ~200 MSc students, ~80 PhD students, ~120 PhD graduates, >3500 pubs
    • Software Engineering, Web and Intelligent Systems Lab
      ‒ 7 faculty, 20 PhD students, 9 Post-doctorate affiliates
    • Intelligent Systems group (http://intelligence.csd.auth.gr)
      ‒ 4 faculty, 7 PhD students, 17 PhD graduates
      ‒ Research on Artificial Intelligence, Machine Learning / Data Mining, Knowledge Representation & Reasoning / Semantic Web, Planning, Multi-Agent Systems
      ‒ 430 publications, 35 projects
  • 4. A few words about the talk
    • Semantic Web & Linked Open Data
      ‒ Introductory Concepts
      ‒ RDF, RDF Schema, Ontologies, OWL, Reasoning
    • Storing and Querying RDF
      ‒ SPARQL, DBpedia
    • Ontologies, Logic & Rules
      ‒ Horn Logic, OWL 2 RL, SWRL, SPIN
    • Use Case: (Semantic) Entity Identification / Linking
      ‒ URank application (Prolog)
  • 5. Semantic Web & Linked Open Data: RDF, RDF Schema & OWL
  • 6. The Semantic Web
    • A shift from today’s Web: from publishing data in human-readable HTML documents to publishing it in machine-readable documents.
    • Today, much of the data we get from the web is delivered to us in the form of web pages
      ‒ HTML documents that are linked to each other through hyperlinks
    • Humans or machines can read (and browse/crawl) these documents
    • Machines can search for keywords in a page
    • Machines have difficulty extracting any meaning from the documents themselves
  • 7. Web evolution
    (Figure; source: https://atomate.net/blog/web-technologies-now-and-tomorrow/)
  • 8. The Semantic Web
    • An extension of the Web through standard data formats and exchange protocols
      ‒ Most important: RDF, OWL, SPARQL, RIF, SWRL, SPIN, …
    • The Semantic Web (SW) provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries on the web.
    • “… a web of data that can be processed by machines”. Tim Berners-Lee
  • 9. From the Web of Documents to the Web of Data
    (Figure; source: https://atomate.net/blog/web-technologies-now-and-tomorrow/)
  • 10. Web of (Linked) Data
    Current Web | Semantic Web
    Web of Documents | Web of Data
    Hypertext documents (HTML) | Semi-structured data represented as graphs (RDF)
    Interconnected documents through URL links | Linked data through URIs
    Human consumption | Machine (and human) consumption
    Use search engines and browsing to explore | Use search engines, web (RDF) databases to query, and URI links between datasets to explore
    • Don't just link the documents, link the things
  • 11. Open Data
    • Some data should be freely available to everyone to use and republish, without restrictions from copyright, patents or other mechanisms of control.
      ‒ Similar to open source, open hardware, open content and open access.
      ‒ Benefits: transparency and democratic control; improved or new private products and services; improved government services; new knowledge from combined data sources and from patterns in large data volumes
    • Open data must be available in a convenient and modifiable form.
    • Interoperability: the ability to combine different datasets and so develop more and better products and services
  • 12. (Figure: XLS, DOC, PDF)
  • 13. Open Data Formats & Levels
    • But how exactly should open data be published (technological side)?
      ‒ So that it can easily be re-used by other people and is interoperable?
    • Tim Berners-Lee, the inventor of the Web and Linked Data initiator, suggested a 5-star deployment scheme for Open Data.
  • 14. 5 ★ OPEN DATA
  • 15. Going up the ladder
    • 3 ★ OPEN DATA (CSV):
      ‒ The data is available via the Web; everyone can use the data easily, with no proprietary software
      ‒ But it’s still data on the Web and not data in the Web.
    • 4 ★ OPEN DATA (RDF):
      ‒ Data items have a URI and can be shared / bookmarked on the Web.
      ‒ You can reuse parts of the data.
      ‒ Data can be stored in RDF databases (triple stores) and can be queried via public endpoints on the Web through remote clients (SPARQL protocol)
      ‒ You can combine the data safely with other data. URIs are a global scheme.
  • 16. Linked Open Data
    • 5 ★ OPEN DATA (LOD):
      ‒ Data in the Web (RDF), linked to other data in the Web (URI links via owl:sameAs).
      ‒ Both the consumer and the publisher benefit from the network effect.
    • Publisher adds value to data by linking them to other data with more information.
    • Consumer can discover more (related) data while consuming the data.
    • Consumer can directly learn about the data schema.
    • Publisher needs to invest resources to link data to other data in the Web.
    • Publisher may need to repair broken or incorrect links.
    • Consumer trusts the consumed data, but what about the data from external links?
  • 17. Linked Data Principles
    • Use URIs as names for “things”, so that names are globally unique (IDs)
      ‒ Real inanimate or animate things, abstract concepts, …
    • Use HTTP URIs, so that “things” can be looked up
      ‒ E.g. using a browser
    • When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
    • Include links to other URIs so that more things can be discovered
      ‒ Links are actually RDF properties interpreted as hyperlinks
  • 18. Linked Data Technology Stack
    • URIs – A mechanism to identify things (IDs)
    • HTTP – A mechanism to access things
    • RDF – A mechanism to describe things and their relationships
    • RDFS/OWL – A mechanism to describe vocabularies of properties and relationships of things
  • 19. Resource Description Framework (RDF)
    • A data model (framework) for representing information (describing resources) in the (Semantic) Web
    • Unlike most other data models, RDF is based on graphs
      ‒ Graphs have nodes and edges
    • In RDF graphs, nodes are resources:
      ‒ Webized entities, i.e. “things” / objects that we want to talk about in the web
      ‒ Literals, i.e. constant atomic values of various data types
    • In RDF graphs, edges are properties (also called predicates):
      ‒ Relationships between entities
      ‒ Attributes of an entity, linking it with attribute values
  • 20. RDF graph example
    • Nodes: http://lpis.csd.auth.gr, http://www.csd.auth.gr/bassiliades, http://acm.org/SemanticWeb, 97913
    • Edge labels: has-admin, has-phone, works-on
  • 21. Triples / Statements
    • The RDF graph consists of many simple node → edge → node parts
      ‒ Called “triples”
    • A triple can be “read” as a natural-language “statement”
      ‒ “Nick” “has phone” “97913”
    • The three parts of the triple have different names (subject, predicate, object)
      ‒ Syntactical terms
    • Example triple: http://.../bassiliades → has-phone → 97913
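The triple view of slides 20 and 21 can be sketched in a few lines of plain Python. This is only an illustration, not how an RDF library stores data: the `Triple` type and the `match` helper are hypothetical names, the edge labels are used as plain strings rather than full property URIs, and the direction of the `works-on` edge is assumed from the figure labels.

```python
from typing import List, NamedTuple, Optional, Set


class Triple(NamedTuple):
    """One node -> edge -> node statement."""
    subject: str
    predicate: str
    obj: str


NICK = "http://www.csd.auth.gr/bassiliades"

# A graph is just a set of triples.
GRAPH: Set[Triple] = {
    Triple(NICK, "has-phone", "97913"),
    # Assumption: the works-on edge starts at the researcher node.
    Triple(NICK, "works-on", "http://acm.org/SemanticWeb"),
}


def match(graph: Set[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return all triples that agree with every given (non-None) part."""
    return sorted(
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


phones = match(GRAPH, subject=NICK, predicate="has-phone")
print(phones[0].obj)  # prints 97913
```

Leaving a part as `None` plays the role of a variable, which is roughly the idea behind triple patterns in query languages such as SPARQL.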
  • 22. Why graphs?
    • Tabular data: (SQL Databases, Excel, CSV, etc.)
      ‒ Information arranged in a strict grid.
      ‒ Adding / removing data is easy
      ‒ Changing the shape of the table comes at a much higher cost.
    • Tree data: (JSON, XML)
      ‒ Loose structure; can more easily represent semi-structured and unstructured information
      ‒ Tricky to modify the structure and merge data from multiple sources, especially if those sources were not designed with the merger in mind.
    • Graph data: (RDF)
      ‒ A set of relationships (triples) between things; it can be any shape, so it is more flexible.
      ‒ Merging 2 RDF documents is trivial. (Union of two sets!)
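The "union of two sets" remark can be taken literally. The sketch below assumes triples are plain tuples of strings and uses made-up `ex:` names; it ignores blank nodes, which a real RDF merge has to keep apart before taking the union.

```python
# Two small graphs from different (hypothetical) sources.
graph_a = {
    ("ex:nick", "ex:has-phone", "97913"),
    ("ex:nick", "ex:works-on", "ex:SemanticWeb"),
}
graph_b = {
    ("ex:nick", "ex:works-on", "ex:SemanticWeb"),  # also stated in graph_a
    ("ex:lpis", "ex:has-admin", "ex:nick"),
}

# Merging is set union: shared statements collapse, nothing needs reshaping.
merged = graph_a | graph_b

print(len(merged))  # prints 3
```

Because both sources use the same global identifiers, the duplicate statement is recognised as the same triple without any schema alignment step.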
  • 23. Global Identifiers
    • RDF uses globally unique identifiers (URIs) for everything
      ‒ Things we're talking about (resources)
      ‒ Relationships (properties, predicates)
      ‒ Datatypes (literals)
    • RDF is a list of unambiguous relationships
      ‒ Merging / combination of two graphs is trivial
  • 24. URI vs URL
    • URI: Uniform Resource Identifier; identifies something uniquely.
      ‒ A URI represents a single concept or thing
      ‒ Many URIs can represent the same thing
      ‒ When a URI is resolved, it is considered good practice (though optional) to return some useful triples about the concept the URI represents
    • URL: Uniform Resource Locator; not only identifies something, but also describes where it is located.
    • All URLs are URIs. Not all URIs are URLs.
    • Example
      ‒ <http://dbpedia.org/resource/Julius_Caesar>: URI for Julius Caesar.
      ‒ <http://en.wikipedia.org/wiki/Julius_Caesar>: URL for a web page about Julius Caesar
  • 25. RDF Documents
    • There are several ways of writing RDF triples into a file.
    • RDF/XML
    • Turtle (and N3)
      ‒ N-Triples
    • RDFa
      ‒ Embed triples into an HTML document
      ‒ Triples can be extracted from the web page by tools (e.g. pyRdfa)
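Of the formats listed above, N-Triples is the simplest: one triple per line, URIs in angle brackets, literals in double quotes, and a closing dot. The following is a minimal sketch of such a writer, not a complete serializer (no language tags, datatypes or blank nodes). The `Literal` class, the `to_ntriples` function and the `http://example.org/has-phone` property URI are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A plain string literal (hypothetical helper type)."""
    value: str


Node = Union[str, Literal]  # a str is treated as a URI


def _term(node: Node) -> str:
    """Render one term in N-Triples syntax."""
    if isinstance(node, Literal):
        escaped = (node.value.replace("\\", "\\\\")
                             .replace('"', '\\"')
                             .replace("\n", "\\n")
                             .replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{node}>"


def to_ntriples(triples: Iterable[Tuple[str, str, Node]]) -> str:
    """Write each triple as '<s> <p> o .' on its own line."""
    return "".join(
        f"{_term(s)} {_term(p)} {_term(o)} .\n" for s, p, o in triples
    )


print(to_ntriples([
    ("http://www.csd.auth.gr/bassiliades",
     "http://example.org/has-phone",  # assumed property URI
     Literal("97913")),
]), end="")
```

Turtle would express the same statement more compactly by declaring a prefix for the shared namespace.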
  • 26. Namespaces
    • For URIs it is common to define related concepts in the same namespace
    • A namespace is like a directory on a filesystem
      ‒ All the contained “things” share the same “global” address
      ‒ The “full address” ends with either "/" or "#"
      ‒ The local names (IDs) of “things” don't contain "/" or "#"
      ‒ The global names (IDs) of “things” are constructed as namespace:local_name
    • In RDF it is common to use a namespace prefix to make things readable
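A namespace prefix is only a shorthand for the front part of a URI. The sketch below shows the expansion and the reverse step, assuming a small hand-written prefix table; `expand` and `shorten` are hypothetical helper names, while the three namespace URIs are the well-known RDF, FOAF and DBpedia resource namespaces.

```python
from typing import Dict

PREFIXES: Dict[str, str] = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbr": "http://dbpedia.org/resource/",
}


def expand(qname: str) -> str:
    """Turn 'prefix:local_name' into the full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {qname!r}")
    return PREFIXES[prefix] + local


def shorten(uri: str) -> str:
    """Turn a full URI into 'prefix:local_name' when a namespace matches."""
    for prefix, namespace in PREFIXES.items():
        local = uri[len(namespace):]
        # Local names must be non-empty and contain neither "/" nor "#".
        if uri.startswith(namespace) and local and "/" not in local and "#" not in local:
            return f"{prefix}:{local}"
    return uri  # no known namespace: leave the URI as it is


print(expand("dbr:Julius_Caesar"))  # prints http://dbpedia.org/resource/Julius_Caesar
```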
  • 27. Semantic Networks
    • In usual graphs (Graph Theory), all nodes and edges are of the same type (e.g. cities and roads)
    • In the RDF model, nodes can belong to different types and all edges are labelled, so they have different meanings
    • Semantic Networks (semantic nets) (known from AI)
    • How do we know which types of nodes we can use?
      ‒ What’s in a type?
    • Which edge labels (properties) can we use?
      ‒ Are they related to node types?
    • (Figure: node types Web page, Researcher, Research Topic, integer)
  • 28. Ontologies
    • The term ontology originates from philosophy
      ‒ The study of the nature of existence
    • In computer science the term has a different meaning
      ‒ An ontology is an explicit and formal specification of a conceptualization
    • An ontology is an artifact (1-2-3 ontologies)
    • An ontology in the Semantic Web is a formally defined vocabulary that dictates which node types exist and which properties relate which node types
      ‒ And other things, as well
  • 29. Typical Components of Ontologies
    • Terms denote important concepts (classes of objects) of the domain
      ‒ e.g. professors, staff, students, courses, departments
    • Relationships between these terms: typically class hierarchies
      ‒ a class C is a subclass of another class B if every object in C is also included in B
      ‒ e.g. all professors are staff members
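The subclass definition above is transitive, so a check has to follow the whole chain of superclasses. A minimal sketch, assuming the hierarchy is kept in a plain dictionary; the `AssociateProfessor` class is a hypothetical addition to the slide's example, and the check treats every class as a subclass of itself.

```python
from typing import Dict, Set

# Direct superclasses of each class.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "AssociateProfessor": {"Professor"},  # hypothetical extra level
    "Professor": {"Staff"},               # "all professors are staff members"
    "Student": set(),
}


def is_subclass_of(cls: str, ancestor: str) -> bool:
    """True if `ancestor` is reachable from `cls` via subclass links."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:  # guards against cycles in the hierarchy
            continue
        seen.add(current)
        stack.extend(SUBCLASS_OF.get(current, ()))
    return False


print(is_subclass_of("AssociateProfessor", "Staff"))  # prints True
print(is_subclass_of("Student", "Staff"))             # prints False
```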
  • 30. H2020 – Grant Agreement no. 691025 • Properties / attributes: ‒ X’s phone number is 98713 • (direct) Relationships ‒ X teaches Y • Relationship restrictions ‒ Faculty members can teach courses • Disjointness statements ‒ faculty and general staff are disjoint • Other restrictions on relationships between objects ‒ every department must include at least 10 faculty members N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 30 Further Components of Ontologies
  • 31. H2020 – Grant Agreement no. 691025 • Ontologies provide a shared understanding of a domain: ‒ semantic interoperability ‒ overcome differences in terminology ‒ mappings between ontologies • Ontologies are useful for the organization and navigation of Web sites • Ontologies are useful for improving the accuracy of Web searches ‒ Search engines can look for pages that refer to a precise concept in an ontology ‒ If a query fails to find relevant documents, the search engine may suggest a more general query ‒ If too many answers are retrieved, the search engine may suggest specializations The Role of Ontologies on the Web N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 31
  • 32. H2020 – Grant Agreement no. 691025 • The most important feature of ontologies is the reasoning support ‒ Reasoning: applying knowledge to arrive at solutions ‒ Inferencing: deriving a conclusion based on statements that only imply that conclusion • Reasoning/inferencing support is important for: ‒ Checking consistency of ontology and knowledge ‒ Checking for unintended relationships between classes ‒ Automatically classifying instances in classes ‒ Deriving information/knowledge not known before Ontologies and Reasoning N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 32
  • 33. H2020 – Grant Agreement no. 691025 • RDF is a generic data model for describing objects, their properties and relations between them • RDF Schema is a simple vocabulary (ontology) description language • In RDF Schema, we can define: ‒ “Allowed” Classes (node types) and Properties (edges’ labels) ‒ Which properties go with which classes ‒ Which values a property can take ‒ Class Hierarchies and Inheritance ‒ Property Hierarchies! N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 33 Semantic Web Ontology Languages RDF Schema
  • 34. H2020 – Grant Agreement no. 691025 • A richer ontology language • (More) relations between classes and properties ‒ e.g., equivalent classes/properties, disjoint classes/properties, inverse properties, Boolean combinations of classes • Cardinality constraints ‒ e.g. People have exactly one mother, Students attend at most five courses • Richer typing / restrictions on properties ‒ E.g. Postgraduate students attend only Postgraduate courses ‒ E.g. Faculty members must teach at least one Undergraduate course • Characteristics of properties ‒ Object properties vs. Datatype properties ‒ Symmetrical, transitive, functional properties N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 34 Semantic Web Ontology Languages OWL (Web Ontology Language)
  • 35. H2020 – Grant Agreement no. 691025 • RDF statement: ex:index.html dc:creator exstaff:85740 35 RDF Schema example ex:index.html exstaff:85740 dc:creator ex:WebPage rdf:type ex:Employee rdf:type ex:Person rdfs: subClassOf rdfs:Class rdf:type rdf:type rdf:type rdf:Property rdfs:domain rdfs:range rdf:type N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017
  • 36. H2020 – Grant Agreement no. 691025 ex:WebPage rdf:type rdfs:Class ex:Employee rdf:type rdfs:Class ex:Person rdf:type rdfs:Class ex:Employee rdfs:subClassOf ex:Person dc:creator rdf:type rdf:Property dc:creator rdfs:domain ex:WebPage dc:creator rdfs:range ex:Person 36 RDF Schema definitions • Resource ex:WebPage is a class ‒ An RDFS class; there are also OWL classes • Resource ex:Employee is a class • Class ex:Employee is a subclass of class ex:Person • Resource dc:creator is a property (instance of rdf:Property class) • Property dc:creator is attached to class ex:WebPage and takes as values instances of class ex:Person N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017
  • 37. H2020 – Grant Agreement no. 691025 • Resource ex:index.html is an instance of class ex:WebPage ex:index.html rdf:type ex:WebPage • Resource exstaff:85740 is an instance of class ex:Employee exstaff:85740 rdf:type ex:Employee 37 Connecting RDF instances to classes N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017
  • 38. H2020 – Grant Agreement no. 691025 • Reasoning in RDF Schema is based on entailment rules • Rules are logical implications: IF a condition is true THEN the conclusion is also true • Entailment rules: IF such and such triples exist THEN add also these triples (conclusions made true) • E.g. rule for the subClassOf property: IF ?x rdf:type ?u . AND ?u rdfs:subClassOf ?v . THEN ?x rdf:type ?v . N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 38 Reasoning in RDF Schema
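The entailment rule above can be sketched as a small forward-chaining loop over a set of triples. This is an illustration only, not how a real RDFS reasoner is implemented: it covers just three rules (subClassOf, domain, range), and `rdfs_closure` is a hypothetical helper name. The explicit triples are the ones used in the running example of the slides.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply three RDFS entailment rules until no new triple appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # IF ?x rdf:type ?u AND ?u rdfs:subClassOf ?v THEN ?x rdf:type ?v
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
                # IF ?x ?p ?y AND ?p rdfs:domain ?c THEN ?x rdf:type ?c
                if p2 == DOMAIN and p == s2:
                    new.add((s, RDF_TYPE, o2))
                # IF ?x ?p ?y AND ?p rdfs:range ?c THEN ?y rdf:type ?c
                if p2 == RANGE and p == s2:
                    new.add((o, RDF_TYPE, o2))
        if new <= facts:  # fixpoint reached: nothing new was derived
            return facts
        facts |= new


explicit = {
    ("ex:index.html", "dc:creator", "exstaff:85740"),
    ("exstaff:85740", RDF_TYPE, "ex:Employee"),
    ("ex:Employee", SUBCLASS, "ex:Person"),
    ("dc:creator", DOMAIN, "ex:WebPage"),
    ("dc:creator", RANGE, "ex:Person"),
}
inferred = rdfs_closure(explicit) - explicit
for triple in sorted(inferred):
    print(triple)
```

The loop stops at a fixpoint, which is guaranteed here because the rules only ever add triples built from terms that already exist.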
  • 39. H2020 – Grant Agreement no. 691025 RDF Schema Inference / Query Process Original (explicit) set of triples RDF document translation RDF/S Inference rules Inferred (implicit) set of triples Set of all triples Query N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 39
  • 40. H2020 – Grant Agreement no. 691025 Reasoning Example in RDF Schema Due to subClassOf ex:index.html exstaff:85740 dc:creator ex:WebPage rdf:type ex:Employee rdf:type ex:Person rdfs: subClassOf rdfs:Class rdf:type rdf:type rdf:type rdf:Property rdfs:domain rdfs:range rdf:type N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 40
  • 41. H2020 – Grant Agreement no. 691025 Reasoning Example in RDF Schema Due to property domain ex:index.html exstaff:85740 dc:creator ex:WebPage rdf:type ex:Employee rdf:type ex:Person rdfs: subClassOf rdfs:Class rdf:type rdf:type rdf:type rdf:Property rdfs:domain rdfs:range N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 41
  • 42. H2020 – Grant Agreement no. 691025 Reasoning Example in RDF Schema Due to property range ex:index.html exstaff:85740 dc:creator ex:WebPage rdf:type ex:Employee rdf:type ex:Person rdfs: subClassOf rdfs:Class rdf:type rdf:type rdf:type rdf:Property rdfs:domain rdfs:range rdf:type N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 42
  • 43. H2020 – Grant Agreement no. 691025 Reasoning Example in RDF Schema Due to property range ex:index.html exstaff:85740 dc:creator ex:WebPage rdf:type ex:Employee rdf:type ex:Person rdfs: subClassOf rdfs:Class rdf:type rdf:type rdf:type rdf:Property rdfs:domain rdfs:range N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 43
  • 46. H2020 – Grant Agreement no. 691025 • OWL is mapped on a Description Logic ‒ A subset of Predicate Logic (aka First-Order Logic / FOL) ‒ Then Description Logic reasoners are used (FaCT++, RACER, Pellet, Hermit, etc.) • Why a subset of Predicate Logic? ‒ Reasoning in Predicate Logic is undecidable ‒ Reasoning in Description logics is (usually) decidable ‒ Efficient decision procedures have been designed and implemented N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 46 Reasoning in OWL
  • 47. H2020 – Grant Agreement no. 691025 • Equivalence of classes ‒ If class A is equivalent to class B, and class B is equivalent to class C, then A is equivalent to C, too ‒ If class A is a subclass of B, B a subclass of C and C a subclass of A, then A, B, C are equivalent to each other • Class membership ‒ If x is an instance of a class C, and C is a subclass of D, then we can infer that x is an instance of D ‒ If C and D are equivalent classes, then if x is an instance of C, it is also an instance of D, and vice versa Reasoning Tasks in OWL N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 47
  • 48. H2020 – Grant Agreement no. 691025 • Consistency ‒ X instance of both classes A and B, but A and B are disjoint ‒ X is an instance of both A and complement of A ‒ This is an indication of an error in the ontology • Instance Classification ‒ Certain property-value pairs are a sufficient condition for membership in a class A →FirstYearCourses are Courses with Year=1 ‒ If an individual x satisfies such conditions, we can conclude that x must be an instance of A Reasoning Tasks in OWL N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 48
  • 49. H2020 – Grant Agreement no. 691025 • Class Classification ‒ A driver is a person that drives a vehicle. ‒ A bus driver is a person that drives a bus. ‒ A bus is a vehicle. ‒ A bus driver drives a vehicle, … ‒ … so he/she must be a driver. • Instance equality ‒ Every person has a unique mother ‒ X has two mothers A and B ‒ Thus, (in order to restore consistency) A = B. ‒ If we already know that A ≠ B, then we have an inconsistency. Reasoning Tasks in OWL N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 49
  • 50. H2020 – Grant Agreement no. 691025 • The more expressive a logic is, the more computationally expensive it becomes to draw conclusions ‒ Drawing certain conclusions may become impossible if non-computability barriers are encountered • Compromise: ‒ A language supported by reasonably efficient reasoners ‒ A language that can express large classes of ontologies and knowledge Tradeoff between Expressive Power and Computational Complexity N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 50
  • 51. H2020 – Grant Agreement no. 691025 • OWL (OWL 2) comes with flavours / profiles of decreasing complexity, suitable for different types of applications ‒ OWL Full (full compatibility with RDF Schema) ‒ OWL DL (all constructs allowed – not combined with RDF Schema) ‒ OWL 2 EL: useful in applications where ontologies contain very large numbers of properties / classes, but not so many instances ‒ OWL 2 QL: aimed at applications with very large volumes of instances, where query answering is the most important reasoning task (data resides in relational databases) ‒ OWL 2 RL: aimed at applications that require scalable reasoning without sacrificing too much expressive power (data resides in triplestores) OWL flavours / profiles N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 51
  • 52. H2020 – Grant Agreement no. 691025 Storing and Querying RDF data with SPARQL … and an introduction to DBpedia N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 52
  • 53. H2020 – Grant Agreement no. 691025 • RDF data (triples) are stored in special NoSQL databases, called Triple stores or Graph stores. ‒ Databases optimized to import, store, and query a huge number of triples. ‒ Sesame (70 million), OpenLink Virtuoso (>15.4 billion), GraphDB (>12 b), Apache Jena TDB (200 m), AllegroGraph (1 trillion), IBM DB2, Oracle, … • Triple stores are queried using SPARQL ‒ Sending SPARQL queries using the SPARQL protocol ‒ Triple stores provide a (public) endpoint, where SPARQL queries can be submitted • Clients can send queries to an endpoint using the HTTP protocol. ‒ You can issue a SPARQL query to an endpoint by entering it into the browser ‒ It’s preferable to have a client designed specifically for SPARQL. N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 53 Triple Stores
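A minimal sketch of what "sending a query over HTTP" amounts to: under the SPARQL protocol, a query can be passed to an endpoint as the `query` parameter of a GET request. The snippet only builds the request URL for the DBpedia endpoint mentioned in these slides and does not contact the network; `sparql_get_url` is a hypothetical helper, and the PREFIX declarations are spelled out so the query does not rely on endpoint defaults.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?c WHERE {
  ?u rdf:type dbo:University .
  ?u dbo:country dbr:Cyprus .
  ?u dbo:city ?c .
} ORDER BY ?c
"""


def sparql_get_url(endpoint, query):
    """Build the GET URL that carries a SPARQL query to an endpoint."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Pasting the printed URL into a browser is the "enter it into the browser" option from the slide; a dedicated SPARQL client does the same encoding and also negotiates the result format.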
  • 54. H2020 – Grant Agreement no. 691025 DBpedia SPARQL endpoint (http://dbpedia.org/sparql) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 54
  • 55. H2020 – Grant Agreement no. 691025 • A project aiming to extract structured content from the information created as part of the Wikipedia project. ‒ This structured information is then made available on the Web • DBpedia allows users to query relationships and properties associated with Wikipedia resources, including links to other related datasets. • One of the more famous parts of the Linked Data project, according to Tim Berners-Lee (TBL) DBpedia (http://dbpedia.org) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 55
  • 56. H2020 – Grant Agreement no. 691025 • Wikipedia articles include structured information embedded in the articles (mostly free text) ‒ E.g. "infobox" tables, categorization information, images, geo-coordinates and links to external pages. ‒ This structured information is extracted and put in a uniform dataset which can be queried. • The DBpedia project uses RDF to represent the extracted information. ‒ 8.8 billion RDF triples →1.1 billion from the English edition →4.4 billion from other language editions (125) →3.2 billion from DBpedia Commons and Wikidata DBpedia dataset N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 56
  • 57. H2020 – Grant Agreement no. 691025 N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 57 Wikipedia extraction to DBpedia Infobox Title
  • 58. H2020 – Grant Agreement no. 691025 N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 58 Wikipedia extraction to DBpedia Infobox Categories
  • 59. H2020 – Grant Agreement no. 691025 • The same concepts can be expressed using different properties in templates ‒ E.g. birthplace and placeofbirth ‒ Queries about where people were born must search for both properties to get complete results. • The DBpedia Mapping Language helps mapping properties to an ontology ‒ Reduces the number of synonyms • The development of the ontology and the mappings are open to public ‒ Due to large diversity of infoboxes and properties DBpedia challenges N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 59
  • 60. H2020 – Grant Agreement no. 691025 SPARQL examples • Find all cities in Cyprus with a University select DISTINCT ?c where { ?u rdf:type dbo:University . ?u dbo:country dbr:Cyprus . ?u dbo:city ?c . } ORDER BY ?c N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 60
  • 61. H2020 – Grant Agreement no. 691025 SPARQL examples • Find cities in Cyprus without Universities select ?c where { ?c rdf:type dbo:City . ?c dbo:country dbr:Cyprus . FILTER NOT EXISTS { ?u rdf:type dbo:University . ?u dbo:city ?c . } } Empty result! N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 61
  • 62. H2020 – Grant Agreement no. 691025 • Find all cities in Cyprus select ?c where { ?c rdf:type dbo:City . ?c dbo:country dbr:Cyprus . } N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 62 SPARQL examples What happened to the rest of the cities?
  • 63. H2020 – Grant Agreement no. 691025 N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 63 DBpedia entry for Limassol There is no dbr:Limassol rdf:type dbo:City . triple in DBpedia
  • 64. H2020 – Grant Agreement no. 691025 DBpedia: the Truth! • DBpedia has a lot of wrong or missing information • Not all Wikipedia properties (infobox) have been correctly mapped to the corresponding DBpedia ontology property • Thus, in order to retrieve the correct information sometimes you have to become a detective! N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 64
  • 65. H2020 – Grant Agreement no. 691025 Query that returns more results select DISTINCT ?c where { { ?c rdf:type dbo:City . } UNION { ?x dbo:city ?c . } ?c dbo:country dbr:Cyprus . } order by ?c ?c is a city, or ?c is the city of something ?x N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 65
  • 66. H2020 – Grant Agreement no. 691025 Filtering results select DISTINCT ?c where { { ?c rdf:type dbo:City . } UNION { ?x dbo:city ?c . } ?c dbo:country dbr:Cyprus . ?c dbo:populationTotal ?p . FILTER ( ?p >= 50000 ) } order by ?c N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 66
  • 67. H2020 – Grant Agreement no. 691025 Return cities and their mayors select DISTINCT ?c ?n where { { ?c rdf:type dbo:City . } UNION { ?x dbo:city ?c . } ?c dbo:country dbr:Cyprus . ?c dbo:leaderName ?n . } order by ?c N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 67 What happened to the rest of the cities? Their mayors are not mentioned in DBpedia
  • 68. H2020 – Grant Agreement no. 691025 Return cities and their mayors (if mentioned) select DISTINCT ?c ?n where { { ?c rdf:type dbo:City . } UNION { ?x dbo:city ?c . } ?c dbo:country dbr:Cyprus . OPTIONAL { ?c dbo:leaderName ?n . } } order by ?c N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 68 The OPTIONAL keyword allows for flexibility in pattern matching • Needed for “null values”
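The difference between a mandatory pattern and an OPTIONAL one can be sketched as an inner join versus a left join. The data below is made up for illustration (it is not taken from DBpedia), and both function names are hypothetical.

```python
# Made-up data: only one city has a leader recorded.
cities = ["dbr:Larnaca", "dbr:Limassol", "dbr:Nicosia"]
leader_name = {"dbr:Nicosia": "Some Mayor"}


def with_mandatory_pattern(cities, leader_name):
    """Like '?c dbo:leaderName ?n .': rows without a leader are dropped."""
    return [(c, leader_name[c]) for c in cities if c in leader_name]


def with_optional_pattern(cities, leader_name):
    """Like 'OPTIONAL { ?c dbo:leaderName ?n . }': ?n may stay unbound."""
    return [(c, leader_name.get(c)) for c in cities]


print(with_mandatory_pattern(cities, leader_name))
print(with_optional_pattern(cities, leader_name))
```

In the second result every city is kept, and `None` plays the role of the unbound ("null") value.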
  • 69. H2020 – Grant Agreement no. 691025 What about smaller towns? select DISTINCT ?c where { { ?c rdf:type dbo:Settlement . } UNION { ?x dbo:city ?c . } ?c dbo:country dbr:Cyprus . ?c dbo:populationTotal ?p . FILTER ( ?p > 1000 ) } order by ?c N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 69 dbo:Settlement is appropriate for villages, towns, even neighborhoods Too many results!
  • 70. H2020 – Grant Agreement no. 691025 Just count them! select (count(DISTINCT ?c) as ?Num) where { { ?c rdf:type dbo:Settlement . } UNION { ?x dbo:city ?c . } ?c dbo:country dbr:Cyprus . ?c dbo:populationTotal ?p . FILTER ( ?p > 1000 ) } N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 70 The results also contain cities, towns, …, from the pseudo-state of “Northern Cyprus”
  • 71. H2020 – Grant Agreement no. 691025 Exclude results that are (directly) part of select (count(DISTINCT ?c) as ?Num) where { { ?c rdf:type dbo:Settlement . } UNION { ?x dbo:city ?c . } ?c dbo:country dbr:Cyprus . FILTER NOT EXISTS { ?c dbo:isPartOf dbr:Northern_Cyprus . } ?c dbo:populationTotal ?p . FILTER ( ?p > 1000 ) } N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 71 The results still contain places that are indirectly related to the pseudo-state of “Northern Cyprus” (e.g. dbr:Yeni_Jami,_Nicosia)
  • 72. H2020 – Grant Agreement no. 691025 Exclude results that are directly or indirectly part of select (count(DISTINCT ?c) as ?Num) where { { ?c rdf:type dbo:Settlement . } UNION { ?x dbo:city ?c . } ?c dbo:country dbr:Cyprus . FILTER NOT EXISTS { ?c dbo:isPartOf+ dbr:Northern_Cyprus . } ?c dbo:populationTotal ?p . FILTER ( ?p > 1000 ) } N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 72 The + sign indicates a path of 1 or more dbo:isPartOf edges in the RDF graph dbo:isPartOf dbo:isPartOf/dbo:isPartOf dbo:isPartOf/dbo:isPartOf/dbo:isPartOf …
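The "one or more edges" meaning of the + path operator can be sketched as graph reachability. The isPartOf edges below are invented for illustration (the `ex:` places do not come from DBpedia), and `reachable` is a hypothetical helper.

```python
def reachable(start, edges):
    """All nodes reachable from start via one or more edges (the '+' path)."""
    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


# Invented dbo:isPartOf edges: place -> places it is part of.
is_part_of = {
    "ex:Quarter": ["ex:District"],
    "ex:District": ["dbr:Northern_Cyprus"],
    "ex:Village": ["dbr:Northern_Cyprus"],
    "ex:Town": ["ex:OtherDistrict"],
}
target = "dbr:Northern_Cyprus"

# Direct edge only (plain dbo:isPartOf).
direct = {p for p, parents in is_part_of.items() if target in parents}
# One or more edges (dbo:isPartOf+).
indirect = {p for p in is_part_of if target in reachable(p, is_part_of)}

print(sorted(direct))
print(sorted(indirect))
```

`ex:Quarter` is missed by the direct test but caught by the path test, which mirrors the case of places that are only indirectly part of the region.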
  • 73. H2020 – Grant Agreement no. 691025 • dbr:Nicosia expands to the full URI: http://dbpedia.org/resource/Nicosia ‒ Unique ID that represents the resource in the Web of Data • However, if you type the above URI in a browser, you will be redirected to: http://dbpedia.org/page/Nicosia ‒ Manifestation (or visualization) of the resource in the Web of Documents • If you type in the browser: http://dbpedia.org/data/Nicosia ‒ The browser will retrieve an RDF/XML file that contains all information for Nicosia ‒ All triples with http://dbpedia.org/resource/Nicosia as a subject N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 73 BTW: LOD principles in action
  • 74. H2020 – Grant Agreement no. 691025 Ontologies, Logic & Rules Broadening reasoning capabilities N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 74
  • 75. H2020 – Grant Agreement no. 691025 • Can ontologies express every piece of knowledge needed in the Semantic Web? • They can express static information (e.g. knowledge about the world) • They have some reasoning capabilities, but… • They cannot combine information from several different individuals in order to come to a complex conclusion about the world • They cannot reason about which actions should be performed at each situation Ontology shortcomings N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 75
  • 76. H2020 – Grant Agreement no. 691025 • Consists of logical implications (rules): A1, ..., An → B ‒ Ai and B are atomic formulas • There are 2 ways of reading such a rule: ‒ Deductive rules: If A1,..., An are known to be true, then B is also true ‒ Reactive rules: If the conditions A1,..., An are true, then carry out the action B • Horn logic is tractable and is supported by efficient reasoning tools Horn Logic: a(nother) Predicate Logic subset N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 76
  • 77. H2020 – Grant Agreement no. 691025 • Neither of them is a subset of the other ‒ Both are needed in the Semantic Web • Horn logic example: ‒ Persons who study and live in the same city are “home students” studies(X,U), lives(X,A), loc(U,C), loc(A,C) → homeStudent(X) ‒ It is impossible to state that in OWL • OWL example: ‒ A person is either a man or a woman ‒ Easily expressed in OWL using disjoint union ‒ It is impossible to state that in Horn logic Horn Logic vs. Description Logics (aka ontologies) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 77
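The “home students” rule can be sketched as a join over four relations, where the shared variables X, U, A and C tie the atoms together. The facts below are invented for illustration, and `home_students` is a hypothetical helper.

```python
# Invented facts for the four predicates in the rule body.
studies = {("ann", "uni1"), ("bob", "uni2")}
lives = {("ann", "home1"), ("bob", "home2")}
loc = {
    ("uni1", "limassol"),
    ("home1", "limassol"),
    ("uni2", "nicosia"),
    ("home2", "larnaca"),
}


def home_students(studies, lives, loc):
    """studies(X,U), lives(X,A), loc(U,C), loc(A,C) -> homeStudent(X)."""
    return {
        x
        for (x, u) in studies
        for (x2, a) in lives
        if x2 == x                 # same person X in both atoms
        for (u2, c) in loc
        if u2 == u                 # C is the city of the university U
        if (a, c) in loc           # the address A is in the same city C
    }


print(home_students(studies, lives, loc))
```

The conclusion depends on several individuals at once (the person, the university, the address and the city), which is the kind of combination the slide says OWL cannot state.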
  • 78. H2020 – Grant Agreement no. 691025 • Horn logic (rules) and description logics (ontologies) are orthogonal ‒ Both are subsets of first-order logic (FOL) ‒ Neither is a subset of the other Horn logic vs. Description logics [Figure: within FOL, Horn Logic and Description logics overlap; OWL 2 RL lies in their intersection (∩) and SWRL covers their union (∪)] N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 78
  • 79. H2020 – Grant Agreement no. 691025 • The simplest integration approach is the intersection of both logics • OWL 2 RL is an interesting sublanguage of OWL 2 DL ‒ Inherits open-world assumption and non-unique-name assumption ‒ These assumptions do not make a difference OWL2 RL N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 79
  • 80. H2020 – Grant Agreement no. 691025 • OWA: We cannot conclude some statement x to be false simply because we cannot show x to be true. ‒ E.g. a patient’s clinical history does not include a particular allergy ‒ It would be incorrect to assume that the patient does not suffer from that allergy ‒ It is unknown, unless more information is given • CWA: If something cannot be proved, then it is false ‒ E.g. we are looking for a direct flight between Larnaca and Madrid in a DB application for airline reservations ‒ The flight doesn’t exist in the database ‒ Expected / correct answer: “There is no direct flight between Larnaca and Madrid.” • OWL is committed to OWA, Horn logic to CWA Open-World Assumption (OWA) vs. Closed-World Assumption (CWA) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 80
  • 81. H2020 – Grant Agreement no. 691025 • Statement: “Juan is a citizen of the USA.” • Question: “Is Juan a citizen of Colombia?” ‒ CWA answer: no ‒ OWA answer: I don’t know • Additional statements: ‒ “A person can only be citizen of one country” ‒ “Juan is a citizen of Colombia.” • CWA: error (we assume that Colombia and USA are different things) • OWA: “USA and Colombia must be the same thing” OWA vs. CWA Examples N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 81
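The two answers to the first question can be sketched with a tiny fact base. This is only an illustration of the query semantics: under the CWA an unprovable fact is false, while under the OWA it is unknown (modelled here as `None`). Both function names are hypothetical.

```python
facts = {("Juan", "citizenOf", "USA")}


def ask_cwa(fact, facts):
    """Closed world: whatever cannot be proved is false."""
    return fact in facts


def ask_owa(fact, facts):
    """Open world: whatever cannot be proved is unknown (None)."""
    return True if fact in facts else None


question = ("Juan", "citizenOf", "Colombia")
print(ask_cwa(question, facts))  # False: "no"
print(ask_owa(question, facts))  # None: "I don't know"
```

The second half of the slide (a functional property forcing USA and Colombia to be equal) would need an equality reasoner and is not covered by this sketch.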
  • 82. H2020 – Grant Agreement no. 691025 • When two individuals are known by different names, they are in fact different individuals. ‒ Sometimes works well and sometimes not ‒ Example in favor: when two products in a catalog are known by different codes, they are different ‒ Example against: two people in a social environment initially known with different identifiers (e.g., “Prof. van Harmelen” and “Frank”) are sometimes the same person • CWA systems (Horn logic) have UNA • OWA systems (OWL) do not have UNA ‒ However, one could manually add the UNA ‒ Using owl:AllDifferent or owl:differentFrom Unique-Name Assumption (UNA) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 82
  • 83. H2020 – Grant Agreement no. 691025 • OWL 2 RL is the largest fragment of OWL on which the choice for CWA/OWA and UNA does not matter ‒ Weak enough so that differences between choices don’t show up. ‒ Still large enough to enable useful representation and reasoning tasks. • Constructs of OWL that can be expressed using Horn logic rules ‒ Subclass, sub-property, class and property equivalence ‒ Equality-inequality between individuals ‒ Inverse, transitive, symmetric and functional properties ‒ Intersection of classes • Excluded constructors ‒ Union, existential quantification, and arbitrary cardinality constraints OWL 2 RL N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 83
  • 84. H2020 – Grant Agreement no. 691025 • A triple s p o is expressed as a fact p(s, o) • An instance declaration rdf:type(a,C) ‒ a is an instance of class C ‒ expressed as C(a) • C is a subclass of D: C(X) → D(X) • P is a sub-property of Q: P(X,Y) → Q(X,Y) • Domain and Range Restrictions ‒ D is the domain of property P: P(X, Y) → D(X) ‒ R is the range of property P: P(X, Y) → R(Y) RDF constructs N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 84
  • 85. H2020 – Grant Agreement no. 691025 • equivalentClass(C,D): C(X) → D(X), D(X) → C(X) • equivalentProperty(P,Q): P(X,Y) → Q(X,Y), Q(X,Y) → P(X,Y) • Transitive Properties: P(X,Y), P(Y,Z) → P(X,Z) • allValuesFrom(P,D): C(X), P(X,Y) → D(Y) ‒ Necessary restriction (e.g. all undergraduate students must attend only undergraduate courses) ‒ UGStudent(X), attends(X,Y) → UGCourse(Y) • someValuesFrom(P,D): P(X,Y), D(Y) → C(X) ‒ Sufficient restriction / Instance classification rule (e.g. if someone attends a postgraduate course, then he/she is a postgraduate student) ‒ attends(X,Y), PGCourse(Y) → PGStudent(X) OWL Constructs N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 85
  • 86. H2020 – Grant Agreement no. 691025 • A proposed Semantic Web language combining OWL 2 DL with Horn logic ‒ Syntax: Datalog RuleML • Allows the definition of Horn-logic rules on top of OWL 2 DL ontologies ‒ Rule conclusions are “stored” back in the ontology • SWRL unites the expressivities of DL and Horn-logic ‒ OWL 2 RL combines the advantages of both languages in their common sublanguage • SWRL is undecidable in general ‒ DL-safe rules: a decidable subset of SWRL →Every variable must appear in a non-DL atom in the rule body Semantic Web Rule Language (SWRL) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 86
  • 87. H2020 – Grant Agreement no. 691025 Man(?m) → Person(?m) ‒ Possible in OWL - subClassOf relation ‒ Some rules are OWL syntactic sugar Person(?m) ∧ hasSex(?m,male) → Man(?m) ‒ Possible in OWL – hasValue (sufficient) restriction ‒ Not all such reclassifications are possible in OWL Person(?m) ∧ hasSpouse(?m,?w) ∧ works_at(?w,?j) ∧ publicOrg(?j) → MarriedToPublicServantPerson(?m) ‒ Not possible in OWL Example SWRL Rules: Reclassification N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 87
  • 88. H2020 – Grant Agreement no. 691025 hasParent(?x, ?y) ∧ hasBrother(?y, ?z) → hasUncle(?x, ?z) ‒ Property chaining ‒ Possible in OWL 2 - Not possible in OWL 1 Person(?p) ∧ hasSibling(?p,?s) ∧ Man(?s) → hasBrother(?p,?s) ‒ Not possible in OWL Publication(?p) ∧ hasAuthor(?p,?y) ∧ hasAuthor(?p,?z) ∧ differentFrom(?y,?z) → cooperatedWith(?y, ?z) ‒ SWRL does not adopt the UNA ‒ Individuals must also be explicitly stated to be different (using owl:AllDifferent) in the OWL ontology Example SWRL Rules: Property Value Assignment N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 88
  • 89. H2020 – Grant Agreement no. 691025 • Built-ins dramatically increase expressivity ‒ Most rules are not expressible in OWL 1 ‒ Some built-ins can be expressed in OWL 2 Person(?p) ∧ hasAge(?p,?age) ∧ swrlb:greaterThan(?age,17) → Adult(?p) Person(?p) ∧ hasNumber(?p, ?number) ∧ swrlb:startsWith(?number, "+") → hasInternationalNumber(?p, true) Person(?p) ∧ hasSalaryInPounds(?p, ?gbp) ∧ swrlb:multiply(?dollars, ?gbp, 2) → hasSalaryInDollars(?p, ?dollars) Example SWRL Rules: Built-ins N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 89
  • 90. H2020 – Grant Agreement no. 691025 Person(?p) ∧ not hasCar(?p, ?c) → CarlessPerson(?p) • Not possible – rule language does not support negation • Potential invalidation - what if a person later gets a car? N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 90 SWRL is Monotonic: does not Support Negation
  • 91. H2020 – Grant Agreement no. 691025 • De-facto industry standard to represent SPARQL rules and constraints on Semantic Web models. • Provides meta-modeling capabilities that allow users to define their own SPARQL functions and query templates. • Includes a ready-to-use library of common functions. • SPIN follows the CWA ‒ A special kind of negation exists (negation-as-failure) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 91 SPIN (SPARQL Inferencing Notation)
  • 92. H2020 – Grant Agreement no. 691025 • Rules can be expressed in SPARQL using the CONSTRUCT feature parent(X,Y), parent(Y,Z) → grandParent(X,Z) CONSTRUCT { ?X grandParent ?Z . } WHERE { ?X rdf:type Person . ?X parent ?Y . ?Y parent ?Z . } Rules in SPARQL: SPIN optional N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 92
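What the CONSTRUCT rule does can be sketched in plain Python: match the WHERE patterns against a set of triples and emit one new triple per match. The individuals below are invented for illustration, and `construct_grandparents` is a hypothetical helper, not a SPARQL engine.

```python
def construct_grandparents(triples):
    """Emit (?X grandParent ?Z) for each match of the WHERE patterns."""
    parent = [(s, o) for s, p, o in triples if p == "parent"]
    persons = {s for s, p, o in triples if p == "rdf:type" and o == "Person"}
    return {
        (x, "grandParent", z)
        for x, y in parent
        if x in persons            # ?X rdf:type Person .
        for y2, z in parent
        if y2 == y                 # ?X parent ?Y . ?Y parent ?Z .
    }


# Invented data.
data = {
    ("ann", "rdf:type", "Person"),
    ("ann", "parent", "bob"),
    ("bob", "parent", "carl"),
}
print(construct_grandparents(data))
```

As with CONSTRUCT, the original triples are left untouched; the result is a separate set of derived triples that a rule engine would then add to the graph.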
  • 93. H2020 – Grant Agreement no. 691025 • Rules can be associated with classes ‒ Rules can represent behavior of the instances of that class (an OOP feature!) ‒ Even constructors can be defined! ‒ Global rules can also be defined • SPIN variants ‒ Inference / Entailment Rules: CONSTRUCT, INSERT ‒ Production Rules: DELETE ‒ Integrity constraints: ASK SPIN N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 93
  • 94. H2020 – Grant Agreement no. 691025 • The grandParent rule is stored at the Person class ‒ It will be executed only for instances of this class (and subclasses) ‒ Increases accuracy and efficiency CONSTRUCT { ?this grandParent ?Z . } WHERE { ?this parent ?Y . ?Y parent ?Z . } Rules in SPARQL: SPIN ?this means an instance of the class, where the rule is stored N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 94
  • 95. H2020 – Grant Agreement no. 691025 • Constraints are expressed via the ASK SPARQL construct • For each instance for which the ASK query is true, the constraint is violated ‒ A constraint violation warning is issued ASK WHERE { ?this hasAge ?age . FILTER (?age < 18) . } SPIN Constraints The SPIN constraint is stored at class Student N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 95
  • 96. H2020 – Grant Agreement no. 691025 • A more radical solution to constraint violation would be to delete the instance that violates the constraint ‒ This can be done via the DELETE construct ‒ The SPIN rule should be declared as a constructor ‒ Constructor rules run each time a new instance is created DELETE { ?this rdf:type Student . } WHERE { ?this hasAge ?age . FILTER (?age < 18) . } SPIN Constructors The SPIN constructor is stored at class Student N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 96
  • 97. H2020 – Grant Agreement no. 691025 • Unlike SWRL, in SPIN there is a special negation (negation-as-failure) ‒ When something is not found to be true, it is false, thus its negation is true Person(?p) ∧ not hasCar(?p, ?c) → CarlessPerson(?p) CONSTRUCT { ?this rdf:type CarlessPerson . } WHERE { ?this rdf:type Person . FILTER NOT EXISTS { ?this hasCar ?x . } } Using negation in SPIN If we store the SPIN rule at class Person we do not need this line N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 97
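Negation-as-failure, and why it makes reasoning non-monotonic, can be sketched as follows. The individuals are invented, and `carless_persons` is a hypothetical helper that mimics the FILTER NOT EXISTS pattern.

```python
def carless_persons(persons, has_car):
    """Person(?p) and no known hasCar(?p, ?c) -> CarlessPerson(?p)."""
    owners = {p for p, _car in has_car}
    # Negation-as-failure: failing to find a hasCar fact counts as "no car".
    return {p for p in persons if p not in owners}


persons = {"ann", "bob"}
has_car = {("bob", "car1")}

before = carless_persons(persons, has_car)
# A new fact arrives: ann gets a car, so an earlier conclusion is withdrawn.
after = carless_persons(persons, has_car | {("ann", "car2")})

print(before)
print(after)
```

Adding a fact removed a conclusion, which is exactly the invalidation that a monotonic language such as SWRL rules out.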
  • 98. H2020 – Grant Agreement no. 691025 C1(X), equivalentClass(C1,C2) → C2(X) CONSTRUCT { ?X a ?C2 . } WHERE { ?X a ?C1. ?C1 equivalentClass ?C2. } OWL 2 RL rules in SPIN Actually, in some Semantic Web systems, the OWL 2 RL semantics are implemented via SPIN rules (TopBraid Composer) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 98
  • 99. A Semantic Entity Linking Use Case: The URank system • [Figure: URank architecture, with components Entity Extractor, Entity Linker and Entity Merger; labels: Ranking sites, Extraction Rules, Site-specific transformations, Extracted Data, Ranking datasets, Domain-specific filtering, Merged dataset, Ranking ontology]
  • 100. Motivating example • University rankings ‒ A means of advertisement ‒ There are many of them (>20 global rankings) ‒ Do we need all of them? Are they similar? Are they robust? ‒ A comparative statistical analysis is needed • Collecting the data from multiple web sites ‒ Web data extraction ‒ Each ranking site produces a ranking table ‒ A single table needs to be constructed to feed the statisticians ‒ The ranking tables need to be merged into a single all-rankings table
  • 101. The easy case… • QS table: Name = Aristotle University of Thessaloniki, Rank = 491-500 • THE table: Name = Aristotle University of Thessaloniki, Rank = 401-500 • Merged table: Name = Aristotle University of Thessaloniki, QS = 491-500, THE = 401-500, …
  • 102. The difficult case… • ARWU table: Name = The Imperial College of Science, Technology and Medicine, Rank = 22 • THE table: Name = Imperial College London, Rank = 8 • Merged table (two separate rows for the same university): Name = The Imperial College of Science, Technology and Medicine, ARWU = 22, THE = -; Name = Imperial College London, ARWU = -, THE = 8 • Levenshtein distance = 38, substring similarity* = 0.605 • *G. Stoilos et al., “A String Metric for Ontology Alignment”, 2005
  • 103. Merging tables: Solution • Use string matching and semantic search to find a unique ID for each university in each ranking table • Where to search? DBpedia ‒ DBpedia / Wikipedia have entries for almost all universities worldwide ‒ Each DBpedia entity has an ID • Ideally, different variations of a university name retrieve the same university entity in DBpedia • Merging is then straightforward
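Once every row carries a DBpedia ID, merging reduces to grouping by that ID. The sketch below assumes each ranking table has already been linked and is held as a dictionary from DBpedia URI to rank; the function name `merge_rankings` and the data layout are hypothetical, not URank's actual representation.

```python
def merge_rankings(tables):
    """Merge {ranking_name: {dbpedia_uri: rank}} into {dbpedia_uri: {ranking_name: rank}}."""
    merged = {}
    for ranking, table in tables.items():
        for uri, rank in table.items():
            merged.setdefault(uri, {})[ranking] = rank
    return merged


IMPERIAL = "http://dbpedia.org/resource/Imperial_College_London"
tables = {
    # Both name variants from the "difficult case" resolve to the same URI.
    "ARWU": {IMPERIAL: "22"},
    "THE": {IMPERIAL: "8"},
}
merged = merge_rankings(tables)
```

The two differently named rows of the difficult case collapse into a single merged row because they share one key.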
  • 104. URank[1] System • A Prolog application that: 1. Extracts data from ranking sites 2. Links universities to DBpedia entities through semantic search and string similarity 3. Generates the merged table • [Figure: URank architecture, with components Entity Extractor, Entity Linker and Entity Merger; labels: Ranking sites, Extraction Rules, Site-specific transformations, Extracted Data, Ranking datasets, Domain-specific filtering, Merged dataset, Ranking ontology] • [1] N. Bassiliades: “Collecting University Rankings for Comparison Using Web Extraction and Entity Linking Techniques”, Springer CCIS Vol. 469, pp. 23-46, 2014.
  • 105. Entity Linking • Entity linking: the task of determining the identity of entities mentioned in text • It differs from Named Entity Recognition, which identifies the mention of a named entity in text but not which specific entity it is • Entity linking requires a knowledge base containing the entities to which mentions can be linked • In URank: ‒ Named Entity Recognition is not needed: all university names are entities ‒ Entity linking uses DBpedia as the knowledge base
  • 106. Entity Linking challenges • DBpedia university entities do not always belong to the correct class ‒ dbo:University or dbo:EducationalInstitution ‒ That criterion needs to be relaxed carefully • University mergers / splits ‒ The University of Paris split in 1970 into 13 universities with very similar names: University of Paris I, II, … ‒ The University of Montpellier split into three universities (I, II, III) in 1970; I and II merged back in 2015 ‒ Need to check whether the university is currently operating • Newcastle University, UK vs University of Newcastle, Australia ‒ Need to check the country
  • 107. Entity Linking challenges • US vs USA ‒ Need a common representation for countries • Imperial College London vs Imperial College of Science, Technology and Medicine ‒ Need access to alternative university names • University of Montpellier II vs University of Montpellier 2 ‒ Need to convert between Arabic and Roman numerals • Universität vs University ‒ Need to translate between different languages
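The Roman/Arabic numeral mismatch can be handled by normalizing names before comparison. The sketch below is one possible approach, not URank's code: the helper names `roman_to_int` and `normalize_name` are hypothetical, only a trailing numeral built from I, V and X is converted, and malformed numerals are not rejected.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}


def roman_to_int(token):
    """Convert a Roman numeral made of I, V, X to an integer (no validation)."""
    total = 0
    for ch, nxt in zip(token, token[1:] + " "):
        value = ROMAN_VALUES[ch]
        # A smaller value before a larger one is subtracted (e.g. IV = 4).
        total += -value if ROMAN_VALUES.get(nxt, 0) > value else value
    return total


def normalize_name(name):
    """Rewrite a trailing Roman numeral as an Arabic one."""
    return re.sub(r"\b[IVX]+$", lambda m: str(roman_to_int(m.group(0))), name.strip())


normalized = normalize_name("University of Montpellier II")
```

After normalization, "University of Montpellier II" and "University of Montpellier 2" compare equal as plain strings.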
  • 108. General-purpose Entity Linking Tools • DBpedia Spotlight: annotates mentions of DBpedia resources in natural-language text ‒ http://demo.dbpedia-spotlight.org/ ‒ Tried on the university ranking use case, it achieved ~86% F-measure • Silk: integrates heterogeneous data sources ‒ http://silkframework.org/ ‒ Generates links between related data items in different Linked Data sources ‒ Linked Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web ‒ Experiments on the use case gave an even lower F-measure • Domain-specific knowledge must be used to address all the challenges
  • 109. Entity Linking in URank • Each university entity at each ranking site is matched against a DBpedia entry using 3 alternative methods, tried in order ‒ DBpedia lookup service ‒ DBpedia SPARQL endpoint, using approximate string matching filter functions ‒ Keyword search engine of Wikipedia • At each step, if a satisfactory match is found (by substring matching), the algorithm terminates • Otherwise, all matching entries are collected and scored ‒ The top-scored candidate is returned as the match
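The cascade described on this slide can be sketched as a loop over search methods with an early exit. The Python below is a hedged illustration, not the Prolog implementation: `link_entity` and `string_similarity` are hypothetical names, each search method is assumed to be a callable returning `(uri, label)` pairs, `difflib.SequenceMatcher` stands in for the substring metric, and the fallback scoring simply reuses the similarity score with a single 0.97 threshold.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Stand-in similarity in [0, 1]; the deck uses a substring-based metric instead."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_entity(name, methods, similarity=string_similarity, threshold=0.97):
    """Try each search method in order; stop at the first satisfactory match.

    methods: callables mapping a name to a list of (uri, label) candidates.
    Returns a URI, or None when no method yields any candidate.
    """
    candidates = []
    for search in methods:
        for uri, label in search(name):
            score = similarity(name, label)
            if score >= threshold:
                return uri  # satisfactory match: later methods are never called
            candidates.append((score, uri))
    if not candidates:
        return None
    # No satisfactory match anywhere: fall back to the best-scoring candidate.
    return max(candidates)[1]
```

Passing the three methods as plain callables keeps the control flow separate from the network access, so the cascade can be exercised with stubs.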
  • 110. String matching • String similarity is measured using a metric for ontology alignment [1] ‒ It is based on substring matching ‒ It is more appropriate for matching university names than, e.g., Levenshtein distance ‒ E.g. “Imperial College” vs. “Imperial College of Science, Technology and Medicine” ‒ Available as a built-in predicate in SWI-Prolog (isub/4) • A satisfactory match is one above a high similarity threshold ‒ ≥ 0.97, depending on the algorithm step • [1] Stoilos, G., Stamou, G., Kollias, S.: A String Metric for Ontology Alignment. ISWC 2005, LNCS, vol. 3729, pp. 624-637. Springer (2005)
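To see why a substring-based measure suits university names better than edit distance, the two can be compared on the example above. The sketch below is a simplification made for illustration: `commonality` implements only the shared-substring component of the metric in [1] (the full metric also subtracts a difference term and adds a prefix bonus), it does not reproduce isub/4, and all function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b."""
    best = ""
    prev = [0] * (len(b) + 1)
    for i, ca in enumerate(a, 1):
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                if cur[j] > len(best):
                    best = a[i - cur[j]:i]
        prev = cur
    return best


def commonality(a, b, min_len=2):
    """Share of characters covered by repeatedly removed common substrings."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    shared = 0
    while True:
        sub = longest_common_substring(a, b)
        if len(sub) < min_len:
            break
        shared += len(sub)
        a = a.replace(sub, "", 1)
        b = b.replace(sub, "", 1)
    return 2 * shared / total


short = "Imperial College"
full = "Imperial College of Science, Technology and Medicine"
distance = levenshtein(short, full)
overlap = commonality(short, full)
```

Here the edit distance equals the number of extra characters in the longer name, while the substring measure still credits the entire shorter name as shared.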
  • 111. DBpedia lookup service • Example request: http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryClass=University&QueryString=Imperial%20College%20London • RESTful API parameters: ‒ QueryString: the string for which a DBpedia URI should be found (the university name) ‒ QueryClass: the DBpedia ontology class that the results should belong to (University) ‒ MaxHits: the maximum number of returned results (default: 5) • Results are returned in XML and must be parsed ‒ using the native SWI-Prolog XPath built-ins
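Building the lookup request is a matter of URL-encoding the three parameters. The sketch below uses the endpoint exactly as quoted on the slide, which may have changed since 2017; the function name `lookup_url` is hypothetical, and no request is actually sent.

```python
from urllib.parse import quote, urlencode

# Endpoint as quoted on the slide; its current availability is not assumed.
LOOKUP_ENDPOINT = "http://lookup.dbpedia.org/api/search.asmx/KeywordSearch"


def lookup_url(name, query_class="University", max_hits=5):
    """Return the KeywordSearch URL for a university name."""
    params = {"QueryClass": query_class, "QueryString": name, "MaxHits": max_hits}
    # quote_via=quote encodes spaces as %20, matching the slide's example.
    return LOOKUP_ENDPOINT + "?" + urlencode(params, quote_via=quote)


url = lookup_url("Imperial College London")
```

The XML response would then be parsed for candidate URIs and labels, a step left out here.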
  • 112. DBpedia SPARQL endpoint (http://dbpedia.org/sparql) • If the DBpedia lookup does not return a satisfactory match, the DBpedia SPARQL endpoint is used (provided by the OpenLink Virtuoso RDF DB engine) • Query template: SELECT ?univ, ?name WHERE { ?univ rdf:type Class . ?univ ?property ?val . ?val bif:contains University.Name.Words option (score ?score) . ?univ rdfs:label ?name . FILTER (lang(?name) = "en") } ORDER BY DESC (?score*0.3+sql:rnk_scale(<LONG::IRI_RANK> (?univ))) LIMIT Top-N2 • Class, University.Name.Words and Top-N2 are placeholders filled in for each search
  • 113. SPARQL query example @ DBpedia • Results are actually returned in RDF, not HTML
  • 114. Keyword search engine of Wikipedia • If neither of the native “DBpedia methods” returns a satisfactory match, the Wikipedia search engine is used ‒ http://en.wikipedia.org/w/index.php?search=Univ.name&limit=Top-N3&go=Go • Results are verified using the equivalent DBpedia entity ‒ Wikipedia URLs map uniquely to DBpedia URIs ‒ https://en.wikipedia.org/wiki/Imperial_College_London → http://dbpedia.org/resource/Imperial_College_London
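The Wikipedia-to-DBpedia mapping is a prefix substitution on the page title. A minimal sketch, assuming English Wikipedia article URLs of the form shown on the slide; the function name `wikipedia_to_dbpedia` is hypothetical.

```python
from urllib.parse import urlparse

WIKI_PATH_PREFIX = "/wiki/"
DBPEDIA_RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url):
    """Map an English Wikipedia article URL to the corresponding DBpedia URI."""
    parsed = urlparse(url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith(WIKI_PATH_PREFIX):
        raise ValueError(f"not an English Wikipedia article URL: {url}")
    # The page title (with underscores) is reused unchanged as the resource name.
    return DBPEDIA_RESOURCE_PREFIX + parsed.path[len(WIKI_PATH_PREFIX):]


uri = wikipedia_to_dbpedia("https://en.wikipedia.org/wiki/Imperial_College_London")
```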
  • 115. Spatiotemporal constraints • Spatiotemporal constraints are used for filtering ‒ Not easy, due to incomplete and inaccurate information in DBpedia/Wikipedia • The retrieved university must be located in the same country as the extracted university ‒ Using the dbo:country property (not present in all entities) ‒ Using the alternative properties dbo:state, dbo:city, dbo:location to locate the country (geographical area containment) ‒ Using redirection to another entity (owl:sameAs, dbo:wikiPageRedirects) • Countries are not always represented using the same name ‒ E.g. US, U.S., USA, U.S.A., United States of America ‒ E.g. UK, United Kingdom, United Kingdom of Great Britain and Northern Ireland ‒ A synonym matrix covers the most problematic cases
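A synonym matrix for country names can be represented as a lookup from each variant to one canonical form. The sketch below contains only the variants listed on the slide; the choice of canonical names, the table layout and the function names are assumptions made for illustration.

```python
# Canonical names are an arbitrary choice; the variants are those on the slide.
COUNTRY_SYNONYMS = {
    "United States": ["US", "U.S.", "USA", "U.S.A.", "United States of America"],
    "United Kingdom": ["UK", "United Kingdom of Great Britain and Northern Ireland"],
}

_CANONICAL = {}
for canonical, variants in COUNTRY_SYNONYMS.items():
    for variant in [canonical, *variants]:
        _CANONICAL[variant.casefold()] = canonical


def canonical_country(name):
    """Return the canonical form of a country name, or the cleaned name if unknown."""
    cleaned = " ".join(name.split())
    return _CANONICAL.get(cleaned.casefold(), cleaned)


def same_country(a, b):
    """Compare two country names after synonym normalization."""
    return canonical_country(a).casefold() == canonical_country(b).casefold()
```

For example, `same_country("USA", "U.S.")` holds, while `same_country("US", "UK")` does not.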
  • 116. Geographical area containment • select ?x where { <DBPediaURL> dbo:country ?x . } • select ?x where { <DBPediaURL> dbo:state ?x . } • select ?x where { <DBPediaURL> dbo:city ?x . } • select ?x where { <DBPediaURL> dbo:location ?x . } • …
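The fallback queries above differ only in the property, so they can be generated from an ordered list. The sketch below covers only the four properties shown on the slide (the trailing "…" suggests there are more, which are not guessed here); the function name `location_queries` is hypothetical, and sending the queries is left out.

```python
# Properties in the order they appear on the slide; the list is not complete.
LOCATION_PROPERTIES = ["dbo:country", "dbo:state", "dbo:city", "dbo:location"]


def location_queries(entity_uri):
    """Return one SPARQL query per location property, in fallback order."""
    return [
        f"select ?x where {{ <{entity_uri}> {prop} ?x . }}"
        for prop in LOCATION_PROPERTIES
    ]


queries = location_queries("http://dbpedia.org/resource/Imperial_College_London")
```

A caller would run the queries in order and stop at the first one that returns a binding, then resolve the returned area up to its country.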
  • 117. Spatiotemporal constraints • The retrieved university must still be operating ‒ E.g. University of Paris split, University of Montpellier split and merger, etc. • Check whether the property dbp:closed exists ‒ Not all “closed” universities have this property ‒ In that case, heuristic scraping of Wikipedia entries is performed
  • 118. Conclusions
  • 119. The Future of Semantic Technologies • Semantic technologies and techniques, not necessarily using Semantic Web standards, have been adopted for building and using knowledge graphs ‒ Google, Facebook, Yandex, Baidu and Bing have embedded semantic technologies into their core businesses ‒ The percentage of Web content that uses microdata/RDFa/schema.org information to enhance search results has reached double digits ‒ IBM’s Watson (AlchemyLanguage) returns concept information as DBpedia/yago/freemix URIs, including a Linked Data API • Semantic technologies are rapidly finding their way into consumer products ‒ The average developer barely realizes it, but semantics are now just about everywhere
  • 120. Semantic Web and Artificial Intelligence • The Semantic Web is a branch of AI • Market interest and justification lie in solving problems ‒ AI has problem solving at its core • Semantic technologies will become an essential enabler for the development of true knowledge-based AI ‒ What things mean and how people understand what they mean • AI applications currently focus on Machine Learning (ML) and NLP ‒ Knowledge graphs can help enhance the accuracy of ML/NLP ‒ NLP: top-down processing (use ontology terms to disambiguate text) ‒ ML: bottom-up processing (learn from data and match to generalized concepts)
  • 121. Enhancing seCurity and privAcy in the Social wEb: a user-centered approach for the protection of minors