SlideShare uma empresa Scribd logo
1 de 37
Baixar para ler offline
Query modeling and information retrieval within
the Web of Data
Cristian LAI
clai@crs4.it

CRS4

september 6, 2012

1 / 37
Outline

G

Motivation

G

UnStructured Data

G

Structured Data

G

Query building

G

Applications

G

Conclusion

september 6, 2012

2 / 37
Context
Semantic Web

http://www.w3.org/2006/Talks/1023-sb-W3CTechSemWeb/
september 6, 2012

3 / 37
Motivation
Search on the Web

http://www.slideshare.net/novaspivack/web-evolution-nova-spivack-twine
september 6, 2012

4 / 37
Outline

G

Motivation

G

UnStructured Data

G

Structured Data

G

Query building

G

Applications

G

Conclusion

september 6, 2012

5 / 37
Wikipedia

G
G

G

G
G

Started in 2001.
Is a multilingual, web-based, free-content encyclopedia project based on
an openly editable model.
Is the 5th site on the web and serves 454 million unique visitors monthly as
of March 2011.
Has fewer than 100 employees.
Wikipedia holds an annual fundraiser instead of accepting advertising. You
may have seen "A personal appeal from Wikipedia founder Jimmy Wales" if
you’ve used the online encyclopedia during the last weeks of 2011. Google
co-founder Sergey Brin and his wife, Anne Wojcicki, has given a 500,000
dollars grant to help Wikipedia fund its 28.3 million dollars annual budget.

september 6, 2012

6 / 37
Wikipedia

G

Pros:
H
H

H

G

Is a highly-efficient not-for-profit organization.
Is the finest example of truly collaborative created content: >19M articles;
>270 languages, >82k active contributors.
Covers many topics and domains, articles are a result of a community
consensus.

Cons:
H

Contains many inconsistencies.
G

H
H

Disclaimer: Wikipedia cannot guarantee the validity of the information found here.

Is not very well integrated with other data sources.
Queries and search are not facilitated due to the lacks of structured
representation.

september 6, 2012

7 / 37
Issues

G
G

UnStructured data, keywords based search.
Simple questions are hard to answer.
H
H
H

G
G

People who were born in Rome before 1900.
Italian musicians with English and French descriptions.
The official websites of companies with more than 500 employees.

The information required to answer these is contained in Wikipedia.
Transforming Wikipedia into a knowledge base.
H
H

To reveal the structure and semantics of Wikipedia content
The DBpedia project.

september 6, 2012

8 / 37
Structure in Wikipedia
G

Wikipedia articles consist mostly of free text, but also contain different
types of structured information, such as infobox templates,categorisation
information, images, geo-coordinates, and links to external Web pages.

G

Title

G

Abstract

G

Infobox Template

G

Geo-coordinates

G

Caegories

G
G

Images
Links
H
H
H
H

other language version
other Wikipedia pages
redirects
disambiguation
september 6, 2012

9 / 37
Structured Information in Wikipedia

september 6, 2012

10 / 37
Structured Information in Wikipedia

september 6, 2012

11 / 37
Structured Information in Wikipedia

september 6, 2012

12 / 37
Outline

G

Motivation

G

UnStructured Data

G

Structured Data

G

Query building

G

Applications

G

Conclusion

september 6, 2012

13 / 37
RDF representation
Knowledge Base

dbp:Cagliari rdf:type dbp:City
dbp:Cagliari dbp:Title "Cagliari"
dbp:Cagliari dbp:Country dbp:Italy
dbp:Cagliari dbp:postalCode 09100
dbp:Cagliari geo:lat "39.246387"xsd:float
dbp:Cagliari geo:long "9.057500"xsd:float
dbp:Cagliari rdf:type yago:MediterraneanPortCitiesAndTownsInItaly
...
G

An environment for collecting and structuring data.

G

Well defined structure of classification.

september 6, 2012

14 / 37
RDF

G
G

Triples: (subject, predicate, object)
Subject and object
H

are both URIs that each identify a resource, or a URI and a string literal
respectively.

H
G

Predicate
H

G

specifies how the subject and object are related, and is also represented by a
URI.

For example:
H
H
H

A knows B
C isAuthorOf D
Two resources linked in this fashion can be drawn from different data sets on
the Web, allowing data in one data source to be linked to that in another,
thereby creating a Web of Data.

september 6, 2012

15 / 37
DBpedia

G
G

G
G

Started in 2007.
Is the result of a community effort to extract structured information from
Wikipedia.
Makes Wikipedia data available as RDF.
Results: The DBpedia Data Set
H

H
H
G

G

describes 3.64 million "things" with over half a billion "facts" (July 2011), 364k
persons, 462k places, 99k music albums, 54k films, 148k organisations;
extraction in 97 different languages;
672M RDF triples

It is maintained by: Universität Leipzig, Freie Universität Berlin, OpenLink
Software, Inc.
See http://wiki.dbpedia.org/Team

september 6, 2012

16 / 37
Nucleus of the Web of Data

G
G

Within the W3C Linking Open Data (LOD) community effort.
Tim Berners-Lee’s Linked Data principles.
H
H
H
H

G

G

URI
HTTP
RDF, SPARQL
Interlinking among data providers

An increasing number of data providers have started to publish and
interlink data on the Web.
Several billion RDF triples and covers domains such as geographic
information, people, companies, online communities, films, music, books
and scientific publications.

september 6, 2012

17 / 37
LOD Datasets

september 6, 2012

18 / 37
LOD Datasets

september 6, 2012

19 / 37
Outline

G

Motivation

G

UnStructured Data

G

Structured Data

G

Query building

G

Applications

G

Conclusion

september 6, 2012

20 / 37
SPARQL Query Language

G

G

G

G

RDF is a directed, labeled graph data format for representing information
(also in the Web).
SPARQL is a language for querying RDF graphs by specifying templates
against which to compare graph components. Data which matches or
satisfies a template is returned from the query.
A triple template contains variables that represent triplet components (e.g.,
?s, ?p, or ?o within a triplet).
Example:
H
H

H

?person ex:age "20"xsd:integer .
Identifies a list of triplet subjects that have an ex:age property of "20".
Analogous to asking "Who has age 20?".
The SPARQL query engine will return a list of the subject component of triples
that satisfy each query through value substitution.

september 6, 2012

21 / 37
SPARQL Queries
SELECT variables_list
FROM < RDF_source_URL >
WHERE {
{ triple_pattern_1 .
. . .
triple_pattern_n . }.
}
SELECT ?person

?person

FROM < http://ex.com >

------------------

WHERE {
?person ex:age "20"xsd:integer .

_p1
_p2
. . .

}

september 6, 2012

22 / 37
The DBpedia SPARQL endpoint

G

G

All data sets are available for queries via the DBpedia SPARQL endpoint
(http://dbpedia.org/sparql).
Querying the data set:
H
H
H
H
H

...
Abstracts of movies starring Tom Cruise, released before 1999.
The official websites of companies with more than 50000 employees.
Cities with more than 2 million habitants.
...

september 6, 2012

23 / 37
Abstracts of movies starring Tom Cruise, released before
1999
SPARQL

SELECT ?subject ?label ?released ?abstract WHERE {
?subject rdf:type <http://dbpedia.org/ontology/Film>.
?subject dbpedia2:starring <http://dbpedia.org/resource/Tom_Cruise>.
?subject rdfs:comment ?abstract.
?subject rdfs:label ?label.
FILTER(lang(?abstract) = "en" && lang(?label) = "en").
?subject <http://dbpedia.org/ontology/releaseDate> ?released.
FILTER(xsd:date(?released) < "2000-01-01"^^xsd:date).
} ORDER BY ?released

september 6, 2012

24 / 37
Outline

G

Motivation

G

UnStructured Data

G

Structured Data

G

Query building

G

Applications

G

Conclusion

september 6, 2012

25 / 37
Linked Data Search Engines and Indexes

G

A number of search engines have been developed that crawl Linked Data
from the Web by following RDF links, and provide query capabilities over
aggregated data.
Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web:
Theory and Technology, 1:1, 1-136. Morgan & Claypool.

G

G

Google, Bing and Yahoo! agree to create and support a common
vocabulary for structured data markup on web pages.
Facebook has started to support RDF and Linked Data URIs and now
provides access to parts of its user data via a Linked Data API.

september 6, 2012

26 / 37
Google rich snippets

september 6, 2012

27 / 37
Twitter, #annotations
Twitter API based client

september 6, 2012

28 / 37
Twitter, #annotations
Lookup annotations

september 6, 2012

29 / 37
Twitter, #annotations
Resource #dbpedia:Cagliari

september 6, 2012

30 / 37
Twitter, #annotations
Resource #dbpedia:Cagliari

september 6, 2012

31 / 37
Question answering
Risorsa Cagliari

september 6, 2012

32 / 37
Question answering
Template

september 6, 2012

33 / 37
Question answering
RDF/XML

september 6, 2012

34 / 37
Outline

G

Motivation

G

UnStructured Data

G

Structured Data

G

Query building

G

Applications

G

Conclusion

september 6, 2012

35 / 37
Conclusion

G

G

G
G

Data on the Web is a major challenge; technologies are needed to use
them, to interact with them, to integrate them.
Semantic Web technologies (RDF, SPARQL, etc.) can play a major role in
publishing and using Data on the Web.
Users can largely benefit from the wide world of structured content.
Content providers joining the Linking Open Data project are contributing
to create more meaningful navigation paths not only within websites but
across the whole web.

september 6, 2012

36 / 37
Q&A

september 6, 2012

37 / 37

Mais conteúdo relacionado

Mais procurados

Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org sopekmir
 
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...Semantic Web Company
 
Structured Data for the Financial Industry
Structured Data for the Financial Industry Structured Data for the Financial Industry
Structured Data for the Financial Industry sopekmir
 
One Ontology, One Data Set, Multiple Shapes with SHACL
One Ontology, One Data Set, Multiple Shapes with SHACLOne Ontology, One Data Set, Multiple Shapes with SHACL
One Ontology, One Data Set, Multiple Shapes with SHACLConnected Data World
 
ROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data StackROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data StackMartin Voigt
 
Epiphany: Adaptable RDFa Generation Linking the Web of Documents to the Web o...
Epiphany: Adaptable RDFa Generation Linking the Web of Documents to the Web o...Epiphany: Adaptable RDFa Generation Linking the Web of Documents to the Web o...
Epiphany: Adaptable RDFa Generation Linking the Web of Documents to the Web o...Benjamin Adrian
 
Notes for talk on 12th June 2013 to Open Innovation meeting, Glasgow
Notes for talk on 12th June 2013 to Open Innovation meeting, GlasgowNotes for talk on 12th June 2013 to Open Innovation meeting, Glasgow
Notes for talk on 12th June 2013 to Open Innovation meeting, GlasgowPeterWinstanley1
 
How Semantics Solves Big Data Challenges
How Semantics Solves Big Data ChallengesHow Semantics Solves Big Data Challenges
How Semantics Solves Big Data ChallengesDATAVERSITY
 
Self-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksSelf-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksRichard Cyganiak
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RGraphRM
 
How to Reveal Hidden Relationships in Data and Risk Analytics
How to Reveal Hidden Relationships in Data and Risk AnalyticsHow to Reveal Hidden Relationships in Data and Risk Analytics
How to Reveal Hidden Relationships in Data and Risk AnalyticsOntotext
 
A possible future role of schema.org for business reporting
A possible future role of schema.org for business reportingA possible future role of schema.org for business reporting
A possible future role of schema.org for business reportingsopekmir
 
RDAP 16 Poster: Hacking the figshare API to Create Enhanced Metadata Records
RDAP 16 Poster: Hacking the figshare API to Create Enhanced Metadata RecordsRDAP 16 Poster: Hacking the figshare API to Create Enhanced Metadata Records
RDAP 16 Poster: Hacking the figshare API to Create Enhanced Metadata RecordsASIS&T
 
Explicit Semantics in Graph DBs Driving Digital Transformation With Neo4j
Explicit Semantics in Graph DBs Driving Digital Transformation With Neo4jExplicit Semantics in Graph DBs Driving Digital Transformation With Neo4j
Explicit Semantics in Graph DBs Driving Digital Transformation With Neo4jConnected Data World
 

Mais procurados (20)

Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org
 
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...
 
Structured Data for the Financial Industry
Structured Data for the Financial Industry Structured Data for the Financial Industry
Structured Data for the Financial Industry
 
One Ontology, One Data Set, Multiple Shapes with SHACL
One Ontology, One Data Set, Multiple Shapes with SHACLOne Ontology, One Data Set, Multiple Shapes with SHACL
One Ontology, One Data Set, Multiple Shapes with SHACL
 
ROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data StackROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data Stack
 
FIBO & Schema.org
FIBO & Schema.orgFIBO & Schema.org
FIBO & Schema.org
 
Epiphany: Adaptable RDFa Generation Linking the Web of Documents to the Web o...
Epiphany: Adaptable RDFa Generation Linking the Web of Documents to the Web o...Epiphany: Adaptable RDFa Generation Linking the Web of Documents to the Web o...
Epiphany: Adaptable RDFa Generation Linking the Web of Documents to the Web o...
 
Semantic Web vision and its relevance to Open Digital Data for MGI
Semantic Web vision and its relevance to Open Digital Data for MGISemantic Web vision and its relevance to Open Digital Data for MGI
Semantic Web vision and its relevance to Open Digital Data for MGI
 
Notes for talk on 12th June 2013 to Open Innovation meeting, Glasgow
Notes for talk on 12th June 2013 to Open Innovation meeting, GlasgowNotes for talk on 12th June 2013 to Open Innovation meeting, Glasgow
Notes for talk on 12th June 2013 to Open Innovation meeting, Glasgow
 
Linking Open Data
Linking Open DataLinking Open Data
Linking Open Data
 
How Semantics Solves Big Data Challenges
How Semantics Solves Big Data ChallengesHow Semantics Solves Big Data Challenges
How Semantics Solves Big Data Challenges
 
euBusinessGraph Company and Economic Data
euBusinessGraph Company and Economic DataeuBusinessGraph Company and Economic Data
euBusinessGraph Company and Economic Data
 
Sebastian Hellmann
Sebastian HellmannSebastian Hellmann
Sebastian Hellmann
 
Self-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksSelf-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and Gridworks
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con R
 
How to Reveal Hidden Relationships in Data and Risk Analytics
How to Reveal Hidden Relationships in Data and Risk AnalyticsHow to Reveal Hidden Relationships in Data and Risk Analytics
How to Reveal Hidden Relationships in Data and Risk Analytics
 
A possible future role of schema.org for business reporting
A possible future role of schema.org for business reportingA possible future role of schema.org for business reporting
A possible future role of schema.org for business reporting
 
Semantics and Machine Learning
Semantics and Machine LearningSemantics and Machine Learning
Semantics and Machine Learning
 
RDAP 16 Poster: Hacking the figshare API to Create Enhanced Metadata Records
RDAP 16 Poster: Hacking the figshare API to Create Enhanced Metadata RecordsRDAP 16 Poster: Hacking the figshare API to Create Enhanced Metadata Records
RDAP 16 Poster: Hacking the figshare API to Create Enhanced Metadata Records
 
Explicit Semantics in Graph DBs Driving Digital Transformation With Neo4j
Explicit Semantics in Graph DBs Driving Digital Transformation With Neo4jExplicit Semantics in Graph DBs Driving Digital Transformation With Neo4j
Explicit Semantics in Graph DBs Driving Digital Transformation With Neo4j
 

Destaque

Phd_cristian_lai presentation
Phd_cristian_lai presentationPhd_cristian_lai presentation
Phd_cristian_lai presentationCristian Lai
 
Dart2013_presentation_cristian_lai
Dart2013_presentation_cristian_laiDart2013_presentation_cristian_lai
Dart2013_presentation_cristian_laiCristian Lai
 
Icwe2016 CRS4 Lugano
Icwe2016 CRS4 LuganoIcwe2016 CRS4 Lugano
Icwe2016 CRS4 LuganoCristian Lai
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsLinkedIn
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerLuminary Labs
 

Destaque (6)

Phd_cristian_lai presentation
Phd_cristian_lai presentationPhd_cristian_lai presentation
Phd_cristian_lai presentation
 
Dart2013_presentation_cristian_lai
Dart2013_presentation_cristian_laiDart2013_presentation_cristian_lai
Dart2013_presentation_cristian_lai
 
Icwe2016 CRS4 Lugano
Icwe2016 CRS4 LuganoIcwe2016 CRS4 Lugano
Icwe2016 CRS4 Lugano
 
La tortuga
La tortugaLa tortuga
La tortuga
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving Cars
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI Explainer
 

Semelhante a Query Modeling and Information Retrieval on the Semantic Web

How google is using linked data today and vision for tomorrow
How google is using linked data today and vision for tomorrowHow google is using linked data today and vision for tomorrow
How google is using linked data today and vision for tomorrowVasu Jain
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked datavafopoulos
 
Open Data and News Analytics Demo
Open Data and News Analytics DemoOpen Data and News Analytics Demo
Open Data and News Analytics DemoOntotext
 
Boost your data analytics with open data and public news content
Boost your data analytics with open data and public news contentBoost your data analytics with open data and public news content
Boost your data analytics with open data and public news contentOntotext
 
schema.org, Linked Data's Gateway Drug
schema.org, Linked Data's Gateway Drugschema.org, Linked Data's Gateway Drug
schema.org, Linked Data's Gateway DrugConnected Data World
 
schema.org: Linked Data's Gateway Drug
schema.org: Linked Data's Gateway Drugschema.org: Linked Data's Gateway Drug
schema.org: Linked Data's Gateway DrugAaron Bradley
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commonsJesse Wang
 
Schema.org Update at ISWC2012
Schema.org Update at ISWC2012Schema.org Update at ISWC2012
Schema.org Update at ISWC2012Alex Shubin
 
Semantische Technologien (nicht nur) für die verbesserte Suche in SharePoint
Semantische Technologien (nicht nur) für die verbesserte Suche in SharePointSemantische Technologien (nicht nur) für die verbesserte Suche in SharePoint
Semantische Technologien (nicht nur) für die verbesserte Suche in SharePointDIQA Projektmanagement GmbH
 
The technical case for a semantic web
The technical case for a semantic webThe technical case for a semantic web
The technical case for a semantic webTony Dobaj
 
First they have to find it: Getting Open Government Data Discovered and Used
First they have to find it: Getting Open Government Data Discovered and UsedFirst they have to find it: Getting Open Government Data Discovered and Used
First they have to find it: Getting Open Government Data Discovered and UsedRensselaer Polytechnic Institute
 
Semantic Markup
Semantic Markup Semantic Markup
Semantic Markup R A Akerkar
 
2011 05-02 linked data intro
2011 05-02 linked data intro2011 05-02 linked data intro
2011 05-02 linked data introvafopoulos
 
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)Ig Bittencourt
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data TutorialSören Auer
 

Semelhante a Query Modeling and Information Retrieval on the Semantic Web (20)

How google is using linked data today and vision for tomorrow
How google is using linked data today and vision for tomorrowHow google is using linked data today and vision for tomorrow
How google is using linked data today and vision for tomorrow
 
KEDL DBpedia 2019
KEDL DBpedia  2019KEDL DBpedia  2019
KEDL DBpedia 2019
 
Semantic Web talk TEMPLATE
Semantic Web talk TEMPLATESemantic Web talk TEMPLATE
Semantic Web talk TEMPLATE
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked data
 
Open Data and News Analytics Demo
Open Data and News Analytics DemoOpen Data and News Analytics Demo
Open Data and News Analytics Demo
 
Boost your data analytics with open data and public news content
Boost your data analytics with open data and public news contentBoost your data analytics with open data and public news content
Boost your data analytics with open data and public news content
 
schema.org, Linked Data's Gateway Drug
schema.org, Linked Data's Gateway Drugschema.org, Linked Data's Gateway Drug
schema.org, Linked Data's Gateway Drug
 
schema.org: Linked Data's Gateway Drug
schema.org: Linked Data's Gateway Drugschema.org: Linked Data's Gateway Drug
schema.org: Linked Data's Gateway Drug
 
Linked Data
Linked DataLinked Data
Linked Data
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commons
 
Schema.org Update at ISWC2012
Schema.org Update at ISWC2012Schema.org Update at ISWC2012
Schema.org Update at ISWC2012
 
Semantische Technologien (nicht nur) für die verbesserte Suche in SharePoint
Semantische Technologien (nicht nur) für die verbesserte Suche in SharePointSemantische Technologien (nicht nur) für die verbesserte Suche in SharePoint
Semantische Technologien (nicht nur) für die verbesserte Suche in SharePoint
 
The technical case for a semantic web
The technical case for a semantic webThe technical case for a semantic web
The technical case for a semantic web
 
First they have to find it: Getting Open Government Data Discovered and Used
First they have to find it: Getting Open Government Data Discovered and UsedFirst they have to find it: Getting Open Government Data Discovered and Used
First they have to find it: Getting Open Government Data Discovered and Used
 
Semantic Markup
Semantic Markup Semantic Markup
Semantic Markup
 
2011 05-02 linked data intro
2011 05-02 linked data intro2011 05-02 linked data intro
2011 05-02 linked data intro
 
2013.05 - LDOW 2013 @ WWW 2013
2013.05 - LDOW 2013 @ WWW 20132013.05 - LDOW 2013 @ WWW 2013
2013.05 - LDOW 2013 @ WWW 2013
 
Bosch, Wackerow: Linked data on the web
Bosch, Wackerow: Linked data on the web Bosch, Wackerow: Linked data on the web
Bosch, Wackerow: Linked data on the web
 
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data Tutorial
 

Último

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Último (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Query Modeling and Information Retrieval on the Semantic Web

  • 1. Query modeling and information retrieval within the Web of Data Cristian LAI clai@crs4.it CRS4 september 6, 2012 1 / 37
  • 2. Outline G Motivation G UnStructured Data G Structured Data G Query building G Applications G Conclusion september 6, 2012 2 / 37
  • 4. Motivation Search on the Web http://www.slideshare.net/novaspivack/web-evolution-nova-spivack-twine september 6, 2012 4 / 37
  • 5. Outline G Motivation G UnStructured Data G Structured Data G Query building G Applications G Conclusion september 6, 2012 5 / 37
  • 6. Wikipedia G G G G G Started in 2001. Is a multilingual, web-based, free-content encyclopedia project based on an openly editable model. Is the 5th site on the web and serves 454 million unique visitors monthly as of March 2011. Has fewer than 100 employees. Wikipedia holds an annual fundraiser instead of accepting advertising. You may have seen "A personal appeal from Wikipedia founder Jimmy Wales" if you’ve used the online encyclopedia during the last weeks of 2011. Google co-founder Sergey Brin and his wife, Anne Wojcicki, has given a 500,000 dollars grant to help Wikipedia fund its 28.3 million dollars annual budget. september 6, 2012 6 / 37
  • 7. Wikipedia G Pros: H H H G Is a highly-efficient not-for-profit organization. Is the finest example of truly collaborative created content: >19M articles; >270 languages, >82k active contributors. Covers many topics and domains, articles are a result of a community consensus. Cons: H Contains many inconsistencies. G H H Disclaimer: Wikipedia cannot guarantee the validity of the information found here. Is not very well integrated with other data sources. Queries and search are not facilitated due to the lacks of structured representation. september 6, 2012 7 / 37
  • 8. Issues G G UnStructured data, keywords based search. Simple questions are hard to answer. H H H G G People who were born in Rome before 1900. Italian musicians with English and French descriptions. The official websites of companies with more than 500 employees. The information required to answer these is contained in Wikipedia. Transforming Wikipedia into a knowledge base. H H To reveal the structure and semantics of Wikipedia content The DBpedia project. september 6, 2012 8 / 37
  • 9. Structure in Wikipedia G Wikipedia articles consist mostly of free text, but also contain different types of structured information, such as infobox templates,categorisation information, images, geo-coordinates, and links to external Web pages. G Title G Abstract G Infobox Template G Geo-coordinates G Caegories G G Images Links H H H H other language version other Wikipedia pages redirects disambiguation september 6, 2012 9 / 37
  • 10. Structured Information in Wikipedia september 6, 2012 10 / 37
  • 11. Structured Information in Wikipedia september 6, 2012 11 / 37
  • 12. Structured Information in Wikipedia september 6, 2012 12 / 37
  • 13. Outline G Motivation G UnStructured Data G Structured Data G Query building G Applications G Conclusion september 6, 2012 13 / 37
  • 14. RDF representation Knowledge Base dbp:Cagliari rdf:type dbp:City dbp:Cagliari dbp:Title "Cagliari" dbp:Cagliari dbp:Country dbp:Italy dbp:Cagliari dbp:postalCode 09100 dbp:Cagliari geo:lat "39.246387"xsd:float dbp:Cagliari geo:long "9.057500"xsd:float dbp:Cagliari rdf:type yago:MediterraneanPortCitiesAndTownsInItaly ... G An environment for collecting and structuring data. G Well defined structure of classification. september 6, 2012 14 / 37
  • 15. RDF G G Triples: (subject, predicate, object) Subject and object H are both URIs that each identify a resource, or a URI and a string literal respectively. H G Predicate H G specifies how the subject and object are related, and is also represented by a URI. For example: H H H A knows B C isAuthorOf D Two resources linked in this fashion can be drawn from different data sets on the Web, allowing data in one data source to be linked to that in another, thereby creating a Web of Data. september 6, 2012 15 / 37
  • 16. DBpedia G G G G Started in 2007. Is the result of a community effort to extract structured information from Wikipedia. Makes Wikipedia data available as RDF. Results: The DBpedia Data Set H H H G G describes 3.64 million "things" with over half a billion "facts" (July 2011), 364k persons, 462k places, 99k music albums, 54k films, 148k organisations; extraction in 97 different languages; 672M RDF triples It is maintained by: Universität Leipzig, Freie Universität Berlin, OpenLink Software, Inc. See http://wiki.dbpedia.org/Team september 6, 2012 16 / 37
  • 17. Nucleus of the Web of Data G G Within the W3C Linking Open Data (LOD) community effort. Tim Berners-Lee’s Linked Data principles. H H H H G G URI HTTP RDF, SPARQL Interlinking among data providers An increasing number of data providers have started to publish and interlink data on the Web. Several billion RDF triples and covers domains such as geographic information, people, companies, online communities, films, music, books and scientific publications. september 6, 2012 17 / 37
  • 20. Outline G Motivation G UnStructured Data G Structured Data G Query building G Applications G Conclusion september 6, 2012 20 / 37
  • 21. SPARQL Query Language G G G G RDF is a directed, labeled graph data format for representing information (also in the Web). SPARQL is a language for querying RDF graphs by specifying templates against which to compare graph components. Data which matches or satisfies a template is returned from the query. A triple template contains variables that represent triplet components (e.g., ?s, ?p, or ?o within a triplet). Example: H H H ?person ex:age "20"xsd:integer . Identifies a list of triplet subjects that have an ex:age property of "20". Analogous to asking "Who has age 20?". The SPARQL query engine will return a list of the subject component of triples that satisfy each query through value substitution. september 6, 2012 21 / 37
  • 22. SPARQL Queries SELECT variables_list FROM < RDF_source_URL > WHERE { { triple_pattern_1 . . . . triple_pattern_n . }. } SELECT ?person ?person FROM < http://ex.com > ------------------ WHERE { ?person ex:age "20"xsd:integer . _p1 _p2 . . . } september 6, 2012 22 / 37
  • 23. The DBpedia SPARQL endpoint G G All data sets are available for queries via the DBpedia SPARQL endpoint (http://dbpedia.org/sparql). Querying the data set: H H H H H ... Abstracts of movies starring Tom Cruise, released before 1999. The official websites of companies with more than 50000 employees. Cities with more than 2 million habitants. ... september 6, 2012 23 / 37
  • 24. Abstracts of movies starring Tom Cruise, released before 1999 SPARQL SELECT ?subject ?label ?released ?abstract WHERE { ?subject rdf:type <http://dbpedia.org/ontology/Film>. ?subject dbpedia2:starring <http://dbpedia.org/resource/Tom_Cruise>. ?subject rdfs:comment ?abstract. ?subject rdfs:label ?label. FILTER(lang(?abstract) = "en" && lang(?label) = "en"). ?subject <http://dbpedia.org/ontology/releaseDate> ?released. FILTER(xsd:date(?released) < "2000-01-01"^^xsd:date). } ORDER BY ?released september 6, 2012 24 / 37
  • 25. Outline G Motivation G UnStructured Data G Structured Data G Query building G Applications G Conclusion september 6, 2012 25 / 37
  • 26. Linked Data Search Engines and Indexes G A number of search engines have been developed that crawl Linked Data from the Web by following RDF links, and provide query capabilities over aggregated data. Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool. G G Google, Bing and Yahoo! agree to create and support a common vocabulary for structured data markup on web pages. Facebook has started to support RDF and Linked Data URIs and now provides access to parts of its user data via a Linked Data API. september 6, 2012 26 / 37
  • 28. Twitter, #annotations Twitter API based client september 6, 2012 28 / 37
  • 35. Outline G Motivation G UnStructured Data G Structured Data G Query building G Applications G Conclusion september 6, 2012 35 / 37
  • 36. Conclusion G G G G Data on the Web is a major challenge; technologies are needed to use them, to interact with them, to integrate them. Semantic Web technologies (RDF, SPARQL, etc.) can play a major role in publishing and using Data on the Web. Users can largely benefit from the wide world of structured content. Content providers joining the Linking Open Data project are contributing to create more meaningful navigation paths not only within websites but across the whole web. september 6, 2012 36 / 37