BigData & Wikidata - no lies
SPARQL queries on DBPedia
Camelia Boban
Resources for the codelab:
Eclipse Luna for J2EE developers - https://www.eclipse.org/downloads/index-developer.php
Java SE 1.8 - http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html
Apache Tomcat 8.0.5 - http://tomcat.apache.org/download-80.cgi
Axis2 1.6.2 - http://axis.apache.org/axis2/java/core/download.cgi
Apache Jena 2.11.1 - http://jena.apache.org/download/
DBpedia SPARQL endpoint - http://dbpedia.org/sparql
JARs needed:
httpclient-4.2.3.jar httpcore-4.2.2.jar jena-arq-2.11.1.jar
jena-core-2.11.1.jar jena-iri-1.0.1.jar jena-sdb-1.4.1.jar jena-tdb-1.0.1.jar
slf4j-api-1.6.4.jar slf4j-log4j12-1.6.4.jar
xercesImpl-2.11.0.jar xml-apis-1.4.01.jar
Attention!
Do NOT include jcl-over-slf4j-1.6.4.jar: it conflicts with slf4j-log4j12-1.6.4.jar (“Can’t override final class” exception).
Do NOT include httpcore-4.0.jar (added by Axis): it conflicts with httpcore-4.2.2.jar and prevents the web service from being created.
The Semantic Web
The Semantic Web is a project that intends to add computer-processable meaning
(semantics) to the World Wide Web.
SPARQL
A protocol and an SQL-like query language for querying RDF graphs via pattern
matching.
VIRTUOSO
Both the back-end database engine and the HTTP/SPARQL server.
DBpedia.org
The Semantic Web mirror of Wikipedia.
RDF
A graph data model built on subject, predicate, object triples.
APACHE JENA
A free and open source Java framework for building Semantic Web and Linked
Data applications.
ARQ - Jena's SPARQL processor, which can also query remote SPARQL services.
DBpedia extracts structured information from Wikipedia editions in 119 languages, converts it into RDF,
and makes this information available on the Web:
★ 24.9 million things (16.8 million from the English DBpedia);
★ labels and abstracts for 12.6 million unique things;
★ 24.6 million links to images and 27.6 million links to external web pages;
★ 45.0 million external links into other RDF datasets, 67.0 million links to
Wikipedia categories, and 41.2 million YAGO categories.
The dataset consists of 2.46 billion RDF triples, of which 470 million were extracted from
the English edition of Wikipedia, 1.98 billion from other language editions, and 45
million are links to external datasets.
DBpedia uses the Resource Description Framework (RDF) as a flexible data
model for representing extracted information and for publishing it on the Web. We
use the SPARQL query language to query this data.
What is a Triple?
A triple is the smallest unit of information expressible in the Semantic Web. It is
composed of 3 elements:
1. A subject which is a URI (e.g., a "web address") that represents something.
2. A predicate which is another URI that represents a certain property of the
subject.
3. An object which can be a URI or a literal (a string) that is related to the
subject through the predicate.
John has the email address john@email.com
(subject) (predicate) (object)
Subjects, predicates, and objects are represented with URIs, which can be
abbreviated as prefixed names.
Objects can also be literals: strings, integers, booleans, etc.
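The subject, predicate, object structure can be sketched in plain Java. This is only an illustration and not Jena's API: the class name SimpleTriple, the subject URI http://example.org/John and the choice of the FOAF mbox and name properties are assumptions made for the example, and literal escaping is deliberately left out.

```java
// Minimal sketch of an RDF triple; hypothetical class, not part of Jena.
final class SimpleTriple {
    final String subject;          // always a URI
    final String predicate;        // always a URI
    final String object;           // a URI or a literal value
    final boolean objectIsLiteral; // true when the object is a plain literal

    SimpleTriple(String subject, String predicate, String object, boolean objectIsLiteral) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.objectIsLiteral = objectIsLiteral;
    }

    // Renders the triple in an N-Triples-like form (no escaping of quotes in literals).
    String toNTriples() {
        String obj = objectIsLiteral ? "\"" + object + "\"" : "<" + object + ">";
        return "<" + subject + "> <" + predicate + "> " + obj + " .";
    }

    public static void main(String[] args) {
        // "John has the email address john@email.com", with the object as a URI
        SimpleTriple mbox = new SimpleTriple(
                "http://example.org/John",
                "http://xmlns.com/foaf/0.1/mbox",
                "mailto:john@email.com",
                false);
        // The same subject with a literal object
        SimpleTriple name = new SimpleTriple(
                "http://example.org/John",
                "http://xmlns.com/foaf/0.1/name",
                "John",
                true);
        System.out.println(mbox.toNTriples());
        System.out.println(name.toNTriples());
    }
}
```

In a real application the Jena model classes would take the place of this hand-written type.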
Why SPARQL?
SPARQL is a query language of the Semantic Web that lets us:
1. Extract values from structured and semi-structured data
2. Explore data by querying unknown relationships
3. Perform complex joins across several datasets in a single query
4. Transform data from one vocabulary to another
Structure of a SPARQL query:
● Prefix declarations, for abbreviating URIs (with PREFIX dbpowl:
<http://dbpedia.org/ontology/>, dbpowl:Mountain stands for
<http://dbpedia.org/ontology/Mountain>)
● Dataset definition, stating what RDF graph(s) are being queried (DBpedia,
Darwin Core Terms, YAGO, FOAF - Friend of a Friend)
● A result clause, identifying what information to return from the query
(SELECT)
● The query pattern, specifying what to query for in the underlying dataset
(WHERE)
● Query modifiers, slicing, ordering, and otherwise rearranging query results -
ORDER BY, GROUP BY
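As a rough illustration of how these parts fit together, the sketch below assembles a query string from prefix declarations, a result clause, a query pattern and optional modifiers. The class SparqlQuerySketch and its build method are hypothetical helpers written for this example (they are not part of Jena), and the dataset definition (FROM) is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that concatenates the parts of a SPARQL query in order.
final class SparqlQuerySketch {

    static String build(Map<String, String> prefixes, String resultClause,
                        String pattern, String modifiers) {
        StringBuilder sb = new StringBuilder();
        // 1. Prefix declarations
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            sb.append("PREFIX ").append(e.getKey()).append(": <")
              .append(e.getValue()).append(">\n");
        }
        // 2. Result clause (for example "SELECT ?x")
        sb.append(resultClause).append('\n');
        // 3. Query pattern
        sb.append("WHERE {\n").append(pattern).append("\n}");
        // 4. Query modifiers, if any
        if (modifiers != null && !modifiers.isEmpty()) {
            sb.append('\n').append(modifiers);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("dbpowl", "http://dbpedia.org/ontology/");
        System.out.println(build(prefixes,
                "SELECT ?mountain",
                "  ?mountain a dbpowl:Mountain .",
                "ORDER BY ?mountain"));
    }
}
```

String concatenation is enough to show the ordering of the parts; it does no validation of the resulting query.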
## EXAMPLE - Give me all cities & towns in Abruzzo with more than 50,000 inhabitants
PREFIX dbpclass: <http://dbpedia.org/class/yago/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?resource ?value
WHERE {
?resource a dbpclass:CitiesAndTownsInAbruzzo .
?resource dbpprop:populationTotal ?value .
FILTER ( ?value > 50000 )
}
ORDER BY ?resource ?value
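A query like this one can be sent to an endpoint over plain HTTP: the SPARQL protocol accepts the query text in a URL-encoded query parameter of a GET request. The sketch below only builds that URL with the Java standard library. The class name EndpointUrlSketch is made up for the example, and actually fetching and parsing the results (for instance with Jena ARQ) is not shown.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the GET URL for a SPARQL query, without sending it.
final class EndpointUrlSketch {

    static String queryUrl(String endpoint, String sparql) {
        try {
            // The query text must be URL-encoded before it goes into the "query" parameter.
            return endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sparql =
                "PREFIX dbpclass: <http://dbpedia.org/class/yago/>\n"
              + "PREFIX dbpprop: <http://dbpedia.org/property/>\n"
              + "SELECT ?resource ?value\n"
              + "WHERE {\n"
              + "  ?resource a dbpclass:CitiesAndTownsInAbruzzo .\n"
              + "  ?resource dbpprop:populationTotal ?value .\n"
              + "  FILTER ( ?value > 50000 )\n"
              + "}\n"
              + "ORDER BY ?resource ?value";
        System.out.println(queryUrl("http://dbpedia.org/sparql", sparql));
    }
}
```

The printed URL can be pasted into a browser to check the query by hand before wiring it into the web service.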
Some common prefixes:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
DBPEDIA
----------------------------------------------------------------------------------
PREFIX dbp: <http://dbpedia.org/>
PREFIX dbpowl: <http://dbpedia.org/ontology/>
PREFIX dbpres: <http://dbpedia.org/resource/>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX dbpclass: <http://dbpedia.org/class/yago/>
Wikipedia articles consist mostly of free text, but also contain different types of
structured information: infobox templates, categorisation information, images,
geo-coordinates, and links to external Web pages. DBpedia transforms the data
entered in Wikipedia into RDF triples, so creating a page in Wikipedia creates
RDF in DBpedia.
Example:
https://en.wikipedia.org/wiki/Pulp_Fiction describes the movie. DBpedia creates a
URI of the form http://dbpedia.org/resource/wikipedia_page_name (where
wikipedia_page_name is the name of the regular Wikipedia HTML page), here
http://dbpedia.org/resource/Pulp_Fiction; its browsable HTML view is
http://dbpedia.org/page/Pulp_Fiction. Underscore characters replace spaces.
DBpedia can be queried via a Web interface at http://dbpedia.org/sparql . The
interface uses the Virtuoso SPARQL Query Editor to query the DBpedia endpoint.
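The naming rule for resource URIs (page name appended to http://dbpedia.org/resource/, with underscores in place of spaces) can be sketched as a small Java helper. The class DbpediaUriSketch is hypothetical, and the sketch assumes a title made of letters, digits and spaces; titles with other special characters may need additional encoding that is not handled here.

```java
// Hypothetical helper: maps a Wikipedia page title to a DBpedia resource URI.
final class DbpediaUriSketch {

    static final String RESOURCE_BASE = "http://dbpedia.org/resource/";

    static String resourceUri(String wikipediaTitle) {
        // Underscore characters replace spaces, as in Wikipedia page names.
        return RESOURCE_BASE + wikipediaTitle.trim().replace(' ', '_');
    }

    public static void main(String[] args) {
        System.out.println(resourceUri("Pulp Fiction"));
    }
}
```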
Public SPARQL endpoint - uses OpenLink Virtuoso
Wikipedia page: http://en.wikipedia.org/wiki/Pulp_Fiction
DBpedia page: http://dbpedia.org/page/Pulp_Fiction
Infobox properties: dbpedia-owl:abstract; dbpedia-owl:starring; dbpedia-owl:budget;
dbpprop:country; dbpprop:caption etc.
For instance, the figure below shows the source code and the visualisation of an
infobox template containing structured information about Pulp Fiction.
PREFIX prop: <http://dbpedia.org/property/>
PREFIX res: <http://dbpedia.org/resource/>
# Note: owl: is bound to the DBpedia ontology here, not to the W3C OWL namespace.
PREFIX owl: <http://dbpedia.org/ontology/>
# Declared explicitly so the query does not rely on endpoint-predefined prefixes.
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?name ?abstract ?caption ?image ?budget ?director ?cast ?country ?category
WHERE {
  res:Pulp_Fiction prop:name ?name ;
    owl:abstract ?abstract ;
    prop:caption ?caption ;
    owl:thumbnail ?image ;
    owl:budget ?budget ;
    owl:director ?director ;
    owl:starring ?cast ;
    prop:country ?country ;
    dcterms:subject ?category .
  FILTER langMatches( lang(?abstract), 'en' )
}
...
Linked Data is a method of publishing RDF data on the Web and of interlinking
data between different data sources.
Query builder:
➢ http://dbpedia.org/snorql/
➢ http://querybuilder.dbpedia.org/
➢ http://dbpedia.org/isparql/
➢ http://dbpedia.org/fct/
➢ http://it.dbpedia.org/sparql
Variable names start with "?"
The current RDF vocabularies are available at the following locations:
➔ W3C: http://www.w3.org/TR/vcard-rdf/ vCard Ontology - for describing people
and organizations
http://www.w3.org/2003/01/geo/ Geo Ontology - for spatially-located things
http://www.w3.org/2004/02/skos/ SKOS - Simple Knowledge Organization
System
➔ GEO NAMES: http://www.geonames.org/ geospatial semantic information
(postal code)
➔ DUBLIN CORE: http://www.dublincore.org/ defines general metadata
attributes used in a particular application
➔ FOAF: http://www.foaf-project.org/ Friend of a Friend, vocabulary for
describing people
➔ UNIPROT: http://www.uniprot.org/core/, http://beta.sparql.uniprot.org/uniprot
for science articles
➔ MUSIC ONTOLOGY: http://musicontology.com/, provides terms for
describing artists, albums and tracks.
➔ REVIEW VOCABULARY: http://purl.org/stuff/rev , vocabulary for
representing reviews.
➔ CREATIVE COMMONS (CC): http://creativecommons.org/ns , vocabulary for
describing license terms.
➔ OPEN UNIVERSITY: http://data.open.ac.uk/
➔ Semantically-Interlinked Online Communities (SIOC): www.sioc-project.org/,
vocabulary for representing online communities
➔ Description of a Project (DOAP): http://usefulinc.com/doap/, vocabulary for
describing projects
➔ Simple Knowledge Organization System (SKOS):
http://www.w3.org/2004/02/skos/, vocabulary for representing taxonomies and
loosely structured knowledge
SPARQL queries have two parts (FROM is optional):
1. The query (WHERE) part, which produces a list of variable bindings (although
some variables may be unbound).
2. The result form, which puts together the results: SELECT, ASK, CONSTRUCT, or
DESCRIBE.
Other keywords:
UNION, OPTIONAL (matches extra data only if it exists), FILTER (conditions), ORDER
BY, GROUP BY
SELECT - returns the variable bindings themselves (a ResultSet)
ASK - only checks whether there are any results
CONSTRUCT - uses a template to make RDF from the results. For each result row
it binds the variables and adds the statements to the result model. If a template
triple contains an unbound variable it is skipped. Returns a new RDF graph.
DESCRIBE - unusual, since it takes each result node, finds triples associated with
it, and adds them to a result model. Returns a new RDF graph.
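The CONSTRUCT behaviour described above (bind the template variables for each result row, skip any template triple that still has an unbound variable) can be sketched with plain Java collections. This is a simplified model and not how Jena implements it: the class ConstructSketch, the use of prefixed names as plain strings, and the ex: properties are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of CONSTRUCT template instantiation; hypothetical, not Jena's code.
final class ConstructSketch {

    // template: triples as {subject, predicate, object}; terms starting with "?" are variables.
    // rows: one map per result row, from variable name (with "?") to its bound value.
    static Set<String> construct(List<String[]> template, List<Map<String, String>> rows) {
        Set<String> graph = new LinkedHashSet<>(); // a graph is a set: duplicates collapse
        for (Map<String, String> row : rows) {
            for (String[] triple : template) {
                String[] out = new String[3];
                boolean allBound = true;
                for (int i = 0; i < 3; i++) {
                    String term = triple[i];
                    if (term.startsWith("?")) {
                        term = row.get(term); // null when the variable is unbound in this row
                    }
                    if (term == null) {
                        allBound = false;
                        break;
                    }
                    out[i] = term;
                }
                if (allBound) {
                    graph.add(out[0] + " " + out[1] + " " + out[2] + " .");
                }
                // Otherwise the template triple is skipped for this row.
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        List<String[]> template = new ArrayList<>();
        template.add(new String[] {"?film", "ex:directedBy", "?director"});
        template.add(new String[] {"?film", "ex:budget", "?budget"});

        Map<String, String> row = new HashMap<>();
        row.put("?film", "res:Pulp_Fiction");
        row.put("?director", "res:Quentin_Tarantino");
        // "?budget" is left unbound, so the second template triple is skipped.

        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row);
        for (String statement : construct(template, rows)) {
            System.out.println(statement);
        }
    }
}
```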
What is Linked Data good for? Don’t search for a single thing, but explore a whole
set of related things together!
1) Revolutionize Wikipedia Search
2) Include DBpedia data in our own web page
3) Mobile and Geographic Applications
4) Document Classification, Annotation and Social Bookmarking
5) Multi-Domain Ontology
6) Nucleus for the Web of Data
MOBILE
QRpedia.org - MIT Licence
WIKIPEDIA DUMPS
● Arabic Wikipedia dumps: http://dumps.wikimedia.org/arwiki/
● Dutch Wikipedia dumps: http://dumps.wikimedia.org/nlwiki/
● English Wikipedia dumps: http://dumps.wikimedia.org/enwiki/
● French Wikipedia dumps: http://dumps.wikimedia.org/frwiki/
● German Wikipedia dumps: http://dumps.wikimedia.org/dewiki/
● Italian Wikipedia dumps: http://dumps.wikimedia.org/itwiki/
● Persian Wikipedia dumps: http://dumps.wikimedia.org/fawiki/
● Polish Wikipedia dumps: http://dumps.wikimedia.org/plwiki/
● Portuguese Wikipedia dumps: http://dumps.wikimedia.org/ptwiki/
● Russian Wikipedia dumps: http://dumps.wikimedia.org/ruwiki/
● Serbian Wikipedia dumps: http://dumps.wikimedia.org/srwiki/
● Spanish Wikipedia dumps: http://dumps.wikimedia.org/eswiki/
● Swedish Wikipedia dumps: http://dumps.wikimedia.org/svwiki/
● Ukrainian Wikipedia dumps: http://dumps.wikimedia.org/ukwiki/
● Vietnamese Wikipedia dumps: http://dumps.wikimedia.org/viwiki/
LINK
Codelab’s project code: http://github.com/GDG-L-Ab/SparqlOpendataWS
http://dbpedia.org/sparql & http://it.dbpedia.org/sparql
http://wiki.dbpedia.org/Datasets
http://en.wikipedia.org/ & http://it.wikipedia.org/
http://dbpedia.org/snorql, http://data.semanticweb.org/snorql/ SPARQL Explorer
http://downloads.dbpedia.org/3.9/ & http://wiki.dbpedia.org/Downloads39
Projects that use linked data:
JAVA: Open Learn Linked data: free access to Open University course materials
PHP: Semantic MediaWiki - lets you store and query data within the wiki's pages.
PERL: WikSAR
PYTHON: Braindump - semantic search in Wikipedia
RUBY: SemperWiki
THANK YOU! :-)
I AM
CAMELIA BOBAN
G+ : https://plus.google.com/u/0/+cameliaboban
Twitter : http://twitter.com/GDGRomaLAb
LinkedIn: it.linkedin.com/pub/camelia-boban/22/191/313/
Blog: http://blog.aissatechnologies.com/
Skype: camelia.boban
camelia.boban@gmail.com

GDG Meets U event - Big data & Wikidata - no lies codelab

  • 1. BigData & Wikidata - no lies SPARQL queries on DBPedia Camelia Boban
  • 2. BigData & Wikidata - no lies
  • 3. Resources for the codelab: Eclipse Luna for J2EE developers - https://www.eclipse.org/downloads/index-developer.php Java SE 1.8 - http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html Apache Tomcat 8.0.5 - http://tomcat.apache.org/download-80.cgi Axis2 1.6.2 - http://axis.apache.org/axis2/java/core/download.cgi Apache Jena 2.11.1 - http://jena.apache.org/download/ Dbpedia Sparql endpoint: - dbpedia.org/sparql BigData & Wikidata - no lies
  • 4. JAR needed: httpclient-4.2.3.jar httpcore-4.2.2.jar Jena-arq-2.11.1.jar Jena-core-2.11.1.jar Jena-iri-1.0.1.jar jena-sdb-1.4.1.jar jena-tdb-1.0.1.jar slf4j-api-1.6.4.jar slf4j-log4j12-1.6.4.jar xercesImpl-2.11.0.jar xml-apis-1.4.01.jar Attention!! NO jcl-over-slf4j-1.6.4.jar (slf4j-log4j12-1.6.4 conflict, “Can’t override final class exception”) NO httpcore-4.0.jar (made by Axis, httpcore-4.2.2.jar conflict, don’t let create the WS) BigData & Wikidata - no lies
  • 5. The Semantic Web The Semantic Web is a project that intends to add computer-processable meaning (semantics) to the Word Wide Web. SPARQL A a protocol and a query language SQL-like for querying RDF graphs via pattern matching VIRTUOSO Both back-end database engine and the HTTP/SPARQL server. BigData & Wikidata - no lies
  • 6. BigData & Wikidata - no lies
  • 7. DBpedia.org Is the Semantic Web mirror of Wikipedia. RDF Is a data model of graphs on subject, predicate, object triples. APACHE JENA A free and open source Java framework for building Semantic Web and Linked Data applications. ARQ - A SPARQL Processor for Jena for querying Remote SPARQL Services BigData & Wikidata - no lies
  • 8. BigData & Wikidata - no lies
  • 9. DBpedia.org extracts from Wikipedia editions in 119 languages, convert it into RDF and make this information available on the Web: ★ 24.9 million things (16.8 million from the English Dbpedia); ★ labels and abstracts for 12.6 million unique things; ★ 24.6 million links to images and 27.6 million links to external web pages; ★ 45.0 million external links into other RDF datasets, 67.0 million links to Wikipedia categories, and 41.2 million YAGO categories. BigData & Wikidata - no lies
  • 10. The dataset consists of 2.46 billion RDF triples (470 million were extracted from the English edition of Wikipedia), 1.98 billion from other language editions, and 45 million are links to external datasets. DBpedia uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. We use the SPARQL query language to query this data. BigData & Wikidata - no lies
  • 12. What is a triple? A triple is the minimal amount of information expressible in the Semantic Web. It is composed of 3 elements:
      1. A subject, which is a URI (e.g., a "web address") that represents something.
      2. A predicate, which is another URI that represents a certain property of the subject.
      3. An object, which can be a URI or a literal (such as a string) that is related to the subject through the predicate.
  • 13. Example: John (subject) has the email address (predicate) john@email.com (object). Subjects, predicates, and objects are represented with URIs, which can be abbreviated as prefixed names. Objects can also be literals: strings, integers, booleans, etc.
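The John example can be sketched in Python as a small record serialised to one N-Triples line. This is only an illustration: the `http://example.org/` subject URI, the `Triple` class, and the `to_ntriples` helper are assumptions made for the sketch, not part of the codelab. `foaf:mbox` is the FOAF property for a mailbox, and it takes a `mailto:` URI as its object.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # always a URI
    predicate: str                # always a URI
    obj: str                      # a URI or a literal value
    obj_is_literal: bool = False  # True when obj is a plain string literal


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as a single N-Triples line."""
    if t.obj_is_literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


john_email = Triple(
    subject="http://example.org/people/John",   # hypothetical URI for John
    predicate="http://xmlns.com/foaf/0.1/mbox",
    obj="mailto:john@email.com",
)
print(to_ntriples(john_email))
```

Setting `obj_is_literal=True` covers the other case from the slide, where the object is a string rather than a URI.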
  • 14. Why SPARQL? SPARQL is the query language of the Semantic Web. It lets us:
      1. Extract values from structured and semi-structured data
      2. Explore data by querying unknown relationships
      3. Perform complex joins across different datasets in a single query
      4. Transform data from one vocabulary to another
  • 15. Structure of a SPARQL query:
      ● Prefix declarations, for abbreviating URIs (with PREFIX dbpowl: <http://dbpedia.org/ontology/>, the URI <http://dbpedia.org/ontology/Mountain> becomes dbpowl:Mountain)
      ● Dataset definition (FROM), stating what RDF graph(s) are being queried (DBpedia, Darwin Core Terms, YAGO, FOAF - Friend of a Friend)
      ● A result clause (SELECT), identifying what information to return from the query
      ● The query pattern (WHERE), specifying what to query for in the underlying dataset
      ● Query modifiers, slicing, ordering, and otherwise rearranging query results: ORDER BY, GROUP BY
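To make the five parts concrete, here is a minimal Python sketch that assembles a query string from them. The helper `build_query`, its parameter names, and the mountain example are assumptions for illustration; the graph IRI `http://dbpedia.org` passed to FROM is likewise assumed and can be left out.

```python
def build_query(prefixes, select_vars, patterns,
                from_graph=None, order_by=None, limit=None):
    """Assemble a SELECT query from its structural parts."""
    # 1. Prefix declarations
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    # 3. Result clause
    lines.append("SELECT " + " ".join(select_vars))
    # 2. Dataset definition (optional)
    if from_graph:
        lines.append(f"FROM <{from_graph}>")
    # 4. Query pattern
    lines.append("WHERE {")
    lines.extend(f"  {pattern} ." for pattern in patterns)
    lines.append("}")
    # 5. Query modifiers
    if order_by:
        lines.append("ORDER BY " + " ".join(order_by))
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


query = build_query(
    prefixes={
        "dbpowl": "http://dbpedia.org/ontology/",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    },
    select_vars=["?mountain", "?label"],
    patterns=["?mountain a dbpowl:Mountain", "?mountain rdfs:label ?label"],
    from_graph="http://dbpedia.org",  # assumed graph IRI, optional
    order_by=["?label"],
    limit=10,
)
print(query)
```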
  • 17. Example: give me all cities and towns in Abruzzo with more than 50,000 inhabitants.
      PREFIX dbpclass: <http://dbpedia.org/class/yago/>
      PREFIX dbpprop: <http://dbpedia.org/property/>
      SELECT ?resource ?value
      WHERE {
        ?resource a dbpclass:CitiesAndTownsInAbruzzo .
        ?resource dbpprop:populationTotal ?value .
        FILTER ( ?value > 50000 )
      }
      ORDER BY ?resource ?value
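One way to send the Abruzzo query to the public endpoint is a plain HTTP GET. The sketch below only builds the request URL and does not contact the network. The `query` parameter comes from the SPARQL Protocol; the `format` parameter and its value are an assumption about how the Virtuoso endpoint selects JSON output, and `build_request_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """PREFIX dbpclass: <http://dbpedia.org/class/yago/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?resource ?value
WHERE {
  ?resource a dbpclass:CitiesAndTownsInAbruzzo .
  ?resource dbpprop:populationTotal ?value .
  FILTER ( ?value > 50000 )
}
ORDER BY ?resource ?value"""


def build_request_url(endpoint, query,
                      result_format="application/sparql-results+json"):
    """Return a GET URL carrying the percent-encoded query."""
    # "format" is assumed to be understood by the Virtuoso endpoint.
    params = {"query": query, "format": result_format}
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Passing the resulting URL to `urllib.request.urlopen` would perform the actual request; that step is left out here so the sketch runs offline.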
  • 19. Some common prefixes:
      PREFIX owl: <http://www.w3.org/2002/07/owl#>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      PREFIX dc: <http://purl.org/dc/elements/1.1/>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX dcterms: <http://purl.org/dc/terms/>
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
  • 20. DBpedia prefixes:
      PREFIX dbp: <http://dbpedia.org/>
      PREFIX dbpowl: <http://dbpedia.org/ontology/>
      PREFIX dbpres: <http://dbpedia.org/resource/>
      PREFIX dbpprop: <http://dbpedia.org/property/>
      PREFIX dbpclass: <http://dbpedia.org/class/yago/>
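A prefixed name is just shorthand: the prefix is replaced by its namespace URI and the local part is appended. A minimal Python sketch of that expansion, using a few of the prefixes listed in the slides (the `expand` helper is a hypothetical name):

```python
PREFIXES = {
    "dbpowl": "http://dbpedia.org/ontology/",
    "dbpres": "http://dbpedia.org/resource/",
    "dbpprop": "http://dbpedia.org/property/",
    "dbpclass": "http://dbpedia.org/class/yago/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name such as dbpowl:Mountain into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown or missing prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("dbpowl:Mountain"))
print(expand("dbpclass:CitiesAndTownsInAbruzzo"))
```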
  • 21. Wikipedia articles consist mostly of free text, but also contain different types of structured information: infobox templates, categorisation information, images, geo-coordinates, and links to external Web pages. DBpedia transforms the data entered in Wikipedia into RDF triples, so creating a page in Wikipedia creates RDF in DBpedia.
  • 23. Example: https://en.wikipedia.org/wiki/Pulp_Fiction describes the movie. DBpedia creates a URI of the form http://dbpedia.org/resource/wikipedia_page_name (where wikipedia_page_name is the name of the regular Wikipedia HTML page), here http://dbpedia.org/resource/Pulp_Fiction, which is displayed in the browser at http://dbpedia.org/page/Pulp_Fiction. Underscore characters replace spaces. DBpedia can be queried via a Web interface at http://dbpedia.org/sparql . The interface uses the Virtuoso SPARQL Query Editor to query the DBpedia endpoint.
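The naming rule above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes an English Wikipedia URL of the form `/wiki/Page_name`; it does not handle percent-encoded titles or other language editions.

```python
from urllib.parse import urlsplit

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def title_to_resource_uri(title):
    """Map a Wikipedia page title to its DBpedia resource URI."""
    # Underscore characters replace spaces.
    return DBPEDIA_RESOURCE + title.strip().replace(" ", "_")


def wikipedia_url_to_resource_uri(wikipedia_url):
    """Map a Wikipedia article URL to its DBpedia resource URI."""
    path = urlsplit(wikipedia_url).path
    marker = "/wiki/"
    if not path.startswith(marker):
        raise ValueError(f"not a Wikipedia article URL: {wikipedia_url}")
    return title_to_resource_uri(path[len(marker):])


print(wikipedia_url_to_resource_uri("https://en.wikipedia.org/wiki/Pulp_Fiction"))
print(title_to_resource_uri("Pulp Fiction"))
```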
  • 24. Public SPARQL endpoint: uses OpenLink Virtuoso. Wikipedia page: http://en.wikipedia.org/wiki/Pulp_Fiction. DBpedia page: http://dbpedia.org/page/Pulp_Fiction. Infobox properties: dbpedia-owl:abstract; dbpedia-owl:starring; dbpedia-owl:budget; dbpprop:country; dbpprop:caption, etc. For instance, the figure below shows the source code and the visualisation of an infobox template containing structured information about Pulp Fiction.
  • 26. Query for the Pulp Fiction infobox data (dcterms is declared explicitly so the query does not rely on the endpoint's predefined prefixes; note that owl: here stands for the DBpedia ontology):
      PREFIX prop: <http://dbpedia.org/property/>
      PREFIX res: <http://dbpedia.org/resource/>
      PREFIX owl: <http://dbpedia.org/ontology/>
      PREFIX dcterms: <http://purl.org/dc/terms/>
      SELECT DISTINCT ?name ?abstract ?caption ?image ?budget ?director ?cast ?country ?category
      WHERE {
        res:Pulp_Fiction prop:name ?name ;
                         owl:abstract ?abstract ;
                         prop:caption ?caption ;
                         owl:thumbnail ?image ;
                         owl:budget ?budget ;
                         owl:director ?director ;
                         owl:starring ?cast ;
                         prop:country ?country ;
                         dcterms:subject ?category .
        FILTER langMatches( lang(?abstract), 'en' )
      }
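When a SELECT query like the Pulp Fiction one is answered in the W3C SPARQL JSON results format, each row arrives as a "binding" keyed by variable name. The Python sketch below flattens such a document into plain dictionaries. `SAMPLE_RESPONSE` is a hand-written, shortened illustration of the format, not real endpoint output, and `rows_from_results` is a hypothetical helper name.

```python
import json

# Hand-written sample in the SPARQL JSON results layout (illustrative only).
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["name", "director", "budget"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "xml:lang": "en", "value": "Pulp Fiction"},
     "director": {"type": "uri",
                  "value": "http://dbpedia.org/resource/Quentin_Tarantino"}}
  ]}
}
"""


def rows_from_results(document):
    """Flatten a SPARQL JSON results document into a list of dicts."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable missing from a binding is unbound in that row.
        rows.append({
            var: (binding[var]["value"] if var in binding else None)
            for var in variables
        })
    return rows


for row in rows_from_results(SAMPLE_RESPONSE):
    print(row)
```

Unbound variables (here `budget`) are mapped to `None` so every row has the same keys.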
  • 28. Linked Data is a method of publishing RDF data on the Web and of interlinking data between different data sources. Query builders:
      ➢ http://dbpedia.org/snorql/
      ➢ http://querybuilder.dbpedia.org/
      ➢ http://dbpedia.org/isparql/
      ➢ http://dbpedia.org/fct/
      ➢ http://it.dbpedia.org/sparql
    Variable names are prefixed with "?".
  • 29. The current RDF vocabularies are available at the following locations:
      ➔ W3C:
          http://www.w3.org/TR/vcard-rdf/ vCard Ontology, for describing people and organizations
          http://www.w3.org/2003/01/geo/ Geo Ontology, for spatially-located things
          http://www.w3.org/2004/02/skos/ SKOS, Simple Knowledge Organization System
  • 30. ➔ GeoNames: http://www.geonames.org/ geospatial semantic information (e.g., postal codes)
      ➔ Dublin Core: http://www.dublincore.org/ defines general metadata attributes used in a particular application
      ➔ FOAF: http://www.foaf-project.org/ Friend of a Friend, vocabulary for describing people
      ➔ UniProt: http://www.uniprot.org/core/, http://beta.sparql.uniprot.org/uniprot for scientific data on proteins
  • 31. ➔ Music Ontology: http://musicontology.com/, provides terms for describing artists, albums and tracks.
      ➔ Review Vocabulary: http://purl.org/stuff/rev, vocabulary for representing reviews.
      ➔ Creative Commons (CC): http://creativecommons.org/ns, vocabulary for describing license terms.
      ➔ Open University: http://data.open.ac.uk/
  • 32. ➔ Semantically-Interlinked Online Communities (SIOC): www.sioc-project.org/, vocabulary for representing online communities
      ➔ Description of a Project (DOAP): http://usefulinc.com/doap/, vocabulary for describing projects
      ➔ Simple Knowledge Organization System (SKOS): http://www.w3.org/2004/02/skos/, vocabulary for representing taxonomies and loosely structured knowledge
  • 34. SPARQL queries have two parts (FROM is optional):
      1. The query (WHERE) part, which produces a list of variable bindings (although some variables may be unbound).
      2. The part which puts together the results: SELECT, ASK, CONSTRUCT, or DESCRIBE.
    Other keywords: UNION, OPTIONAL (returns the extra data only if it exists), FILTER (conditions), ORDER BY, GROUP BY
  • 35. SELECT - is effectively what the query returns (a ResultSet).
      ASK - just checks whether there are any results.
      CONSTRUCT - uses a template to make RDF from the results. For each result row it binds the variables and adds the statements to the result model. If a template triple contains an unbound variable, it is skipped. Returns a new RDF graph.
      DESCRIBE - unusual, since it takes each result node, finds triples associated with it, and adds them to a result model. Returns a new RDF graph.
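The CONSTRUCT behaviour described above (bind the variables for each result row, skip any template triple that still has an unbound variable) can be sketched in plain Python. This is a toy model of the idea, not Jena's implementation; the `construct` helper, the `ex:` predicates, and the sample row are assumptions for illustration.

```python
def construct(template, rows):
    """Instantiate template triples once per result row.

    template: list of (s, p, o) tuples; terms starting with "?" are variables.
    rows: list of dicts mapping variable names (without "?") to values,
          with None or a missing key meaning "unbound".
    """
    graph = set()
    for row in rows:
        for triple in template:
            bound = []
            for term in triple:
                if term.startswith("?"):
                    value = row.get(term[1:])
                    if value is None:
                        break  # unbound variable: skip this template triple
                    bound.append(value)
                else:
                    bound.append(term)
            else:
                # Only reached when no term was unbound.
                graph.add(tuple(bound))
    return graph


template = [
    ("?director", "ex:directed", "?film"),   # ex: is a made-up vocabulary
    ("?film", "ex:budget", "?budget"),
]
rows = [
    {"film": "dbpres:Pulp_Fiction",
     "director": "dbpres:Quentin_Tarantino",
     "budget": None},                         # budget left unbound on purpose
]
print(construct(template, rows))
```

Because `budget` is unbound in the sample row, only the first template triple ends up in the resulting graph.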
  • 36. What is Linked Data good for? Don’t search for a single thing, but explore a whole set of related things together!
      1) Revolutionize Wikipedia search
      2) Include DBpedia data in our own web pages
      3) Mobile and geographic applications
      4) Document classification, annotation and social bookmarking
      5) Multi-domain ontology
      6) Nucleus for the Web of Data
  • 38. MOBILE QRpedia.org - MIT Licence
  • 39. WIKIPEDIA DUMPS
      ● Arabic Wikipedia dumps: http://dumps.wikimedia.org/arwiki/
      ● Dutch Wikipedia dumps: http://dumps.wikimedia.org/nlwiki/
      ● English Wikipedia dumps: http://dumps.wikimedia.org/enwiki/
      ● French Wikipedia dumps: http://dumps.wikimedia.org/frwiki/
      ● German Wikipedia dumps: http://dumps.wikimedia.org/dewiki/
      ● Italian Wikipedia dumps: http://dumps.wikimedia.org/itwiki/
      ● Persian Wikipedia dumps: http://dumps.wikimedia.org/fawiki/
      ● Polish Wikipedia dumps: http://dumps.wikimedia.org/plwiki/
  • 40. WIKIPEDIA DUMPS (continued)
      ● Portuguese Wikipedia dumps: http://dumps.wikimedia.org/ptwiki/
      ● Russian Wikipedia dumps: http://dumps.wikimedia.org/ruwiki/
      ● Serbian Wikipedia dumps: http://dumps.wikimedia.org/srwiki/
      ● Spanish Wikipedia dumps: http://dumps.wikimedia.org/eswiki/
      ● Swedish Wikipedia dumps: http://dumps.wikimedia.org/svwiki/
      ● Ukrainian Wikipedia dumps: http://dumps.wikimedia.org/ukwiki/
      ● Vietnamese Wikipedia dumps: http://dumps.wikimedia.org/viwiki/
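All the dump URLs listed above follow one pattern: the language code followed by `wiki`. A small Python sketch of that pattern, assuming simple alphabetic language codes like the ones in the list (the `dump_url` helper is a hypothetical name, and codes containing hyphens are not handled):

```python
DUMPS_BASE = "http://dumps.wikimedia.org/"


def dump_url(lang_code):
    """Build the dump directory URL for a Wikipedia language edition."""
    code = lang_code.strip().lower()
    if not code.isalpha():
        raise ValueError(f"unsupported language code: {lang_code!r}")
    return f"{DUMPS_BASE}{code}wiki/"


for code in ("ar", "en", "it", "vi"):
    print(code, dump_url(code))
```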
  • 41. LINKS
      Codelab’s project code: http://github.com/GDG-L-Ab/SparqlOpendataWS
      http://dbpedia.org/sparql & http://it.dbpedia.org/sparql
      http://wiki.dbpedia.org/Datasets
      http://en.wikipedia.org/ & http://it.wikipedia.org/
      http://dbpedia.org/snorql, http://data.semanticweb.org/snorql/ (SPARQL Explorer)
      http://downloads.dbpedia.org/3.9/ & http://wiki.dbpedia.org/Downloads39
  • 42. Projects that use linked data:
      JAVA: Open Learn Linked Data - free access to Open University course materials
      PHP: Semantic MediaWiki - lets you store and query data within the wiki's pages
      PERL: WikSAR
      PYTHON: Braindump - semantic search in Wikipedia
      RUBY: SemperWiki
  • 43. THANK YOU! :-) I AM CAMELIA BOBAN G+ : https://plus.google.com/u/0/+cameliaboban Twitter : http://twitter.com/GDGRomaLAb LinkedIn: it.linkedin.com/pub/camelia-boban/22/191/313/ Blog: http://blog.aissatechnologies.com/ Skype: camelia.boban camelia.boban@gmail.com