1. Consuming Linked Data Juan F. Sequeda Department of Computer Science University of Texas at Austin SemTech 2010
2. How many people are familiar with: RDF? SPARQL? Linked Data? Web Architecture (HTTP, etc.)?
3. History: Linked Data Design Issues by TimBL, July 2006; Linked Open Data Project, WWW2007; first LOD Cloud, May 2007; 1st Linked Data on the Web Workshop, WWW2008; 1st Triplification Challenge, 2008; How to Publish Linked Data Tutorial, ISWC2008; BBC publishes Linked Data, 2008; 2nd Linked Data on the Web Workshop, WWW2009; NY Times announcement, SemTech2009/ISWC2009; 1st Linked Data-a-thon, ISWC2009; 1st How to Consume Linked Data Tutorial, ISWC2009; Data.gov.uk publishes Linked Data, 2010; 2nd How to Consume Linked Data Tutorial, WWW2010; 1st International Workshop on Consuming Linked Data, COLD2010; …
17. The Modigliani Test: show me the locations of all the original paintings of Modigliani. Daniel Koller (@dakoller) showed that you can find this with a SPARQL query on DBpedia. Thanks Richard MacManus, ReadWriteWeb.
19. Results of the Modigliani Test: Atanas Kiryakov from Ontotext used LDSR (Linked Data Semantic Repository) over DBpedia, Freebase, Geonames, UMBEL, and WordNet. Published April 26, 2010: http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php
34. So what is the problem? We aren't always interested in documents; we are interested in THINGS. These THINGS might be in documents. We can read an HTML document rendered in a browser and find what we are searching for. This is hard for computers: computers have to guess (even though they are pretty good at it).
35. What do we need to do? Make it easy for computers/software to find THINGS
36. How can we do that? Besides publishing documents on the web, which computers can't understand easily, let's publish something that computers can understand.
49. Resource Description Framework (RDF): a data model, i.e. a way to model data (relational databases use the relational data model). RDF is a triple data model, a labeled graph: Subject, Predicate, Object.
<Juan> <was born in> <California>
<California> <is part of> <the USA>
<Juan> <likes> <the Semantic Web>
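To make the data model concrete, here is a minimal sketch (not any RDF library's API; the class and method names are invented for illustration) of what "a dataset is just a set of subject/predicate/object triples" means:

```java
import java.util.List;

public class TripleModel {
    // A triple: one statement about a thing. Names here are illustrative.
    static final class Triple {
        final String subject, predicate, object;
        Triple(String s, String p, String o) { subject = s; predicate = p; object = o; }
        @Override public String toString() {
            return "<" + subject + "> <" + predicate + "> <" + object + ">";
        }
    }

    // The three example statements from the slide, as a tiny graph.
    static List<Triple> exampleGraph() {
        return List.of(
            new Triple("Juan", "was born in", "California"),
            new Triple("California", "is part of", "the USA"),
            new Triple("Juan", "likes", "the Semantic Web"));
    }

    public static void main(String[] args) {
        exampleGraph().forEach(System.out::println);
    }
}
```

A real RDF store adds indexing and URI handling, but the underlying shape of the data is exactly this list of three-part statements.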
50. RDF can be serialized in different ways: RDF/XML, RDFa (RDF in HTML), N3, Turtle, JSON.
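For example, the triples from slide 49 could be serialized in Turtle roughly like this (the `ex:` namespace and property names are invented for illustration):

```turtle
@prefix ex: <http://example.org/> .

ex:Juan       ex:wasBornIn ex:California ;
              ex:likes     ex:SemanticWeb .
ex:California ex:isPartOf  ex:USA .
```

The same statements in RDF/XML or JSON carry identical content; only the syntax differs.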
51. So does that mean that I have to publish my data in RDF now?
55. Databases back up documents. THINGS have PROPERTIES: a book has a title, an author, … This is a THING: a book with the title “Programming the Semantic Web” by Toby Segaran, …
56. Let's represent the data in RDF:
<book> <title> "Programming the Semantic Web"
<book> <author> "Toby Segaran"
<book> <isbn> "978-0-596-15381-6"
<book> <publisher> <Publisher>
<Publisher> <name> "O'Reilly"
57. Remember that we are on the web Everything on the web is identified by a URI
58. And now let's link the data to other data:
<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O'Reilly"
59. And now consider the data from Revyu.com:
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <reviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Juan Sequeda"
60. Let's start to link data: Revyu's book URI is declared sameAs the publisher's book URI, so the review data (description, reviewer) and the book data (title, author, isbn, publisher) become one connected graph.
61. Juan Sequeda publishes data too:
<http://juansequeda.com/id> <name> "Juan Sequeda"
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>
62. Let's link more data: Revyu's reviewer URI is declared sameAs <http://juansequeda.com/id>, connecting the review data to Juan's own published data.
63. And more: with both sameAs links in place, the book data, the review data, and Juan's personal data form one connected graph spanning the publisher, Revyu.com, juansequeda.com, and DBpedia.
64. Data on the Web that is in RDF and is linked to other RDF data is LINKED DATA
65. Linked Data Principles: 1) Use URIs as names for things. 2) Use HTTP URIs so that people can look up (dereference) those names. 3) When someone looks up a URI, provide useful information. 4) Include links to other URIs so that they can discover more things.
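A minimal sketch of what "looking up" an HTTP URI can mean in practice: an HTTP GET request that asks for RDF via content negotiation. This uses the Java 11 `java.net.http` API, only constructs the request (sending it with `HttpClient` is left out so the sketch stays offline), and `application/rdf+xml` is just one common media type a client might request:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class Dereference {
    // Build a GET request for the given URI, asking the server for an
    // RDF representation instead of an HTML page.
    static HttpRequest rdfRequest(String uri) {
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest r = rdfRequest("http://dbpedia.org/resource/Austin");
        System.out.println(r.uri() + " Accept: "
                + r.headers().firstValue("Accept").orElse(""));
    }
}
```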
67. I can query a database with SQL. Is there a way to query Linked Data with a query language?
68. Yes! There is actually a standardized language for that: SPARQL.
69. FIND all the reviews on the book “Programming the Semantic Web” by people who live in Austin
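Over the graph sketched in the previous slides, that request could be written as a SPARQL query roughly like the following (the prefixes and property URIs are invented for illustration; the property names mirror the diagram labels):

```sparql
PREFIX ex:  <http://example.org/vocab/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?review WHERE {
  ?book     ex:title       "Programming the Semantic Web" .
  ?book     ex:hasReview   ?review .
  ?review   ex:hasReviewer ?reviewer .
  ?reviewer owl:sameAs     ?person .
  ?person   ex:livesIn     <http://dbpedia.org/Austin> .
}
```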
70. The query is answered by traversing the linked graph: from the book with title "Programming the Semantic Web" to its reviews, from each review to its reviewer, and via the reviewer's sameAs link to <http://juansequeda.com/id>, whose data says he lives in <http://dbpedia.org/Austin>.
71. This looks cool, but let’s be realistic. What is the incentive to publish Linked Data?
72. What was your incentive to publish an HTML page in 1990?
73. 1) To share data in documents. 2) Because your neighbor was doing it.
79. Publishing Linked Data. Legacy data in relational databases: D2R Server, Virtuoso, Triplify, Ultrawrap. CMS: Drupal 7. Native RDF stores, i.e. databases for RDF (triple stores): AllegroGraph, Jena, Sesame, Virtuoso. Talis Platform (Linked Data in the cloud). In HTML with RDFa.
87. Google and Yahoo are starting to crawl RDFa! The Semantic Web is a reality!
88. The Reality: Yahoo is crawling data in RDFa and Microformats that uses specific vocabularies (FOAF, GoodRelations, …). Google is crawling RDFa and Microformats that use the Google vocabulary.
90. Linked Data Browsers: not actually separate browsers; they run inside HTML browsers and display the data returned after looking up a URI in tabular form. (IMO) the UI lacks usability.
103. SPARQL Endpoints Linked Data sources usually provide a SPARQL endpoint for their dataset(s) SPARQL endpoint: SPARQL query processing service that supports the SPARQL protocol* Send your SPARQL query, receive the result * http://www.w3.org/TR/rdf-sparql-protocol/
104. Where can I find SPARQL Endpoints?
DBpedia: http://dbpedia.org/sparql
MusicBrainz: http://dbtune.org/musicbrainz/sparql
U.S. Census: http://www.rdfabout.com/sparql
Semantic Crunchbase: http://cb.semsol.org/sparql
More at: http://esw.w3.org/topic/SparqlEndpoints
105. Accessing a SPARQL Endpoint. SPARQL endpoints are RESTful Web services; issuing a SPARQL query to a remote endpoint is basically an HTTP GET request to the endpoint with the parameter query set to the URL-encoded string of the SPARQL query:
GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
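Assembling such a request URL is plain string work; a sketch with the Java standard library (the endpoint address is DBpedia's from the slides, the query is abbreviated):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SparqlRequest {
    // Build the URL for a SPARQL protocol GET request: the query text
    // is URL-encoded and passed in the 'query' parameter.
    static String requestUrl(String endpoint, String sparqlQuery) {
        String encoded = URLEncoder.encode(sparqlQuery, StandardCharsets.UTF_8);
        return endpoint + "?query=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(requestUrl("http://dbpedia.org/sparql",
                "SELECT * WHERE { ?s ?p ?o } LIMIT 10"));
    }
}
```

Fetching the resulting URL with any HTTP client then returns the query results.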
106. Query Results Formats SPARQL endpoints usually support different result formats: XML, JSON, plain text (for ASK and SELECT queries) RDF/XML, NTriples, Turtle, N3 (for DESCRIBE and CONSTRUCT queries)
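For example, a SELECT result with one variable and a single solution looks roughly like this in the SPARQL query results JSON format (the binding value here is invented):

```json
{
  "head":    { "vars": [ "name" ] },
  "results": { "bindings": [
    { "name": { "type": "literal", "value": "Juan Sequeda" } }
  ] }
}
```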
110. Query Result Formats. Use the Accept header to request the preferred result format:
GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
Accept: application/sparql-results+json
111. Query Result Formats. As an alternative, some SPARQL endpoint implementations (e.g. Joseki) provide an additional parameter out:
GET /sparql?out=json&query=... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
112. Accessing a SPARQL Endpoint. More convenient: use a library.
SPARQL JavaScript Library: http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
ARC for PHP: http://arc.semsol.org/
RAP (RDF API for PHP): http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
114. Accessing a SPARQL Endpoint. Example with Jena/ARQ:
import com.hp.hpl.jena.query.*;

String service = "..."; // address of the SPARQL endpoint
String query = "SELECT ...";  // your SPARQL query
QueryExecution e = QueryExecutionFactory.sparqlService(service, query);
ResultSet results = e.execSelect();
while ( results.hasNext() ) {
  QuerySolution s = results.nextSolution();
  // ...
}
e.close();
115. Querying a single dataset is quite boring compared to issuing SPARQL queries over multiple datasets. How can you do this? Issue follow-up queries to different endpoints; query a central collection of datasets; build a store with copies of relevant datasets; or use a query federation system.
116. Follow-up Queries. Idea: issue follow-up queries over other datasets based on results from previous queries, by substituting placeholders in query templates.
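The placeholder substitution itself is plain string formatting; a minimal sketch using a template like the deck's rdfs:comment example (the URI passed in is illustrative):

```java
public class FollowUpTemplate {
    // Query template with a %s placeholder for a URI discovered
    // in a previous query's results.
    static final String TEMPLATE =
        "SELECT ?c WHERE { <%s> rdfs:comment ?c }";

    // Instantiate the template for one discovered URI.
    static String instantiate(String uri) {
        return String.format(TEMPLATE, uri);
    }

    public static void main(String[] args) {
        System.out.println(instantiate("http://dbpedia.org/resource/Austin"));
    }
}
```

Each URI bound in the first query's result set yields one concrete follow-up query sent to the second endpoint.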
117.
String s1 = "http://cb.semsol.org/sparql";
String s2 = "http://dbpedia.org/sparql";
String qTmpl = "SELECT ?c WHERE { <%s> rdfs:comment ?c }";
String q1 = "SELECT ?s WHERE { ..."; // find a list of companies filtered by some criteria, returning DBpedia URIs for them

QueryExecution e1 = QueryExecutionFactory.sparqlService(s1, q1);
ResultSet results1 = e1.execSelect();
while ( results1.hasNext() ) {
  QuerySolution sol = results1.nextSolution();
  String q2 = String.format( qTmpl, sol.getResource("s").getURI() );
  QueryExecution e2 = QueryExecutionFactory.sparqlService(s2, q2);
  ResultSet results2 = e2.execSelect();
  while ( results2.hasNext() ) {
    // ...
  }
  e2.close();
}
e1.close();
118. Follow-up Queries Advantage Queried data is up-to-date Drawbacks Requires the existence of a SPARQL endpoint for each dataset Requires program logic Very inefficient
119. Querying a Collection of Datasets. Idea: use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets. Examples: endpoints over a majority of datasets from the LOD cloud at http://uberblic.org and http://lod.openlinksw.com/sparql
120. Querying a Collection of Datasets Advantage: No need for specific program logic Drawbacks: Queried data might be out of date Not all relevant datasets in the collection
121. Own Store of Dataset Copies. Idea: build your own store with copies of relevant datasets and query it. Possible stores:
Jena TDB http://jena.hpl.hp.com/wiki/TDB
Sesame http://www.openrdf.org/
OpenLink Virtuoso http://virtuoso.openlinksw.com/
4store http://4store.org/
AllegroGraph http://www.franz.com/agraph/
etc.
122. Populating Your Store. Get the RDF dumps provided for the datasets, or do (focused) crawling. ldspider http://code.google.com/p/ldspider/ : multithreaded API for focused crawling; crawling strategies (breadth-first, load-balancing); flexible configuration with callbacks and hooks.
123. Own Store of Dataset Copies Advantages: No need for specific program logic Can include all datasets Independent of the existence, availability, and efficiency of SPARQL endpoints Drawbacks: Requires effort to set up and to operate the store Ideally, data sources provide RDF dumps; if not? How to keep the copies in sync with the originals? Queried data might be out of date
124. Federated Query Processing. Idea: query a mediator which distributes sub-queries to the relevant sources and integrates the results.
125. Federated Query Processing. Instance-based federation: each thing is described by only one data source; untypical for the Web of Data. Triple-based federation: no restrictions, but requires more distributed joins. Statistics about the datasets are required in both cases.
126. Federated Query Processing.
DARQ (Distributed ARQ) http://darq.sourceforge.net/ : query engine for federated SPARQL queries; extension of ARQ (the query engine for Jena); last update June 28, 2006.
Semantic Web Integrator and Query Engine (SemWIQ) http://semwiq.sourceforge.net/ : actively maintained.
127. Federated Query Processing Advantages: No need for specific program logic Queried data is up to date Drawbacks: Requires the existence of a SPARQL endpoint for each dataset Requires effort to set up and configure the mediator
128. In any case: You have to know the relevant data sources When developing the app using follow-up queries When selecting an existing SPARQL endpoint over a collection of dataset copies When setting up your own store with a collection of dataset copies When configuring your query federation system You restrict yourself to the selected sources
129. There is an alternative: remember, URIs link to data.
130. Automated Link Traversal Idea: Discover further data by looking up relevant URIs in your application Can be combined with the previous approaches
131. Link Traversal Based Query Execution Applies the idea of automated link traversal to the execution of SPARQL queries Idea: Intertwine query evaluation with traversal of RDF links Discover data that might contribute to query results during query execution Alternately: Evaluate parts of the query Look up URIs in intermediate solutions
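As a toy illustration of this intertwined evaluate/look-up loop, the sketch below simulates the Web as an in-memory map from URIs to the triples their documents contain, evaluates one triple pattern, and then "dereferences" the URIs found in the intermediate solutions to discover the data needed for the next pattern. All URIs and predicates are invented; a real engine (e.g. SWClLib/SQUIN) does this with actual HTTP look-ups:

```java
import java.util.*;

public class LinkTraversalSketch {
    // The simulated Web: each URI resolves to a document whose triples
    // are stored as String[]{subject, predicate, object}.
    static final Map<String, List<String[]>> WEB = Map.of(
        "http://ex.org/book", List.of(
            new String[]{"http://ex.org/book", "hasReview", "http://ex.org/review1"}),
        "http://ex.org/review1", List.of(
            new String[]{"http://ex.org/review1", "description", "Awesome Book"}));

    // Evaluate the two-pattern query
    //   <start> hasReview ?review . ?review description ?text
    // looking up URIs as they appear in intermediate solutions.
    static List<String> reviewTexts(String startUri) {
        List<String> texts = new ArrayList<>();
        for (String[] t1 : WEB.getOrDefault(startUri, List.of())) {
            if (!t1[1].equals("hasReview")) continue;
            String reviewUri = t1[2];  // intermediate solution
            // Traverse the link: dereference the review URI for more data.
            for (String[] t2 : WEB.getOrDefault(reviewUri, List.of())) {
                if (t2[1].equals("description")) texts.add(t2[2]);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(reviewTexts("http://ex.org/book"));
    }
}
```

Note how no data source is fixed in advance: the second document is found only because the first one links to it.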
142. Link Traversal Based Query Execution Advantages: No need to know all data sources in advance No need for specific programming logic Queried data is up to date Does not depend on the existence of SPARQL endpoints provided by the data sources Drawbacks: Not as fast as a centralized collection of copies Unsuitable for some queries Results might be incomplete (do we care?)
143. Implementations Semantic Web Client library (SWClLib) for Java http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/ SWIC for Prolog http://moustaki.org/swic/
144. Implementations. SQUIN http://squin.org : provides the SWClLib functionality as a Web service, accessible like a SPARQL endpoint. Install package: unzip and start, in less than 5 minutes! Convenient access with the SQUIN PHP tools:
$s = 'http:// ...'; // address of the SQUIN service
$q = new SparqlQuerySock( $s, '... SELECT ...' );
$res = $q->getJsonResult(); // or getXmlResult()
147. What is a Linked Data application? A software system that makes use of data on the web from multiple datasets and that benefits from links between the datasets.
149. Discover further information by following the links between different data sources: this is what the fourth principle enables.
150. Combine the consumed Linked Data with data from other sources (not necessarily Linked Data).
151. Expose the combined data back to the web following the Linked Data principles
153. Hot Research Topics Interlinking Algorithms Provenance and Trust Dataset Dynamics UI Distributed Query Evaluation “You want a good thesis? IR is based on precision and recall. The minute you add semantics, it is a meaningless feature. Logic is based on soundness and completeness. We don’t want soundness and completeness. We want a few good answers quickly.” – Jim Hendler at WWW2009 during the LOD gathering Thanks Michael Hausenblas
154. THANKS Juan Sequeda www.juansequeda.com @juansequeda #cold www.consuminglinkeddata.org Acknowledgements: Olaf Hartig, Patrick Sinclair, Jamie Taylor Slides for Consuming Linked Data with SPARQL by Olaf Hartig