In the Open Data world we are encouraged to try to publish our data as “5-star” Linked Data because of the semantic richness and ease of integration that the RDF model offers. For many people and organisations this is a new world and some learning and experimenting is required in order to gain the necessary skills and experience to fully exploit this way of working with data. This workshop will re-assert the case for RDF and provide a guided tour of some examples of RDF publication that can act as a guide to those making a first venture into the field.
3. Resource Description Framework
RDF
• Initially a way of adding metadata to XML
• Subject-Predicate-Object or
• Subject-Predicate-Literal triples
[Diagram of two example triples: Scotland --has Authority--> Aberdeen City; Aberdeen City --has Population--> “218,220”]
5. “One often overlooked advantage that RDF offers is its deceptively simple data model. This data model trivializes merging of data from multiple sources and does it in such a way that data about the same things gets collated and deduplicated. In my opinion this is the most important benefit of using RDF over other open data formats.” (Ian Davis, 2011)
http://blog.iandavis.com/2011/08/18/the-real-challenge-for-rdf-is-yet-to-come/
6. [Diagram: a resource with the name “Bonnet”, living in http://place.org/Paris, owns http://ex.org/pet/2, which is called “Sasha”]
7. [Diagram: http://ex.org/pet/2 is a “ferret” and has “chicken” as its favourite food]
8. The two references to http://ex.org/pet/2 point to the same resource, so the graphs can merge.
[Diagram: the merged graph linking “Bonnet”, http://place.org/Paris, http://ex.org/pet/2, “Sasha”, “ferret” and “chicken”]
14. So RDF data is “5 star” because:
• No need for prior design discussion with data suppliers about data specification.
• No need to design a container before accepting data.
• Datasets are self-describing, with explicit semantics.
• Merged datasets are collated and de-duplicated automatically.
19. Creating RDF...
Wikis
• Use Wikipedia and let DBpedia work for you
• Semantic MediaWiki - outputs RDF and can be linked to a triplestore directly
• Drupal - creates RDFa which can be scraped, though not very widely used
20. Creating RDF....
Relational to RDF mapping
• D2R Server: Accessing databases with SPARQL
and as Linked Data
– http://opendata.tellmescotland.gov.uk
• Virtuoso RDF Views
– http://location.testproject.eu/BEL/
24. Geospatial Triplestores
• Virtuoso Universal Server (7.0, ColumnStore edition)
• Parliament (2.7.4 quickstart)
• uSeekM (1.2.0-a5, on top of PostgreSQL 8.4 and PostGIS 1.5)
• OWLIM-SE (Trial version 5.3.5849)
• Strabon (3.2.3, on top of PostgreSQL 8.4 and PostGIS 1.5)
Xen VMs for each available in Debian 6
http://blog.geoknow.eu/virtual-machines-of-geospatial-rdf-stores/
Dr. Jens Lehmann, Uni Leipzig
27. Linked Data API...
Entity Resolution:
Victoria Quay is
http://cofog01.data.scotland.gov.uk/id/facility/AB0103
...which resolves to
http://cofog01.data.scotland.gov.uk/doc/facility/AB0103
28. Linked Data API....
Different serialisations [JSON, NT, RDF/XML etc]
HTTP "Accept" headers - e.g. "application/json"
303 re-directs
http://cofog01.data.scotland.gov.uk/id/facility/AB0103.nt
http://cofog01.data.scotland.gov.uk/doc/facility/AB0103.rdf
30. Linked Data API....
• Linked Data API makes it easy
http://data.sepa.org.uk/doc/water/surfacewaters
http://data.sepa.org.uk/doc/water/surfacewaters.xml
31. FluidOps Workbench & FedX
• Built on top of Sesame RDF store
• Wiki-like structure for interaction
• Data pipelined in from external SPARQL and
other sources
• Includes widgets, graph views, facet views etc
for interacting with the aggregated data
34. DBpedia - at the heart of Open Data
September 2013
http://en.wikipedia.org/wiki/DBpedia
45 million interlinks with:
Freebase
OpenCyc
UMBEL
GeoNames
MusicBrainz
CIA World Fact Book
DBLP
Project Gutenberg
DBtune Jamendo
Eurostat
Uniprot
Bio2RDF
US Census data
Also used in:
Thomson Reuters OpenCalais
New York Times Linked Open Data
Zemanta API
DBpedia Spotlight
BBC datasets
Isn’t it awful when we’re trying to communicate and we’re misunderstood? Not only can it lead to problems as a direct result of the misunderstanding, but there can also be quite a bit of hassle in getting things straightened out after the mistake. In the case of Ginger and Fred it nearly became a showstopper. https://www.youtube.com/watch?v=zZ3fjQa5Hls
Data sharing within and between enterprises has always been a costly effort. The Open Group reckon “... that between 40% and 80% of application integration effort is spent on resolving semantic issues, a task that typically requires significant human intervention. The expanding use of Service Oriented Architecture (SOA) and Cloud Computing are further increasing the need for semantic interoperability that more efficiently aligns IT systems with business objectives.” Naturally, the Open Data programmes have similar issues. This is where RDF is playing a key role, both inside and outside enterprises.
http://www.opengroup.org/subjectareas/si
There is a lot of talk in the ‘Open Data’ world about “5-star” RDF data, which implies a meritocratic hierarchy of data models, so why is RDF ‘tops’? RDF is also key to the “Semantic Web”, also described as “Web 3”, the next generation of web technology. We are also hearing about the “internet of things”, and RDF plays a significant role there. So what is there for me, my business, my organisation in considering using data modelled as RDF?

RDF (Resource Description Framework) originated in the 1990s as a way of adding metadata to XML documents, but it is actually also a very tidy way of describing any data. RDF is a model in which data are expressed as triples comprising a Subject and an Object related by a directional Predicate.
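As a minimal illustration of the triple model, using the Scotland/Aberdeen City example from the slides (the predicate names and example.org URIs here are invented for the sketch), a triple can be modelled as a (subject, predicate, object) tuple:

```python
# Triples as (subject, predicate, object) tuples; the predicate
# names and example.org URIs are invented for illustration.
triples = {
    ("http://example.org/Scotland", "hasAuthority", "http://example.org/AberdeenCity"),
    ("http://example.org/AberdeenCity", "population", "218,220"),
}

# The object of one triple can be the subject of another, which is
# what makes RDF a graph rather than a table.
subjects = {s for s, p, o in triples}
objects = {o for s, p, o in triples}
print(subjects & objects)  # the shared node linking the two statements
```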
From a little after the Ginger and Fred era until about the late 1990s, interchanges of computerised data tended to follow detailed discussion and agreement between the two parties about the data being exchanged: the data models of the provider and recipient systems, mappings, semantic relations, and so on. Data exchanges required extraction, transformation and loading (ETL) stages, and this is still the case in many settings.
The RDF model removes much of the heavy lifting required in traditional ETL. “One often overlooked advantage that RDF offers is its deceptively simple data model. This data model trivializes merging of data from multiple sources and does it in such a way that data about the same things gets collated and de-duplicated. In my opinion this is the most important benefit of using RDF over other open data formats.” (Ian Davis, 2011)
http://blog.iandavis.com/2011/08/18/the-real-challenge-for-rdf-is-yet-to-come/
This is an example of one RDF data set
And here is another
Data entities with the same identifier allow both data sets to merge at these points
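The merge-at-shared-identifiers behaviour can be sketched with plain Python sets, reusing the pet example from the slides (the predicate names and the owner URI http://ex.org/person/1 are invented; the other URIs appear on the slides):

```python
# Two small graphs that both mention http://ex.org/pet/2.
# Predicate names and the owner URI are invented for the sketch.
graph_a = {
    ("http://ex.org/person/1", "name", "Bonnet"),
    ("http://ex.org/person/1", "livesIn", "http://place.org/Paris"),
    ("http://ex.org/person/1", "owns", "http://ex.org/pet/2"),
    ("http://ex.org/pet/2", "name", "Sasha"),
}
graph_b = {
    ("http://ex.org/pet/2", "species", "ferret"),
    ("http://ex.org/pet/2", "favouriteFood", "chicken"),
    ("http://ex.org/pet/2", "name", "Sasha"),  # statement repeated in both graphs
}

# Merging is plain set union: shared identifiers collate the data
# and repeated statements deduplicate, with no ETL step.
merged = graph_a | graph_b
print(len(merged))  # 6, not 7: the duplicate triple collapsed
```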
Another approach to data integration, particularly within enterprises, is to develop one big pot into which all the organisation’s key data fits: the enterprise data warehouse. The problem with this is brittleness. The warehouse takes an age to design, and then one can only fill it with items that it was built to contain. Anything that is the wrong ‘shape’ has to be rejected, refashioned (ETL), or be so important as to be worth the cost of adding an extra wing to the warehouse to cater for it. Lots of analysis, coding and so on.
In the RDF world the containers for native RDF are promiscuous, just like file systems, accepting any RDF and not just RDF that fits a particular schema or pattern [imagine how restrictive a file system that could only store Word documents would be]. Adding new RDF statements with relationships that are not currently present in the dataset does not require the sort of preparatory work needed in the relational model, such as adding new join tables to the database.
In conventional data management situations the semantics of the data tend to be observed in the interface and expressed in the documentation.
RDF, in contrast, has explicit semantics for the relationships between entities or from an entity to a literal, and also provides a mechanism to build in the descriptions of individual classes of entities and descriptions of the dataset.
The final aspect of RDF that I want to highlight as a potential benefit is that RDF data is a ‘graph’ of nodes and edges which, when visualised as a set of circles and arrows, has a particular shape within which one can see clustering and sparseness in ways that are difficult to achieve with other models.
So RDF data is “5 star” because I don’t need to have a dialogue with a range of data providers to unambiguously join their datasets into a “supergraph” that I can then work with. I don’t necessarily need to modify the container beforehand to tailor it to accept the data: RDF models can be merged automatically in the absence of a schema. Datasets are, at least to a minimal level, self-describing, in that it is explicit what is the same and what is different, which items are entities and which are properties/relationships. Data becomes collated and de-duplicated automatically.

In addition to these arguments in favour of the RDF model for data interchange, the increasing availability of open RDF Linked Data means that organisations not using these approaches will be unable to make effective use of openly available RDF from multiple sources in its native, efficient form: they will have to reduce it to a semantically less rich form (JSON, CSV, etc.), and this requires ETL steps prior to use.
So I am going to give a tour of some resources and illustrations that might be of help to those individuals and organisations wanting to get started in the RDF world
First step is education/training at scale: the Euclid Project [http://www.euclid-project.eu/]
Second step: starting to work with RDF publication. RDF is an efficient way to merge datasets from multiple sources unambiguously and relatively automatically. Think of it as an equivalent to RNA in biology: the main store of genetic information is held securely in another form (generally DNA, but also negative-strand or double-stranded RNA), but for working purposes (building proteins) that data is converted to a biological, globally shared form, RNA. In the same way, it is possible to hold data all the time as RDF and work effectively with it, but there are many situations where that isn’t optimal, either for historic reasons (there is existing infrastructure that works effectively on other data models) or, as in highly transactional systems, because the RDF approach isn’t suited to the normal operations.

Hand-written/scripted: one easy way to create some RDF is to write it by hand or with some simple scripts. This is often used for learning about RDF or for developing new ideas. Xturtle is an Eclipse plugin that makes this job much smoother. Generating RDF with scripts can be done using a templating approach but, as with constructing XML using specific tools, there are a range of RDF tools for constructing RDF statements in code: Java has the Jena and Sesame APIs, Python has RDFLib, Ruby has the RDF.rb gem, and several scripting languages have bindings for the Redland library, which is written in C.
http://aksw.org/Projects/Xturtle.html
http://jena.apache.org/
http://www.openrdf.org/
http://www.rdflib.net/
http://rdf.rubyforge.org/
http://librdf.org/
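The “templating” style of scripted RDF generation mentioned above can be sketched in a few lines; this is a toy N-Triples serialiser (the vocab URI is invented), and real scripts would normally lean on a library such as RDFLib or Jena for escaping and datatypes:

```python
# A toy N-Triples line "template"; real code would use a library
# such as RDFLib, which handles escaping and datatypes properly.
def ntriple(s, p, o, literal=False):
    # Literals are quoted; URIs are wrapped in angle brackets.
    obj = f'"{o}"' if literal else f"<{o}>"
    return f"<{s}> <{p}> {obj} ."

# The vocab URI below is invented for the example.
line = ntriple("http://ex.org/pet/2",
               "http://ex.org/vocab/species",
               "ferret", literal=True)
print(line)
```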
OpenRefine + RDF plugin: this is an application that makes it easy to clean up and convert a range of data types, including delimited text and spreadsheets, into RDF. There are also options to use ‘reconciliation services’, which are APIs that provide best-guess suggestions for widely used URIs for entities based on text in your data. These reconciliation services come from Freebase (Google) and, recently, the Ordnance Survey.
http://openrefine.org/
http://refine.deri.ie/
http://www.freebase.com/
http://data.ordnancesurvey.co.uk/datasets/os-linked-data/explorer/reconciliation
Conversion from relational databases: there are several ways in which RDF can be published from relational databases. This is often a good way to get RDF out of an existing system with minimal hassle, and it can be a low-risk way of getting into publishing some of your data as RDF.

Methods that use wikis:
Wikipedia and DBpedia: information in Wikipedia factboxes eventually ends up as RDF data published by DBpedia during the biannual conversion process. DBpedia Live attempts to keep abreast of the rapid rate of page updating on Wikipedia.
Semantic MediaWiki is an extension of the MediaWiki software used for Wikipedia that has an underlying RDF model for its data. Semantic MediaWiki provides routes for exporting both subsets of wiki pages and the whole wiki content as RDF.
Drupal 7.0 outputs RDFa data in core. RDFa (Resource Description Framework in attributes) is a W3C Recommendation which allows embedding RDF metadata within web documents. These RDF assertions can be ‘gleaned’ from the web pages by stylesheets and other ‘distillers’. However, RDFa isn’t used much ‘in the wild’ at the moment.
http://wiki.dbpedia.org/DBpediaLive
http://semantic-mediawiki.org/
http://enipedia.tudelft.nl/wiki/Main_Page
http://en.openei.org
https://drupal.org/node/778988
Relational to RDF mappers: these act as a “babelfish”, translating a relational database schema into an RDF model through a mapping procedure (the applications assist that process, but it often needs hand-finishing) and providing a query interface (i.e. these mapping applications create a SPARQL endpoint and return RDF, but the underlying data is maintained in a SQL database). This approach is ideal for providing RDF from an existing application which you don’t want to mess with but for which you want an RDF output. Examples of relational to RDF mapping software include D2R and the polyglot storage server Virtuoso. Examples of use of these tools include the TellMeScotland open data publication and the EC Joinup pilot linking Belgian addressing data.
http://d2rq.org/d2r-server
http://virtuoso.openlinksw.com/whitepapers/relational%20rdf%20views%20mapping.html
http://opendata.tellmescotland.gov.uk
https://joinup.ec.europa.eu/sites/default/files/D5.2.1_Core_Location_Pilot-Interconnecting_Belgian_National_and_Regional_Address_Data_v0.5.pdf
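The core idea of the mapping, each row becoming a set of triples about one resource, can be sketched by hand with an in-memory database (table, column names and the URI pattern here are invented; tools like D2R generate this translation from a declarative mapping file instead):

```python
import sqlite3

# A hand-rolled sketch of relational-to-RDF mapping. The table,
# columns and URI pattern are invented for the example; real
# mappers such as D2R drive this from a mapping file.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE council (id INTEGER, name TEXT, population INTEGER)")
db.execute("INSERT INTO council VALUES (1, 'Aberdeen City', 218220)")

def row_to_triples(row_id, name, population):
    # Each row is minted a subject URI; each column becomes a predicate.
    s = f"http://example.org/council/{row_id}"
    return [
        (s, "http://example.org/vocab/name", name),
        (s, "http://example.org/vocab/population", population),
    ]

triples = []
for row in db.execute("SELECT id, name, population FROM council"):
    triples.extend(row_to_triples(*row))
print(triples)
```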
Triplestores: native RDF can be stored either as a graph in memory or within a native RDF triplestore, a database specifically designed for RDF graph structures. These days we can get computers with huge amounts of RAM at relatively low cost.
Native RDF databases, a.k.a. triplestores: RDF can be stored as native triples in SQL databases, using long, skinny three-column tables for the triples with various indexes (SPO, OSP, etc.). This is the approach taken with the Jena SDB datastore, but to some extent it was a naive/simplistic approach using tools like MySQL and Postgres that were readily available in the early 2000s. Subsequent work has focused on developing native RDF datastores that don’t use tables in the SQL sense but use node tables and indexes, where the focus has been on optimising both storage and search for the RDF model rather than making use of a more generalised data store. Examples include TDB, Sesame and Mulgara, all of which are Java applications, and 4Store, which is built in C and only easily compiled on Linux. Other approaches include column stores (e.g. Virtuoso and Vertica).

So, if you are looking for an easy way of installing and using a triplestore, what is the best approach? The Apache Jena TDB triplestore with the Joseki or Fuseki SPARQL endpoint is one option I’ve used a lot. Other simple options that I’ve had some experience of include Sesame, Mulgara, Bigdata, Virtuoso and 4Store, but this is not a definitive list, and each has an associated SPARQL-over-HTTP query option.
http://jena.apache.org/documentation/tdb/index.html
http://joseki.sourceforge.net/
http://jena.apache.org/documentation/serving_data/
http://www.openrdf.org/
http://www.mulgara.org/
http://www.systap.com/bigdata.htm
http://virtuoso.openlinksw.com/rdf-quad-store/
http://4store.org/

Choice of triplestore will depend on your OS options (e.g. 4Store is built from source, which is easiest on Linux), how much RDF you are storing (usually measured in millions/billions of triples), and the additional functions (e.g. geo indexing is available with Virtuoso, Parliament and a small number of others; Allegrograph has social networking stats functions built in). One advantage of triplestores over SQL stores is that transferring data from one to another is simply a matter of outputting triples from one and loading them into the other, so the risk of picking the ‘wrong one’ to start with has limited negative consequences.
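The SPO/OSP indexing idea mentioned above can be sketched with in-memory dictionaries (the data and pattern names are invented; native stores use far more compact on-disk structures):

```python
from collections import defaultdict

# A toy in-memory triple index in the SPO/OSP style; the example
# triples are invented. Native stores use compact node tables and
# on-disk indexes rather than Python dicts.
triples = [
    ("pet2", "species", "ferret"),
    ("pet2", "favouriteFood", "chicken"),
    ("pet3", "species", "ferret"),
]

spo = defaultdict(set)   # subject -> {(predicate, object)}
osp = defaultdict(set)   # object  -> {(subject, predicate)}
for s, p, o in triples:
    spo[s].add((p, o))
    osp[o].add((s, p))

# "Everything known about pet2" is answered from the SPO index...
print(sorted(spo["pet2"]))
# ...while "which subjects have the value 'ferret'?" uses OSP.
print(sorted(s for s, p in osp["ferret"]))
```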
There are VM images available for some of the geo-capable triplestores, a very useful resource.
http://blog.geoknow.eu/virtual-machines-of-geospatial-rdf-stores/
Linked Data APIs: when you have data in a triplestore you don’t want to just leave potential users with a SPARQL endpoint; it’s daunting and unhelpful to many potential users of your data. A Linked Data API is a much more pleasant decoration. A couple of examples include PublishMyData (mainly Ruby) and Elda (mainly Java); the links below give the code and live examples of each.
https://github.com/swirrl/publish_my_data
http://cofog01.data.scotland.gov.uk/doc/facility/AB0103
http://code.google.com/p/elda/
http://data.sepa.org.uk/doc/water/surfacewaters.html
A Linked Data API provides a faceted HTML view of your data and also helps resolve URIs that have the base URI at your site to some HTML page. For example, the identifier for Victoria Quay is http://cofog01.data.scotland.gov.uk/id/facility/AB0103. If you put this into your browser you get redirected to an HTML page about Victoria Quay: http://cofog01.data.scotland.gov.uk/doc/facility/AB0103.
APIs also help return RDF describing resources in different machine-readable formats, either by responding to the HTTP “Accept” header or by handling HTTP 303 redirects appropriately, e.g.: http://cofog01.data.scotland.gov.uk/id/facility/AB0103.nt returns NTriples and http://cofog01.data.scotland.gov.uk/id/facility/AB0103.rdf returns RDF/XML representations of the same resource.
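The Accept-header side of this can be sketched with Python’s standard library; the request is only constructed here, not sent, and the MIME type shown is one common choice for RDF/XML:

```python
from urllib.request import Request

# Asking for a specific serialisation via content negotiation.
# The URI is the Victoria Quay identifier from the example above;
# the request is built but deliberately not sent.
uri = "http://cofog01.data.scotland.gov.uk/id/facility/AB0103"
req = Request(uri, headers={"Accept": "application/rdf+xml"})

# A Linked Data API would answer this with a 303 redirect to the
# corresponding /doc/... document in the requested format.
print(req.get_header("Accept"))
```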
FluidOps Workbench: this is a hybrid tool that provides a wiki interface for creating new content but also enables the import of data from various sources into a local Sesame triplestore. An example of the Workbench with Wikipedia/DBpedia data is at http://iwb.fluidops.com/resource/Our_Dynamic_Earth
http://www.fluidops.com/information-workbench/
This shows a timeline & animated GIF for the development of the Linked Open Data web over the past few years
DBPedia is at its heart
...with very significant interlinkage with other datasets and a very healthy user base in some major projects
So that's the end of the tour
Here is an illustration of a federated SPARQL query that initially goes to the DBpedia endpoint <http://dbpedia.org/sparql> and finds landlocked countries; it then takes those country identifiers (?country) and goes to the World Bank endpoint <http://worldbank.270a.info/sparql> to find some more information about those countries.
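A sketch of what such a federated query could look like, held as a Python string ready to POST to the DBpedia endpoint. The predicate and category choices (dct:subject, dbc:Landlocked_countries, and the wildcard pattern inside the SERVICE block) are assumptions for illustration and may not match the live datasets exactly:

```python
# A sketch of the federated query described above; the predicate
# and category names are assumptions, not verified against the
# live endpoints. The query is only constructed here, not sent.
query = """
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>

SELECT ?country ?p ?o WHERE {
  ?country dct:subject dbc:Landlocked_countries .
  SERVICE <http://worldbank.270a.info/sparql> {
    ?country ?p ?o .
  }
}
LIMIT 10
"""
# The outer pattern runs at DBpedia; the SERVICE block is forwarded
# to the World Bank endpoint with the ?country bindings.
print("SERVICE" in query)
```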