26. RDF
subject - predicate - object
Ruby was designed by Matz
http://dbpedia.org/resource/Ruby_(programming_language)
http://dbpedia.org/property/designer
http://dbpedia.org/resource/Yukihiro_Matsumoto
41. rapper -oturtle http://dbpedia.org/property/designer
=> returns data formatted in turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://dbpedia.org/property/designer>
    a rdf:Property ;
    rdfs:label "designer" .
42. RDF Vocabularies
• FOAF, DOAP, Music Ontology, Programmes Ontology...
• RDF Schema (RDFS)
• Describe classes, properties, some restrictions
• Web Ontology Language (OWL)
• More control over restrictions
• owl:sameAs
43. Ruby and RDF
• Reddy - http://github.com/tommorris/reddy
• Redland bindings - http://librdf.org
• ActiveRDF - http://activerdf.org
• Like ActiveRecord for RDF data
44. require 'rubygems'
require 'active_rdf'
# add DBpedia SPARQL endpoint as a datasource
adapter = ConnectionPool.add_data_source(
  :type => :sparql,
  :url => "http://dbpedia.org/sparql",
  :engine => :virtuoso,
  :results => :sparql_xml)
adapter.enabled = true
# we register a short-hand notation for the namespaces
Namespace.register(:dbprop, 'http://dbpedia.org/property/')
Namespace.register(:dbowl, 'http://dbpedia.org/ontology/')
57. There’s plenty more!
• Schemas (RDFS) and Ontologies (OWL)
• SPARQL
• Query language for Linked Data
• Triple stores
• Store RDF data, no need to design schema up front
• RDFa
• Embed RDF into HTML markup
• Supported by Google and Yahoo
Editor’s Notes
I want to start by looking at a data mashup.
It’s not pretty, but it’s doing some interesting things under the covers.
With this mashup, you provide a place, e.g. London, and it will list the programmes on the BBC that feature artists coming from that place.
e.g. ...
Nice for a world music show, or if you’re building a travel website and want to recommend music for people wanting to travel to a location...
So this mashup involves:
programme information from BBC Programmes
artist information from BBC Music
and from Wikipedia where each artist was born or was formed
How would you go about it in the traditional Web 2.0 API world?
BBC Programmes would provide an API, and BBC Music would have its own API - most likely designed from scratch and with its own proprietary XML schema or format.
Hopefully someone would have made a nice Ruby gem that abstracts the API so you can call it from a script.
With Wikipedia again you’ll need an API and some code around it.
But the data on where an artist was born or formed isn’t readily available. You’d probably have to screen-scrape each artist page to work out where the artist was born, then fetch all that data, store it in a database and query that...
And thinking of all of those Web 2.0 sites, each implementing their own API - that’s a lot of Ruby libraries for wrapping around...
What if there was a standard way of publishing and querying data on the web?
It turns out there is!
And it’s been the plan all along.
This diagram is from Tim Berners-Lee’s original proposal for the web.
At the time he was thinking about typed links:
Tim Berners-Lee - wrote - this document - describes - a proposal (“Mesh”)
“This document was an attempt to persuade CERN management that a global hypertext system was in CERN's interests. Note that the only name I had for it at this time was "Mesh" -- I decided on "World Wide Web" when writing the code in 1990.”
And in this diagram, which is based on a talk by Tim Berners-Lee from 1994, there’s the idea of having documents on the web (the blue blobs), and how they relate to real objects: people, houses, relationships...
http://www.w3.org/Talks/WWW94Tim/
The mechanism for exposing, sharing, and connecting data on the web is called Linked Data.
And it’s based on four rules.
The first rule of linked data is: Use URIs as names for things
Any resource you want to talk about: people, places, programming languages, proteins - it gets a URI
The second rule of linked data is: Use HTTP URIs so that people can look up those names
And this is important, because...
The third rule of Linked Data is: When someone looks up a URI, provide useful information
So whenever there’s a URI for a resource you want to know more about, you can use HTTP to fetch data about that resource.
The fourth rule of Linked Data is: Include links to other URIs, so that they can discover more things
And this is what makes linked data so powerful, as you can start using links to traverse across different datasets.
Linked Data uses RDF, the Resource Description Framework, to model data.
So when you want to fetch some information about a resource, it is returned in RDF.
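As a rough sketch of what “looking up a URI” means in practice, here is how such a request could be built with Ruby’s standard library. The Accept header value is my assumption about how you would ask for RDF/XML rather than an HTML page, and the helper name rdf_request is made up for this example. The request is only built here, not sent.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a GET request that asks for RDF/XML.
# Assumption: the server uses the Accept header to decide between
# an HTML page and an RDF description of the resource.
def rdf_request(resource_uri)
  uri = URI.parse(resource_uri)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Accept'] = 'application/rdf+xml'
  [uri, request]
end

uri, request = rdf_request('http://dbpedia.org/resource/Yukihiro_Matsumoto')
puts "GET #{request.path} from #{uri.host}, Accept: #{request['Accept']}"

# To actually fetch the data (needs network access, redirects not followed):
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.body
```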
With RDF, you make statements about resources using subject-predicate-object expressions.
For example, say you want to state that “Ruby was designed by Matz”.
The subject is “Ruby”, the predicate is “was designed by”, and the object is “Matz”.
And in RDF we’re using URIs.
So there’s a URI for Ruby
And a URI for Matz.
And here is the URI for the property indicating something’s designer.
And using these triple expressions, you build up a graph of data...
And I really want to stress that RDF is a model - there are different serialisations.
Here are 2 triples expressed in the N-Triples format: it’s the most basic format.
You simply get subject - predicate - object - new line, subject - predicate - object - new line
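To make that concrete, here is a minimal sketch in plain Ruby (no RDF library) that writes the “Ruby was designed by Matz” triple as an N-Triples line. The Triple struct and the to_ntriples helper are names I made up for illustration, and the sketch only handles triples whose object is a URI, not literals.

```ruby
# A triple is just three URIs: subject, predicate, object.
Triple = Struct.new(:subject, :predicate, :object)

# Serialise one triple as an N-Triples line: each URI goes in angle
# brackets, separated by spaces, and the line ends with " .".
# Literal objects (strings, numbers) are left out of this sketch.
def to_ntriples(triple)
  "<#{triple.subject}> <#{triple.predicate}> <#{triple.object}> ."
end

graph = [
  Triple.new('http://dbpedia.org/resource/Ruby_(programming_language)',
             'http://dbpedia.org/property/designer',
             'http://dbpedia.org/resource/Yukihiro_Matsumoto')
]

# One triple per line.
graph.each { |triple| puts to_ntriples(triple) }
```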
Here is the most established RDF format: RDF/XML - unfortunately this is the format most people are exposed to when learning about RDF.
It’s horrible to read, it’s ugly - it’s just not user friendly at all!
Luckily, there’s Turtle: it lets you write RDF graphs in a compact text format.
In Turtle, you can see what those 2 triples are saying:
The resource here, Ruby, has the type Language, and its designer is Matz.
I like to think of Turtle versus RDF/XML in the same way as YAML versus XML - it’s just so much nicer to use!
I want to show you what happens when you fetch RDF
But first I want to introduce DBpedia, as the examples I’ve been showing use DBpedia URIs.
The DBpedia project is extracting structured data from Wikipedia, and making it available as Linked Data.
It’s mostly based on extracting data from the infoboxes on the Wikipedia articles.
And it uses the Wikipedia article title in the URI for the RDF resource.
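The mapping from article title to resource URI can be sketched roughly like this. The helper name dbpedia_uri is made up, and the rule is a simplification on my part: it only swaps spaces for underscores and ignores titles that would need percent-encoding.

```ruby
DBPEDIA_RESOURCE = 'http://dbpedia.org/resource/'

# Turn a Wikipedia article title into a DBpedia resource URI by
# replacing spaces with underscores.
# Simplification: characters that need percent-encoding are not handled.
def dbpedia_uri(article_title)
  DBPEDIA_RESOURCE + article_title.strip.tr(' ', '_')
end

puts dbpedia_uri('Ruby (programming language)')
# prints http://dbpedia.org/resource/Ruby_(programming_language)
puts dbpedia_uri('Yukihiro Matsumoto')
# prints http://dbpedia.org/resource/Yukihiro_Matsumoto
```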
I want to quickly introduce the concept of RDF vocabularies.
For those familiar with microformats, these are roughly equivalent to microformat vocabularies like hCard, hAudio and so on.
Some of the most popular ones are:
* FOAF - friend of a friend, allows you to talk about people and their relationships, who knows who - a social graph
* DOAP - description of a project, lets you talk about projects and who works on them
* Music Ontology - defines the domain of music: artists, composers, works, tracks and so on
* Programmes Ontology - defined at the BBC, describes programmes and series and broadcasts and so on
In RDF there are two ways to define a vocabulary.
Originally, there was RDF Schema, which lets you define classes and properties, and restrictions (e.g. a “born” property can only be applied to a “person”).
The Web Ontology Language (OWL) expands on this, so you have more control over restrictions, e.g. you can say that a project must have at least one contributor.
It also defines a really valuable property: owl:sameAs, so you can say this thing in this data set here is the same thing as that thing in that data set over there - really useful for mashups.
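Here is a small sketch of why owl:sameAs matters for mashups, again in plain Ruby with triples as three-element arrays. Every example.org URI and the playedOn and birthPlace properties are hypothetical, and the lookup only follows direct sameAs links (no chains), so treat it as an illustration rather than how a real triple store does it.

```ruby
OWL_SAME_AS = 'http://www.w3.org/2002/07/owl#sameAs'

# Two hypothetical data sets that use different URIs for the same artist,
# plus one owl:sameAs triple linking them.
TRIPLES = [
  ['http://example.org/music/artist/1', OWL_SAME_AS,
   'http://example.org/wiki/Artist_One'],
  ['http://example.org/music/artist/1', 'http://example.org/vocab/playedOn',
   'http://example.org/programmes/p1'],
  ['http://example.org/wiki/Artist_One', 'http://example.org/vocab/birthPlace',
   'http://example.org/wiki/London']
]

# The URI itself plus every URI directly linked to it by owl:sameAs,
# in either direction.
def same_as(uri, triples)
  links = triples.select { |_, predicate, _| predicate == OWL_SAME_AS }
  aliases = links.map do |subject, _, object|
    if subject == uri
      object
    elsif object == uri
      subject
    end
  end
  ([uri] + aliases.compact).uniq
end

# Every statement made about the URI or one of its aliases,
# leaving out the sameAs links themselves.
def statements_about(uri, triples)
  names = same_as(uri, triples)
  triples.select { |subject, predicate, _| names.include?(subject) && predicate != OWL_SAME_AS }
end

statements_about('http://example.org/music/artist/1', TRIPLES).each do |triple|
  puts triple.join(' ')
end
```

Asking about the music URI returns both the programme it was played on and the birth place recorded under the wiki URI, which is the join the mashup at the start of the talk needs.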
Reddy - an RDF/XML parser based on libxml.
Redland bindings - cross-platform RDF parsing library
ActiveRDF, as the name suggests, is like ActiveRecord for RDF.
You can write stuff back:
if you add a write-enabled adapter, you can store data.
It’s like an OpenStruct on the web,
and it’s a perfect fit for Ruby - try doing that with Java - it looks horrendous!
Based on the Friend of a Friend (FOAF) vocabulary.
Describes people and their relationships...
Takes the FOAF profile and builds his homepage.
Uses linked data: e.g. it links to his friends’ FOAF files and fetches information about them (e.g. their homepage). Then when someone updates their homepage in their FOAF file, it can be picked up automatically here.
Look at interests: they are linked to DBpedia concepts, so the page could fetch more data about each interest.
One of the interesting things happening in the Linked Data world is the Linking Open Data initiative.
A number of data set owners have been publishing their information as Linked Data, including:
* DBpedia
* BBC
* Freebase
* Reuters with Open Calais
* Lots of the academic scientific community: ePrints, PubMed
Many linked data geeks have been writing wrappers around normal APIs:
* Last.fm
* MySpace
* MusicBrainz
And these datasets, and how they link together, have been documented on what is called the Linked Data cloud. And it’s been really exciting to see this cloud grow over the last couple of years.
I’d like to see more Ruby and Rails applications on the Linked Data cloud!
So here’s a quick guide on the simplest possible technique to publish RDF from Rails.
It’s not quite automagical, but it gets the job done - and it’s the same technique we’re using on BBC Music and BBC Programmes.
I’m taking this example from the UK Companies app I was involved in for Rewired State - I’m going to take you through the git change set from when I added an RDF view to this application.
The app displays information about companies registered in the UK.
The first step is to embrace REST - one subject per resource.
But as REST is already second nature to most Rails developers, I won’t go into this much more.
Now to dig into some Rails code - first of all we add the RDF mime type.
In the controller action we want to add RDF to, we need to add a format.rdf.
We then need an application.rdf.erb.
As RDF/XML is the most widely supported RDF serialisation, I based the RDF view on it.
It’s pretty straightforward: it just sets up the XML namespaces for the vocabularies I want to use.
Here’s the actual RDF erb for the show action for a company.
It’s pretty simple to implement.
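To give a feel for what such a view boils down to, here is a standalone sketch using ERB from Ruby’s standard library rather than Rails. It is not the template from the UK Companies app: the Company struct, the example.org URI and the choice of foaf:Organization with rdfs:label are all assumptions for illustration.

```ruby
require 'erb'

# Hypothetical company record; the field names are assumptions.
Company = Struct.new(:id, :name)

# The namespaces are declared once on rdf:RDF (the job of the layout),
# followed by a single description of the company (the job of the view).
RDF_TEMPLATE = <<~'TEMPLATE'
  <?xml version="1.0" encoding="utf-8"?>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
           xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <foaf:Organization rdf:about="<%= ERB::Util.html_escape(company_uri) %>">
      <rdfs:label><%= ERB::Util.html_escape(company.name) %></rdfs:label>
    </foaf:Organization>
  </rdf:RDF>
TEMPLATE

# Render the RDF/XML for one company. Values are escaped so that
# characters like "&" do not break the XML.
def company_rdf(company)
  company_uri = "http://example.org/companies/#{company.id}"
  ERB.new(RDF_TEMPLATE).result(binding)
end

puts company_rdf(Company.new(42, 'Smith & Sons Ltd'))
```

Escaping matters more here than in an HTML view, because a single stray ampersand in a company name makes the whole document unparseable for an RDF/XML consumer.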
The tricky bit is working out the classes/properties to describe your data.
But to build this view, I simply started off with some other RDF example. I think I took the RDF/XML from BBC Programmes and then hacked around with it. I went to Linked Data sites like DBpedia and searched for some companies to see what kind of vocabulary was used to describe them.
And that was enough to get going - I put the RDF out there, and within a few hours of tweeting about it I’d received some feedback about certain aspects of the RDF. Some of those comments I took on board and fed back into the RDF view.
It reminds me of when I was learning HTML - you simply look around for stuff you like, view source, copy/paste and then hack around until it looks right.