15. Web 1.0 – Linking Documents Web 2.0 – Linking People Web 3.0 – Linking Data
16. Web 3.0 – Linking Data Title Publisher Format Author Price Cover “I see: things (and relationships). This information is about a book.” --my Computer
19. What’s a Triple? Two uniquely identified THINGS Connected by a uniquely identified RELATIONSHIP So, what’s a THING?
20. A THING is anything that can be uniquely identified by a URI or a literal (string) Me My postal code The White House L.A. County’s sales tax rate http://twitter.com/ericaxel http://www.city-data.com/zips/90043.html Lat: 38.89859 Long: -77.035971 9.750 % http://ericfranzon.com/operator.jpg
22. Triples (Subject | Predicate | Object): Book | Has Title | “Title”; Eric | Created | Webpage; Image | Has License | CC Non-Commercial
23. Power of Relationships The real power of triples comes from uniquely identifying RELATIONSHIPS Who’s your daddy?
24. Is Father of <owl:ObjectProperty rdf:ID="isFather"> <rdfs:domain rdf:resource="#Person"/> <rdfs:range rdf:resource="#Person"/> </owl:ObjectProperty>
25. Is Father of mailto:ericaxel@yahoo.com <owl:ObjectProperty rdf:ID="isFather"> <rdfs:domain rdf:resource="#Person"/> <rdfs:range rdf:resource="#Person"/> </owl:ObjectProperty>
26. Is Father of <owl:ObjectProperty rdf:ID="isFather"> <rdfs:domain rdf:resource="#Person"/> <rdfs:range rdf:resource="#Person"/> </owl:ObjectProperty>
27. Author | Wrote | Book; Book | Written by | Author; Book | Has Title | Title; Book | Has ISBN | ISBN; Book | Has Publisher | Publisher
49. SPARQL Example #1 FOAF (some people that Eric Franzon knows) PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name FROM <http://ericaxel.com/eric.rdf> WHERE { ?knower foaf:knows ?known . ?known foaf:name ?name . }
Hello, welcome. I want this to be participatory: there are a lot of great brains in the room, and this is a place to get answers! No stupid questions here. I may suggest that we postpone some of them if we start getting too deep into an area that I think can be better covered elsewhere, or if I need to find a resource to point you to, but please ask your questions! That’s your job – to take what you can from tonight and enjoy yourselves. My job is to not be boring. So let’s get started…
Fortunately, I have some colleagues joining me who will help in that regard. This morning, I’m going to begin by giving you a ramp up into the world of Linked Open Data and the technologies used in both LOD and LED initiatives. Once I have given you that background, Bernadette Hyland from Zepheira will take you through some of the exciting work that they are doing with Linked Data to solve very real business problems faster and less expensively than would be possible using other approaches. This afternoon, David Wood will take us further into the idea of implementing these technologies in the enterprise. So let’s get started…
I use these terms interchangeably. There is a lot of discussion about what they each mean, which is perhaps ironic, since the meaning of Semantic in this case is the same thing it means in linguistics or philosophy: that is, “MEANING.” Another term that has been gaining a lot of traction is… [BUILD] Web of Data. I like this term a lot and I hope that by the end of this session, you will understand why.
Let’s look at some ‘versions’ of the Web. It should be said here that Tim Berners-Lee, the recognized “father” of the WWW, doesn’t like the idea of versioning the Web. I happen to agree, but I understand why people do it. As we talk about these versions of the Web, you may want to think of this as a continuum with significant waves, each with its own benchmark technologies, rather than specific versions with distinct start and end points. Nova Spivack of Radar Networks and Twine.com created this.
When the Web came about, people were excited because here was a way to post documents and share them quickly, easily, inexpensively, and globally. There was a way to identify documents uniquely in the form of URLs (and URLs are URIs); documents linked to other documents with hyperlinks; one could even search for relevant documents. Corporate intranets and portals became popular as enterprises realized that they could use this technology to share documents within their organizations, no matter how disparate their teams were. These documents could take the form of HTML pages, PDFs, word-processing files, spreadsheets, images… really any type of file that could be saved in a digital format could be posted to the Web and shared. Obviously, this was a huge step forward for human information collection and sharing. But what about the machines we use?
My friend Karen wrote a book of poetry. It sells on Amazon.com. In a Web 1.0 context, this page is an HTML document with some information displayed for human consumption. A computer, through use of standards like HTML, PDF, and corresponding browser technology, recognizes certain elements as character strings, numbers, images, formatting, etc. BUT, the computer does NOT deduce the meaning from this information. It does not recognize that this is a book with an author, a publisher, and a format (paperback). It doesn’t know concepts such as a book being a commodity with a corresponding price. It also doesn’t recognize that commodities such as books have REVIEWS (Web 2.0). The data in the page are not understood by the computer, and there’s no way to link that data to any other relevant information.
In Web 2.0, the focus shifted: we became very interested in linking individuals, their likes and dislikes, their opinions, their profiles, photos, videos, their metadata – you get the idea.
In Web 2.0, we experienced an immense growth of content on the Web as users interacted with the Web in entirely new ways. They have been encouraged to enter data and metadata into many collections, both publicly and anonymously. End-users found their voices on the Web. My non-technical friends and relatives interact with this system and add data to it on a regular basis. Pretty exciting time to be us. However, my computer is still pretty clueless.
In Web 3.0, we’re no longer talking about linking documents. We’re no longer talking about searching for character strings and getting back results in search engines that match those strings. We’re actually talking about linking the information INSIDE those documents. Now we’re going to take a look at the growing world of Linked Open Data.
Remember our book example from earlier? By using Web 3.0 technologies, my computer now understands so much more! When we uniquely identify THINGS for the computer, it can start to recognize the data points in the page. Look at the THINGS on this page. I’ve marked up some of the nouns “Dick Tracy–style”. Every one of these THINGS can be referred to by a URI. The RELATIONSHIPS are merely implied here, but they are there: This BOOK | HAS | a TITLE, PUBLISHER, PRINT FORMAT, and an IMAGE associated with it. That IMAGE | IS A | COVER. The BOOK | HAS | a PRICE. Etc.
…which shows the progression of technologies and standards as we move toward a world of Linked Data. Here in the lower left corner is the era that we may refer to as “Web 0.0” or, in Nova’s diagram, the PC era. Computers operated in solitude. With the advent of TCP/IP they could begin to share information across a network. But the files/documents were still locked in their physical location (we used the “sneakernet” to transfer files and docs). With HTTP, the physical location of the computers became irrelevant. Files and docs could be linked and shared. But the CONCEPTS within those documents were now what remained locked in a rigid structure. Semantic Technologies change that, enabling those concepts to be linked across systems in much the same way that files and computers were linked with other standards. The vision is the GGG: the Giant Global Graph.
In Web 1.0, documents were given URIs. With a web browser, individuals could access those exact documents by entering the URIs. In Web 3.0, every THING, every RELATIONSHIP, is given a URI, providing similar access to the information within documents. We can now point to specific data points as unique resources. THAT is what I mean by ‘X’. ‘X’ is related to ‘Y’ specifically in THIS WAY. In Web 1.0, documents could be linked together by embedding URIs in documents, but computers still couldn’t understand the information IN those documents. In the Semantic Web, the data becomes free. I’m using “free” here not to describe monetary value, nor to describe access control or security, but rather to describe the flexibility of the data: the ACCESSIBILITY of the data in terms of what’s possible.
These are representations of: me; my postal code; the White House; the sales tax rate of Los Angeles County (remember that the next time you hate us for our winters); and an image. Note that we’re talking about applying this to both structured and unstructured data. Also, these are pieces of data stored in a variety of places. Imagine what we could do if we could tie them together. [[Use example of “a PHOTO taken by ME of the WHITE HOUSE, but sold from my POSTAL CODE to another Californian (sales tax rate).” -- Make sense?]]
How many of you remember doing this? Don’t worry – I won’t make you relive 5th grade entirely. But I do want to talk about it just for a moment: the key thing about the sentences on the previous slide is that they can be divided into a SUBJECT, PREDICATE, and OBJECT. And that, in a nutshell, is all a triple is: [BUILD] SUBJECT + PREDICATE + OBJECT
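As a rough illustration of SUBJECT + PREDICATE + OBJECT, a triple can be modelled as a three-field record. The Python below is only a sketch: the `Triple` type, the `objects_of` helper, and every `example.org` URI are made up for this example and are not taken from any vocabulary mentioned in the talk.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: two THINGS joined by a RELATIONSHIP."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs, used only for illustration.
BOOK = "http://example.org/book/1"
HAS_TITLE = "http://example.org/vocab/hasTitle"
ERIC = "http://example.org/people/eric"
CREATED = "http://example.org/vocab/created"
WEBPAGE = "http://example.org/webpage"

triples = [
    Triple(BOOK, HAS_TITLE, "Title"),   # the object is a literal string
    Triple(ERIC, CREATED, WEBPAGE),     # the object is another THING
]


def objects_of(subject, predicate, graph):
    """Return every object attached to `subject` through `predicate`."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


print(objects_of(BOOK, HAS_TITLE, triples))
```

Because each statement has the same three-part shape, a list of them can be filtered by any combination of subject, predicate, and object without designing a table layout first.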
And this is what they look like in graph form…[[BUILD]]
This relationship can be given a specific URI. That means that the concept of isFather can have a distinct meaning, characteristics, and requirements – AND THAT DEFINITION CAN BE LINKED TO. Maybe I mean biological father. Maybe I mean a broader, social definition that includes step-fathers and adoptive fathers. Each one of these RELATIONSHIPS may be defined in a different vocabulary in a different way. The KEY THING HERE is that when a RELATIONSHIP has a URI, it can be called upon and re-used. I can choose which existing definition I want to use or I can create my own. [BUILD] The RELATIONSHIP can be expressed in a way that computers can understand. In this case, at least one such definition exists for the concept of “Father.” [BUILD] The data points can change; the relationship remains independent. “I” can be expressed by any number of URIs, raising the philosophical question: “Who am I on the Web?” My Twitter feed? My Facebook profile? My LinkedIn profile? That senior high school photo that a “friend” just posted of me? [BUILD] The data points can change; the relationship remains independent.
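To make "expressed in a way that computers can understand" slightly more concrete: in RDF Schema, rdfs:domain and rdfs:range let software infer facts about the data points on either end of a relationship. The following is a minimal Python sketch, assuming the isFather definition shown on the slide. The `example.org` URIs for the vocabulary and the people, and the `infer_types` helper, are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs standing in for the slide's "#isFather" and "#Person".
IS_FATHER = "http://example.org/family#isFather"
PERSON = "http://example.org/family#Person"

# property -> (domain class, range class), mirroring the OWL snippet
schema = {IS_FATHER: (PERSON, PERSON)}


def infer_types(graph, schema):
    """Return the rdf:type triples implied by rdfs:domain and rdfs:range."""
    inferred = set()
    for s, p, o in graph:
        if p in schema:
            domain, range_ = schema[p]
            inferred.add((s, RDF_TYPE, domain))   # the subject is in the domain
            inferred.add((o, RDF_TYPE, range_))   # the object is in the range
    return inferred


graph = {
    ("http://example.org/people/eric", IS_FATHER,
     "http://example.org/people/child"),
}

for triple in sorted(infer_types(graph, schema)):
    print(triple)
```

Note that the definition is used here to derive new statements (both ends of isFather are Persons), not to reject data. Swapping in a different URI for "me" changes the data points, while the relationship and its definition stay the same.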
Here’s a graph of several related triples. I think it’s actually a lot easier to read and write than diagramming sentences.
Hopefully this is all pretty familiar territory for you. Relational databases have been around a long time and work because they have: a way of representing data that is built upon standards; a formal way of representing schemas; and a robust query language that allows extraction of information.
The way of representing data in RDBMS is tables.
Note that in this schema, data is connected at the table level. In creating a schema for RDBMS, you need to do a lot of planning for what goes into which tables.
And a basic SQL query.
Linked Data has the same components, but is built upon a web-scale architecture. THE WWW IS THE DATABASE!
RDF = Resource Description Framework. It is the language used for describing data (and metadata, and even other data languages). It is graph-based, and it’s the core of what we have been talking about today. What is it good for? “RDF is good for distributing data across the Web and pretending it’s in one place.” - Dean Allemang
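One way to read that quote: because every thing and every relationship carries a global identifier, combining data from two places amounts to taking the union of their triples. Here is a small Python sketch of that idea. The two sources and all `example.org` URIs are invented for illustration.

```python
# Hypothetical identifiers, shared by both sources.
BOOK = "http://example.org/book/1"
HAS_TITLE = "http://example.org/vocab/hasTitle"
HAS_PRICE = "http://example.org/vocab/hasPrice"
HAS_PUBLISHER = "http://example.org/vocab/hasPublisher"

# Triples published by a (made-up) bookseller.
bookseller = {
    (BOOK, HAS_TITLE, "Title"),
    (BOOK, HAS_PRICE, "12.00"),
}

# Triples published by a (made-up) library catalogue.
library = {
    (BOOK, HAS_TITLE, "Title"),
    (BOOK, HAS_PUBLISHER, "http://example.org/publisher/1"),
}

# Merging two graphs is a set union: identical statements collapse,
# and statements about the same URI end up side by side.
merged = bookseller | library

about_book = sorted(p for s, p, o in merged if s == BOOK)
print(about_book)
```

No join keys or schema mapping are needed in this toy case, because both sources already agree on the URI for the book. In practice that agreement is the hard part.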
Subjects and predicates may only be URIs (blank nodes, which the RDF specification also allows as subjects and objects, are left aside here). Objects may be one of two kinds: literals (strings or other XML Schema datatypes) or resources (URIs). If you use http: URIs, others can reference them.
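The rules above can be sketched as a small check that writes one statement in N-Triples, a line-based RDF syntax. This Python is an illustration under stated assumptions: the `URI` and `Literal` classes and the `to_ntriples` function are hypothetical helpers, blank nodes and typed literals are ignored, and only backslashes and double quotes are escaped. The Dublin Core title property is real; the book URI is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def to_ntriples(subject, predicate, obj):
    """Serialize one triple as an N-Triples line, enforcing the slide's rules."""
    if not isinstance(subject, URI) or not isinstance(predicate, URI):
        raise TypeError("subject and predicate must be URIs")
    if isinstance(obj, URI):
        obj_text = f"<{obj.value}>"
    elif isinstance(obj, Literal):
        # Simplified escaping: backslashes and double quotes only.
        escaped = obj.value.replace("\\", "\\\\").replace('"', '\\"')
        obj_text = f'"{escaped}"'
    else:
        raise TypeError("object must be a URI or a Literal")
    return f"<{subject.value}> <{predicate.value}> {obj_text} ."


line = to_ntriples(
    URI("http://example.org/book/1"),               # hypothetical book URI
    URI("http://purl.org/dc/elements/1.1/title"),   # Dublin Core "title"
    Literal("Title"),
)
print(line)
```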
Here are examples of Dublin Core and FOAF vocabularies in use.
SPARQL = SPARQL Protocol and RDF Query Language (a recursive acronym). This is the language used to write queries over the information made available in RDF. IMPORTANT TO REMEMBER: “SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware.” This is more than a mash-up! – Linked Data allows for computation of data across websites. – Think in terms of data leading you to other data.
Imagine being able to query the Web as you would a single, local database. This is a simple query, but SPARQL is a very robust language and allows for quite complex queries. Example #1: FOAF (some people that David Wood knows). Example #1 may be resolved via (e.g.) http://demo.openlinksw.com/sparql_demo/ by putting http://zepheira.com/team/dave/dave.rdf into the Graph field and the query into the SPARQL Query field. [[PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name FROM <http://zepheira.com/team/dave/dave.rdf> WHERE { ?knower foaf:knows ?known . ?known foaf:name ?name . }]]
There is also a standard that defines a tool for using this query language. That standard is a SPARQL endpoint. Imagine being able to query the Web as you would a single, local database. This is a simple query, but SPARQL is a very robust language and allows for quite complex queries. This is what’s called a generic SPARQL endpoint: http://demo.openlinksw.com/sparql_demo/ This type of endpoint sits somewhere on the Web and goes out to retrieve RDF data from elsewhere on the Web to run a query. Because a generic SPARQL endpoint will query against arbitrary RDF data, we must specify the URL of the graph (or graphs) to run the query against. We do this either using the input boxes provided on the human-friendly forms, or using the SPARQL FROM clause. It allows me to enter the URI for the graph I want to access (in this case, Dave’s FOAF file, written in RDF), and the SPARQL query specifying the result set I am looking for.
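To show what the WHERE clause in Example #1 asks for, here is a plain-Python sketch that matches the same two patterns against a tiny in-memory graph. It is not how a SPARQL engine is implemented, and it simplifies the semantics by collapsing duplicates and sorting the result. The people and their `example.org` URIs are invented; only the FOAF namespace comes from the query.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data standing in for the contents of a FOAF file.
graph = [
    ("http://example.org/people/dave", FOAF + "knows",
     "http://example.org/people/alice"),
    ("http://example.org/people/dave", FOAF + "knows",
     "http://example.org/people/bob"),
    ("http://example.org/people/alice", FOAF + "name", "Alice"),
    ("http://example.org/people/bob", FOAF + "name", "Bob"),
    ("http://example.org/people/carol", FOAF + "name", "Carol"),
]


def names_of_known(graph):
    """Answer: SELECT ?name WHERE { ?knower foaf:knows ?known .
                                    ?known foaf:name ?name . }"""
    # Pattern 1: ?knower foaf:knows ?known  -> collect every ?known
    known = {o for s, p, o in graph if p == FOAF + "knows"}
    # Pattern 2: ?known foaf:name ?name     -> joined on the shared ?known
    return sorted(o for s, p, o in graph
                  if p == FOAF + "name" and s in known)


print(names_of_known(graph))
```

Carol has a name but nobody in the data knows her, so she does not satisfy both patterns. The shared variable ?known is what ties the two patterns together.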
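Behind the human-friendly form, the SPARQL Protocol lets a client send the same two pieces of information as HTTP parameters: `query` for the query text and `default-graph-uri` for the graph to run it against. The sketch below only builds and decodes such a request URL and makes no network call. The endpoint address `http://example.org/sparql` and the `build_request_url` helper are assumptions for illustration, not the address of the demo endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint address

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?knower foaf:knows ?known . ?known foaf:name ?name . }"""


def build_request_url(endpoint, query, graph_uri):
    """Build a SPARQL Protocol GET URL for `query` over one default graph."""
    params = {
        "query": query,                  # the SPARQL query text
        "default-graph-uri": graph_uri,  # takes the place of a FROM clause
    }
    return endpoint + "?" + urlencode(params)


url = build_request_url(
    ENDPOINT, QUERY, "http://zepheira.com/team/dave/dave.rdf")
print(url)

# Decoding the URL again recovers exactly what was sent.
decoded = parse_qs(urlsplit(url).query)
print(decoded["default-graph-uri"][0])
```

The long percent-encoded links that query forms produce are this same encoding applied to the query text.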
And because it’s a web of data, we can also query on things for fun as well as for business. Now, living in L.A., I have friends for whom this IS business -- but I digress. Example #2: DBpedia (Bart Simpson's chalkboard gags). We should get back two columns of data: “Episode Title” and “What Bart wrote”. [[SELECT ?episode,?chalkboard_gag WHERE { ?episode skos:subject ?season . ?season rdfs:label ?season_title . ?episode dbpedia2:blackboard ?chalkboard_gag . FILTER (regex(?season_title, "The Simpsons episodes, season")) . } ORDER BY ?season]] http://dbpedia.org/snorql/?query=SELECT+%3Fepisode%2C%3Fchalkboard_gag+WHERE+%7B%0D%0A++%3Fepisode+skos%3Asubject+%3Fseason+.%0D%0A++%3Fseason+rdfs%3Alabel+%3Fseason_title+.%0D%0A++%3Fepisode+dbpedia2%3Ablackboard+%3Fchalkboard_gag+.%0D%0A++FILTER+%28regex%28%3Fseason_title%2C+%22The+Simpsons+episodes%2C+season%22%29%29+.%0D%0A++%7D%0D%0A++ORDER+BY+%3Fseason Here, we used a specific SPARQL endpoint for DBpedia, so we didn’t need to specify the FROM clause. We are only querying the graph from the DBpedia dataset.
The lost episode of the Simpsons? http://www.milinkito.com/swf/bart.php